Distributed Data Mining for Pervasive and Privacy-Sensitive Applications


Hillol Kargupta
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County
1000 Hilltop Circle, Baltimore, MD 21250

Abstract

This paper considers the distributed data mining (DDM) problem where transmission or sharing of data is not desirable because of limited bandwidth or the privacy-sensitive nature of the distributed, possibly multi-party, data. It notes that most DDM algorithms for such applications produce an ensemble of models (e.g. clusters, classifiers, and associations) generated from the data observed at different sites, sometimes using different techniques. These ensembles are usually difficult to interpret and translate into useful knowledge. The paper argues that linear representations of models are very promising for performing meta-analysis of ensembles, which may be useful for addressing this problem. It particularly considers the Fourier representation of discrete structures like decision trees as an example. It also points out several possible applications of this technique, such as PCA-based visualization, aggregation, and construction of redundancy-free ensembles of orthogonal decision trees.

1 Introduction

Data mining deals with the problem of extracting interesting associations, classifiers, clusters, and other patterns from data by paying careful attention to the available computing, storage, communication, and human resources. The emergence of network-based environments has introduced many data mining applications where the resources are distributed. The Internet, sensor networks, mobile applications, and widely prevalent networks of desktop computers are some examples of such environments. The field of Distributed Data Mining (DDM) [5, 22] deals with the problem of mining data using distributed resources.
When the data can be freely and efficiently transported from one node to another without significant overhead, DDM algorithms may offer better scalability and response time by (1) properly redistributing the data in different partitions, (2) distributing the computation, or (3) a combination of both. However, when the data sources are distributed and network bandwidth is restricted, DDM algorithms work by avoiding or minimizing communication of data. If the distributed data belong to different parties who do not want to share the raw data, or when the privacy and security of the data are of utmost importance, collecting data at a single site may not be possible. DDM techniques may offer a solution to both of these classes of problems.

This paper considers DDM applications where communication of a large amount of data in its original form is not desirable because of constraints like limited bandwidth and privacy of the data. There exists a large body of literature (for a review see [22]) on this class of DDM algorithms. Most of the techniques for this scenario deal with either the homogeneous case [4, 7, 2, 23, 28] (sites observing an identical set of attributes) or the heterogeneous case [5, 8, 5, 3] (sites observing different sets of attributes). Many of these DDM techniques [28, 3, 20] produce an ensemble of models generated from different data sites, possibly using different algorithms and systems. Combining these models in an appropriate manner to generate a global perspective of the data is a key problem in distributed data mining. Most ensembles work by combining only the outputs (e.g. the class label assigned by the classifier, or the cluster membership in case of a clustering problem) using various methods like the weighted average, bagging [2], order statistics-based techniques [3], voting [28], and mutual information maximization [30]. However, aggregating the outputs of the models does not solve the complete problem, particularly for a distributed data mining application.
We need to analyze and understand the characteristics of the aggregated models. This paper focuses on the growing problem of aggregating, understanding, and manipulating the ensemble of models often produced by pervasive and privacy preserving DDM applications. Section 2 discusses the role of DDM for pervasive and privacy preserving applications. Section 3 offers a framework for aggregating and manipulating ensembles of decision trees using the Fourier basis. Section 4 describes a


novel scheme for PCA-based visualization of an ensemble of decision trees. Section 5 presents a scalable PCA-based scheme for removing redundancy in a decision tree ensemble and constructing orthogonal decision trees. Finally, Section 6 concludes the paper.

2 Pervasive and Privacy Preserving Applications of DDM

This section considers a class of distributed data mining applications where downloading data to a single location for subsequent mining is difficult and undesirable. It particularly focuses on distributed data mining in (1) a mobile, pervasive environment and (2) a privacy-sensitive multi-party environment.

2.1 Distributed Data Mining in a Pervasive Environment

Analyzing and monitoring time-critical data streams generated from distributed sources in a pervasive environment is important for many domains like financial data monitoring, process control, regulation compliance, security, and defense applications. Truly pervasive applications usually involve a heterogeneous collection of computing devices connected by networks with various types of bandwidth constraints. Mobile devices like PDAs, cellphones, laptops, and wearable computers are usually connected over low-bandwidth wireless networks. The emergence of powerful mobile devices with reasonable computing and storage capacity is ushering in an era of advanced data and computation-intensive mobile applications. Monitoring and mining time-critical data in a ubiquitous fashion is one such possibility.

Sensor webs offer another class of data mining applications. They usually involve different types of sensors and data processing nodes connected over a wireless network. Central collection of data from every sensor node may create heavy traffic over the limited-bandwidth wireless channels, and this may also drain a lot of power from the devices (data transmission consumes considerable power).
A carefully designed distributed architecture for data mining in a sensor network is likely to reduce the communication load and also spread battery consumption more evenly across the different nodes in the sensor network. Data mining in such pervasive environments calls for an approach that can extract patterns from distributed data without necessarily downloading a large volume of data on a regular basis.

A recently developed experimental mobile data mining system, MobiMine [16], points out the need for a new generation of DDM algorithms that pay careful attention to the cost of transmitting data and models generated from data streams over low-bandwidth networks. MobiMine is a mobile application for monitoring, management, and mining of financial data streams from PDAs. Figure 1 shows some of the interfaces of the system. MobiMine deals with a stream of models, continuously generated from the financial data.

Figure 1. (Top) Main screen of the MobiMine. (Bottom) A MobiMine interface for visualizing decision tree ensembles.

A similar situation arises in many other DDM applications. For example, consider the distributed on-board vehicle data stream mining system that is currently being developed at the UMBC DIADIC laboratory. The system is designed to mine data from multiple vehicles in a fleet for online health monitoring. The vehicles are connected to a central control station through a wireless network. The vehicles are also equipped with on-board computer systems that process the continuous stream of locally collected data. The on-board module processes the data, sends generated models to the central control station, and reports unusual changes in the observed processes. Figure 2 shows some of the interfaces of the system. Data mining techniques for analyzing, aggregating, and visualizing models generated from the streams play a key

role in such pervasive DDM applications.

Figure 2. Distributed vehicle data stream mining system: (Top) The main interface. (Bottom) The interface for monitoring the health of the vehicle. The interface shows the state space of the vehicle.

However, pervasive applications with limited-bandwidth networks are not the only applications of DDM technology. There are many other applications of DDM that deal with distributed environments where bandwidth is not the bottleneck. The following section considers one such class of applications.

2.2 Privacy-Sensitive Distributed Data Mining Applications

Preserving the privacy of data is important in many data mining applications. Privacy of the data can depend on many different aspects. In most applications, the privacy issue is related to an individual, or to a group of individuals sharing some common characteristics in a given context. Sometimes the patterns detected by a data mining system may be used in a counter-productive manner that violates the privacy of an individual or a group of individuals. Therefore, it is important to protect the privacy of the data and its context while mining. If the privacy is associated with the identity of an individual, then sometimes removing the identification information from the data may solve the problem. However, there exist many applications where such simple solutions do not work. The data set may still reveal certain information that violates the privacy of different entities associated with the data. Therefore, in a privacy-sensitive application it is important to create a shield between the data and the data mining program in order to deny direct access to the raw data.

Many privacy-sensitive applications involve multiple data and computing nodes. In fact, even if we have a single source of data, by definition the separation between the data and the data mining program forces us to treat them as separate entities in a distributed environment.
Therefore, it is useful to consider the general problem of mining multiple data sets, located at different sites, that belong to different parties. We assume that the data sets are proprietary and privacy-sensitive. Therefore, exchanging raw data among different parties or sending data out of its owner's secured environment is not preferred. Many distributed data mining algorithms are appropriate for this class of applications since they try to minimize communication of raw data. There exists a growing body of literature on this topic. Some DDM algorithms work without transferring any raw data from the sources, and some of them do move part of the data if necessary. Some of them preserve privacy and some do not. For example, the meta-learning approach [28], the Fourier spectrum-based approach to combining decision trees [16, 20], and collective hierarchical clustering [10] can be used with minor modifications for privacy-preserving mining from distributed data.

There is also a body of work that approaches this problem from a cryptographic perspective. One way to hide the data is to distort it in some way while making sure that the data mining techniques can still find the type of patterns we are interested in. A value-distortion-based technique to protect data privacy is suggested in [1]. Value distortion is defined as $x_i + r$, where $x_i$ denotes the original value and $r$ is a random value drawn from some distribution. The key idea is that, using the distribution of $r$, the original distribution of $x_i$ can be approximated. Cryptographic tools are suggested in [32] in order to secure data transmission, along with communication between local sites as opposed to one centralized site. A privacy preserving technique to construct decision trees [24] is reported in [17]. The approach depends on a completely reliable intermediary party in order to regulate the privacy preservation. Kantarcioglu and Clifton [11] investigated association rule mining from homogeneous data using a commutative encryption tool.

We are currently building a DDM environment for privacy-sensitive applications at the UMBC DIADIC laboratory. It is currently equipped with different techniques to mine data without directly accessing it. Figure 3 shows one of the main interfaces of the system.

Figure 3. An interface for computing feature dependencies from privacy-sensitive data in a distributed environment. It shows the module for computing correlation without directly accessing the raw data.

In this paper we consider the DDM perspective of privacy-sensitive applications. Most DDM algorithms that do not share any data with other participating sites instead share locally generated models and combine them using different techniques. Therefore, just like the pervasive DDM applications, privacy-sensitive applications also require proper aggregation, transformation, and understanding of the ensemble of models collected from different sites. This paper considers an algebraic framework to do so. The rest of this paper presents a linear representation-based approach and identifies many different research directions.

3 Linear Representations for Aggregation, Manipulation, and Better Understanding of Ensemble of Models

This section considers ensembles of classifiers represented using discrete structures and proposes a framework to aggregate, understand, and manipulate them using formal algebraic operations. It particularly considers linear representations of decision trees (e.g., CART [3], ID3 [24], and C4.5 [25]) for demonstrating the possibility of going beyond the traditional ensembles that just combine the outputs of the models. This section considers the decision tree since it is a popular technique to learn classifiers from data and it is represented by a discrete structure. Learning decision trees from distributed and stream data often produces large ensembles [6, 20, 26, 29]. The rest of this paper considers the Fourier representation of decision trees, which allows efficient representation, aggregation, and manipulation of tree ensembles.

3.1 Decision Trees as Numeric Functions

A decision tree defined over a domain of categorical attributes can be treated as a numeric function. First note that a decision tree is a function that maps its domain members to a range of class labels. Sometimes, it is a symbolic function where features take symbolic (non-numeric) values. However, a symbolic function can be easily converted to a numeric function by simply replacing the symbols with numeric values in a consistent manner. Once the tree is converted to a discrete numeric function, we can also apply any appropriate analytical transformation. Fourier transformation is one such interesting possibility. The Fourier representation of a function is a linear combination of the Fourier basis functions. The weights, called Fourier coefficients, completely define the representation. Each coefficient is associated with a Fourier basis function that depends on a certain subset of the features defining the domain. This section reviews the Fourier representation of decision tree ensembles, introduced elsewhere [14, 16].

3.2 A Brief Review of the Fourier Basis in the Boolean Domain

Fourier bases are orthogonal functions that can be used to represent any discrete function. In other words, the Fourier basis is a functionally complete representation. Consider the set of all $\ell$-dimensional feature vectors where the $i$-th feature can take $\lambda_i$ different categorical values. The Fourier basis set that spans this space is comprised of $\prod_{i=1}^{\ell} \lambda_i$ basis functions. Each Fourier basis function is defined as

$$\psi_{\mathbf{j}}^{\bar{\lambda}}(\mathbf{x}) = \frac{1}{\sqrt{\prod_{i=1}^{\ell} \lambda_i}} \prod_{m=1}^{\ell} \exp\left(\frac{2\pi i}{\lambda_m} x_m j_m\right)$$

where $\mathbf{j}$ and $\mathbf{x}$ are strings of length $\ell$; $x_m$ and $j_m$ are the $m$-th attribute-value in $\mathbf{x}$ and $\mathbf{j}$, respectively; $x_m, j_m \in \{0, 1, \ldots, \lambda_m - 1\}$; and $\bar{\lambda}$ represents the feature-cardinality vector, $\lambda_1, \ldots, \lambda_{\ell}$. $\psi_{\mathbf{j}}^{\bar{\lambda}}(\mathbf{x})$ is called the $\mathbf{j}$-th basis function. The vector $\mathbf{j}$ is called a partition, and the order of a partition $\mathbf{j}$ is the number of non-zero feature values it contains. A Fourier basis function depends on some $x_i$ only when the corresponding $j_i \neq 0$. If a partition $\mathbf{j}$ has exactly $\alpha$ non-zero values, then we say the partition is of order $\alpha$, since the corresponding Fourier basis function depends only on those $\alpha$ variables that take non-zero values in the partition $\mathbf{j}$.

A function $f: \Omega \rightarrow \mathbb{R}$ that maps an $\ell$-dimensional discrete domain $\Omega$ to a real-valued range can be represented using the Fourier basis functions:

$$f(\mathbf{x}) = \sum_{\mathbf{j}} w_{\mathbf{j}} \overline{\psi}_{\mathbf{j}}^{\bar{\lambda}}(\mathbf{x})$$

where $w_{\mathbf{j}}$ is the Fourier Coefficient (FC) corresponding to the partition $\mathbf{j}$ and $\overline{\psi}_{\mathbf{j}}^{\bar{\lambda}}(\mathbf{x})$ is the complex conjugate of $\psi_{\mathbf{j}}^{\bar{\lambda}}(\mathbf{x})$; $w_{\mathbf{j}} = \sum_{\mathbf{x}} \psi_{\mathbf{j}}^{\bar{\lambda}}(\mathbf{x}) f(\mathbf{x})$. The Fourier coefficient $w_{\mathbf{j}}$ can be viewed as the relative contribution of the partition $\mathbf{j}$ to the function value of $f(\mathbf{x})$. Therefore, the absolute value of $w_{\mathbf{j}}$ can be used as the significance of the corresponding partition $\mathbf{j}$. If the magnitude of some $w_{\mathbf{j}}$ is very small compared to other coefficients, we may consider the $\mathbf{j}$-th partition to be insignificant and neglect its contribution. The order of a Fourier coefficient is nothing but the order of the corresponding partition. We shall often use terms like high order or low order coefficients to refer to a set of Fourier coefficients whose orders are relatively large or small, respectively.

The energy of a spectrum is defined by the summation $\sum_{\mathbf{j}} w_{\mathbf{j}}^2$. Let us also define the inner product between two spectra $\mathbf{w}_{(1)}$ and $\mathbf{w}_{(2)}$, where $\mathbf{w}_{(i)} = [w_{1,(i)} \; w_{2,(i)} \; \cdots \; w_{|J|,(i)}]^T$ is the column matrix of all Fourier coefficients in an arbitrary but fixed order. Superscript $T$ denotes the transpose operation and $|J|$ denotes the total number of coefficients in the spectrum. The inner product is

$$\langle \mathbf{w}_{(1)}, \mathbf{w}_{(2)} \rangle = \sum_{j} w_{j,(1)} w_{j,(2)}.$$

We will also use the definition of the inner product between a pair of real-valued functions defined over some domain $\Omega$. This is defined as

$$\langle f_1, f_2 \rangle = \sum_{\mathbf{x} \in \Omega} f_1(\mathbf{x}) f_2(\mathbf{x}).$$

The following section considers the Fourier spectrum of decision trees and discusses some of its useful properties.

3.3 Properties of Decision Trees in the Fourier Domain

This section considers the Fourier spectrum of decision trees with finite depths, bounded by some constant.
The underlying functions in such decision trees can be represented by a constant depth Boolean AND and OR circuit (or equivalently an $AC^0$ circuit). Linial et al. [18] noted that the Fourier spectrum of an $AC^0$ circuit has very interesting properties and proved the following lemma.

Lemma 1 (Linial, 1993) Let $M$ and $d$ be the size and depth of an $AC^0$ circuit. Then

$$\sum_{\{\mathbf{j} \,:\, o(\mathbf{j}) > t\}} w_{\mathbf{j}}^2 \leq 2M \, 2^{-t^{1/d}/20}$$

where $o(\mathbf{j})$ denotes the order of the partition $\mathbf{j}$ and $t$ is a non-negative integer. The term on the left hand side of the inequality represents the energy of the spectrum captured by the coefficients with order greater than a given constant $t$.

Lemma 1 essentially states the following property about decision trees: the energy captured by all high order Fourier coefficients is small. This is because the energy of the Fourier coefficients of higher order decays exponentially. This observation suggests that the spectrum of a Boolean decision tree (or equivalently a bounded depth function) can be approximated by computing only a small number of low order Fourier coefficients. So the Fourier basis offers an efficient numeric representation of a decision tree in the form of a linear function that can be easily stored and manipulated. The exponential decay property of the Fourier spectrum also holds for non-Boolean decision trees. The complete proof is available elsewhere [21].

There are two additional important characteristics of the Fourier spectrum of a decision tree:

1. The Fourier spectrum of a decision tree can be efficiently computed [16].
2. The Fourier spectrum can be directly used for constructing the tree [21].

In other words, we can go back and forth between the tree and its spectrum. This is philosophically similar to switching between the time and frequency domains in the traditional application of Fourier analysis for signal processing.

Fourier transformation of decision trees also preserves the inner product. The functional behavior of a decision tree is defined by the class labels it assigns.
Therefore, if $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{|\Omega|}$ are the members of the domain $\Omega$, then the functional behavior of a decision tree can be captured by the vector $\mathbf{f} = [f(\mathbf{x}_1) \; f(\mathbf{x}_2) \; \cdots \; f(\mathbf{x}_{|\Omega|})]^T$, where the superscript $T$ denotes the transpose operation. The following lemma proves that the inner product between two such vectors is identical to the inner product between their respective Fourier spectra.
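The definitions above can be checked numerically on a toy case. The following is a minimal sketch, assuming three Boolean features (so every $\lambda_i = 2$ and the basis reduces to the real-valued, orthonormal Walsh functions $(-1)^{\mathbf{x} \cdot \mathbf{j}} / \sqrt{2^{\ell}}$). The helper `walsh_basis` and the two small trees `tree_a` and `tree_b` are hypothetical names introduced only for illustration; they are not taken from the paper or from any system it describes.

```python
import itertools
import numpy as np

def walsh_basis(n_features):
    """Return all Boolean inputs and the orthonormal Fourier (Walsh) basis.

    Entry [j, x] of the returned matrix is psi_j(x) = (-1)^(x . j) / sqrt(2^n).
    """
    points = np.array(list(itertools.product([0, 1], repeat=n_features)))
    psi = (-1.0) ** (points @ points.T) / np.sqrt(len(points))
    return points, psi

def tree_a(x):
    # Hypothetical depth-2 tree: test x0, then x1 (if x0 == 1) or x2 (if x0 == 0).
    return x[1] if x[0] == 1 else x[2]

def tree_b(x):
    # Hypothetical depth-2 tree: class 1 only when x0 == 0 and x1 == 1.
    return 1 if (x[0] == 0 and x[1] == 1) else 0

points, psi = walsh_basis(3)

# Function vectors f = [f(x_1) ... f(x_|Omega|)]^T over the whole domain.
f_a = np.array([tree_a(x) for x in points], dtype=float)
f_b = np.array([tree_b(x) for x in points], dtype=float)

# Fourier coefficients: w_j = sum_x psi_j(x) f(x).
w_a = psi @ f_a
w_b = psi @ f_b

# Order of a partition = number of non-zero entries in j.
order = points.sum(axis=1)
energy_by_order = {int(k): float(np.sum(w_a[order == k] ** 2)) for k in range(4)}

# Going back from the spectrum to the function values.
reconstructed = psi.T @ w_a

print(energy_by_order)
print(float(f_a @ f_b), float(w_a @ w_b))
```

In this toy case the trees have depth 2, so the single order-3 coefficient carries no energy, and the two printed inner products agree. The sketch enumerates the full domain, which is only feasible for very small $\ell$; it is meant to illustrate the definitions, not the efficient spectrum computation cited in the text.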

Figure 4. Visualization of an ensemble of decision trees using PCA. Each point represents a single decision tree. (Axes: 1st principal component, 2nd principal component.)

Lemma 2 Let $f_1(\mathbf{x}) = \sum_{j} w_{j,(1)} \overline{\psi}_{j}(\mathbf{x})$ and $f_2(\mathbf{x}) = \sum_{j} w_{j,(2)} \overline{\psi}_{j}(\mathbf{x})$ for all $\mathbf{x} \in \Omega$; then $\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = \langle \mathbf{w}_{(1)}, \mathbf{w}_{(2)} \rangle$.

Proof.

$$\langle \mathbf{f}_1, \mathbf{f}_2 \rangle = \sum_{\mathbf{x}} f_1(\mathbf{x}) f_2(\mathbf{x}) = \sum_{\mathbf{x}} \sum_{j} w_{j,(1)} \overline{\psi}_{j}(\mathbf{x}) \sum_{i} w_{i,(2)} \overline{\psi}_{i}(\mathbf{x}) = \sum_{j,i} w_{j,(1)} w_{i,(2)} \sum_{\mathbf{x}} \overline{\psi}_{j}(\mathbf{x}) \overline{\psi}_{i}(\mathbf{x}) = \sum_{j} w_{j,(1)} w_{j,(2)} = \langle \mathbf{w}_{(1)}, \mathbf{w}_{(2)} \rangle$$

The fourth step is true since the Fourier basis functions are orthonormal.

The Fourier spectrum of a decision tree offers a real-valued representation that allows a wide range of different data analysis techniques for analyzing and understanding the ensemble of decision trees. The rest of this paper considers several such possibilities.

4 Visualizing Ensemble of Decision Trees

Visual inspection of ensemble models for identifying their relative similarities and dissimilarities is one of the basic needs for better understanding of an ensemble. This section offers a technique for visualizing decision tree ensembles using Principal Component Analysis (PCA) [9].

Consider a set of decision trees $T_1, T_2, \ldots, T_k$; let $\mathbf{w}_{(1)}, \mathbf{w}_{(2)}, \ldots, \mathbf{w}_{(k)}$ be their respective spectra. In order to visualize the functional behavior of any given tree $T_i$ we need to somehow represent the vector $\mathbf{f}_{(i)}$. The inner product between $\mathbf{f}_{(i)}$ and $\mathbf{f}_{(j)}$ provides a measure of their similarity. Therefore, the inner product matrix can be useful for studying the ensemble. Unfortunately, for most real-life applications explicit operations using $\mathbf{f}_{(i)}$ are not practical since the domain of $T_i$ is usually very large. However, Lemma 2 offers a practical way to solve this problem. Since the Fourier spectrum preserves the inner product, we can operate in the Fourier domain efficiently without explicitly dealing with the $\mathbf{f}_{(i)}$-s.

The inner product matrix is useful for measuring pairwise similarity between trees. However, sometimes we may need to represent the trees in a new embedding. PCA is a popular technique to construct a smaller dimensional representation of high dimensional data.
Although PCA may not be directly applicable to discrete structures like trees, the technique works fine with the representation of decision trees in the Fourier domain. Let $J$ be the union of all partitions with non-zero coefficients from all the spectra under consideration, and let $|J|$ be the cardinality of this set. Consider an arbitrary but fixed ordering of all the members of $J$. Let us now define the matrix $W$ such that $W_{i,j} = w_{j,(i)}$, where $w_{j,(i)}$ denotes the Fourier coefficient corresponding to the $j$-th partition in the ordering from the spectrum of the tree $T_i$. After translating the column-means of the matrix $W$ to zero, we get the new matrix $W'$. Therefore, $W'_{i,j} = W_{i,j} - \mu_j$, where $\mu_j$ is the mean of the $j$-th column of matrix $W$. This is a real-valued $k \times |J|$ matrix. The covariance matrix of $W'$ is therefore $W'^{T} W'$. This is a symmetric matrix with an eigenvalue decomposition. A straightforward application of PCA on $W'$ and its subsequent projection along the dominant eigenvectors can be used to create a compact, smaller dimensional representation of the trees. Figure 4 shows a two-dimensional representation of an ensemble of trees using the two most dominant eigenvectors.

Visualization of trees is not the only thing we can do using the Fourier representation of trees. It also offers us a way to create linear combinations of the trees and develop the notion of redundancy-free orthogonal trees. The following section outlines these possibilities.
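The embedding step described in this section can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation used to produce Figure 4: the five small trees over three Boolean features are hypothetical, the spectra are obtained by enumerating the whole (tiny) domain, and NumPy's `numpy.linalg.eigh` stands in for whatever PCA routine an application would use.

```python
import itertools
import numpy as np

# Hypothetical ensemble: five small trees over three Boolean features.
trees = [
    lambda x: x[0],
    lambda x: x[1] if x[0] else x[2],
    lambda x: x[0] & x[1],
    lambda x: x[0] | x[2],
    lambda x: x[2] if x[1] else x[0],
]

# Orthonormal Boolean Fourier (Walsh) basis; entry [j, x] is psi_j(x).
points = np.array(list(itertools.product([0, 1], repeat=3)))
psi = (-1.0) ** (points @ points.T) / np.sqrt(len(points))

# Row i of `spectra` is the Fourier spectrum of tree i (the matrix W).
spectra = np.array(
    [psi @ np.array([t(x) for x in points], dtype=float) for t in trees]
)

# Translate the column means to zero (the matrix W').
centered = spectra - spectra.mean(axis=0)

# Covariance matrix W'^T W' (unnormalized) and its eigen-decomposition.
cov = centered.T @ centered
eigvals, eigvecs = np.linalg.eigh(cov)

# Project every tree onto the two most dominant eigenvectors.
top = np.argsort(eigvals)[::-1][:2]
embedding = centered @ eigvecs[:, top]

for i, (pc1, pc2) in enumerate(embedding):
    print(f"tree {i}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
```

Each row of `embedding` is one tree placed in the plane of the two dominant components, which is the kind of scatter plot that Figure 4 depicts. For a realistic ensemble only the partitions with non-zero coefficients would be kept as columns, rather than the full basis used here.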

5 Removing Redundancies from Ensembles

Combining the output of base models is the central issue in ensemble learning. There exist several well known techniques [2, 27, 33, 3] to do that. All of these techniques combine the output of the base classifiers in different ways. They do not structurally combine the classifiers themselves. The Fourier representation offers a unique way to do that. The Fourier spectrum of a linear combination of decision tree classifiers can be computed by first computing the Fourier spectrum of every tree and then aggregating them using the chosen scheme for constructing the ensemble.

Let $f(\mathbf{x})$ be the underlying function representing the ensemble of $k$ different decision trees where the output is a weighted linear combination of the outputs of the base classifiers. Then we can write

$$f(\mathbf{x}) = a_1 f_1(\mathbf{x}) + a_2 f_2(\mathbf{x}) + \cdots + a_k f_k(\mathbf{x}) = a_1 \sum_{j \in J_1} w_{j,(1)} \overline{\psi}_{j}(\mathbf{x}) + \cdots + a_k \sum_{j \in J_k} w_{j,(k)} \overline{\psi}_{j}(\mathbf{x})$$

where $a_i$ is the weight of the $i$-th decision tree and $J_i$ is the set of all partitions with non-zero Fourier coefficients in its spectrum. Therefore,

$$f(\mathbf{x}) = \sum_{j \in J} w_{j} \overline{\psi}_{j}(\mathbf{x})$$

where $w_j = \sum_{i=1}^{k} a_i w_{j,(i)}$ and $J = \bigcup_{i=1}^{k} J_i$. Therefore, the Fourier spectrum of $f(\mathbf{x})$ (a linear ensemble classifier) is simply the weighted sum of the spectra of the member trees.

The base models of an ensemble often share redundancy resulting from similar observations noted at different sites. Continuous data stream environments may also introduce redundancy in the generated models because of the underlying periodicity in the data. Therefore, removing redundancy from the base models may be useful for creating the ensemble. The following part of this section explores a Fourier spectrum-based approach to do that.

Consider the matrix $D$ where $D_{i,j} = f_j(\mathbf{x}_i)$, and $f_j(\mathbf{x}_i)$ is the output of the tree $T_j$ for input $\mathbf{x}_i \in \Omega$. $D$ is an $|\Omega| \times k$ matrix where $|\Omega|$ is the size of the input domain and $k$ is the total number of trees in the ensemble. An ensemble classifier that combines the outputs of the base classifiers can be viewed as a function defined over the set of all rows in $D$. If $D_{\cdot,j}$ denotes the $j$-th column matrix of $D$, then the ensemble classifier can be viewed as a function of $D_{\cdot,1}, D_{\cdot,2}, \ldots, D_{\cdot,k}$.
When the ensemble classifier is a linear combination of the outputs of the base classifiers we have $F = a_1 D_{\cdot,1} + a_2 D_{\cdot,2} + \cdots + a_k D_{\cdot,k}$, where $F$ is the column matrix of the overall ensemble output. Since the base classifiers may have redundancy, we would like to construct a compact low-dimensional representation of the matrix $D$. However, explicit construction and manipulation of the matrix $D$ is difficult, since most practical applications deal with a very large domain. We can try to construct an approximation of $D$ using only the available training data. One such approximation of $D$ and its Principal Component Analysis-based projection is reported elsewhere [19]. That technique performs PCA of the matrix $D$, projects the data onto the representation defined by the eigenvectors of the covariance matrix of $D$, and then performs linear regression for computing the coefficients $a_1, a_2, \ldots, a_k$. While the approach is interesting, it has serious problems. First of all, the construction of an approximation of $D$ even for the training data is computationally prohibitive for most large scale data mining applications. Moreover, this is an approximation since the matrix is computed only over the observed data set rather than the entire domain. In the following we demonstrate a novel way to perform a PCA of the matrix $D$, defined over the entire domain. The approach uses the Fourier spectra of the trees and Lemma 2, and works without explicitly generating the matrix $D$.

The following analysis will assume that the columns of the matrix $D$ are mean-zero. This restriction can be easily removed with a simple extension of the analysis. Note that the covariance of the matrix $D$ is $D^{T} D$. Let us denote this covariance matrix by $C$. The $(i, j)$-th entry of the matrix is

$$C_{i,j} = \langle D_{\cdot,i}, D_{\cdot,j} \rangle = \langle \mathbf{f}_{(i)}, \mathbf{f}_{(j)} \rangle = \sum_{p} w_{p,(i)} w_{p,(j)} = \langle \mathbf{w}_{(i)}, \mathbf{w}_{(j)} \rangle \quad (1)$$

The third step is true by Lemma 2. Now let us consider the matrix $W$ where $W_{i,j} = w_{i,(j)}$, i.e. the coefficient corresponding to the $i$-th member of the partition set $J$ from the spectrum of the tree $T_j$. Equation 1 implies that the covariance matrices of $D$ and $W$ are identical.
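The two identities used so far in this section (the spectrum of a linear ensemble is the weighted sum of member spectra, and $D^{T}D = W^{T}W$) can be checked on a toy example. The sketch below assumes three Boolean features, three hypothetical trees, and arbitrary illustrative weights `a`; it builds $D$ explicitly only because the domain has eight points, which is exactly what the approach in the text avoids for realistic domains. The columns are not centered here, since the identity holds with or without the mean-zero assumption.

```python
import itertools
import numpy as np

# Orthonormal Boolean Fourier (Walsh) basis; entry [j, x] is psi_j(x).
points = np.array(list(itertools.product([0, 1], repeat=3)))
psi = (-1.0) ** (points @ points.T) / np.sqrt(len(points))

# Hypothetical base classifiers (small decision trees as functions).
trees = [
    lambda x: x[0],
    lambda x: x[1] if x[0] else x[2],
    lambda x: x[0] & x[1],
]

# D[i, j] = output of tree j on input x_i  (|Omega| x k).
D = np.array([[t(x) for t in trees] for x in points], dtype=float)

# W[i, j] = coefficient of partition i in the spectrum of tree j  (|J| x k).
W = psi @ D

# Linear ensemble with illustrative weights a_1..a_k.
a = np.array([0.5, 0.3, 0.2])
ensemble_output = D @ a      # F = a_1 D_1 + ... + a_k D_k
ensemble_spectrum = W @ a    # weighted sum of the member spectra

# Covariance computed in the function domain and in the Fourier domain.
cov_D = D.T @ D
cov_W = W.T @ W

# Eigenvectors of the shared covariance rotate the spectra into
# mutually orthogonal columns.
eigvals, V = np.linalg.eigh(cov_W)
rotated = W @ V
gram = rotated.T @ rotated

print(np.allclose(cov_D, cov_W))
print(np.round(gram, 6))
```

The first printed value confirms that the two covariance matrices coincide, and the printed Gram matrix is diagonal, so the rotated columns are orthogonal spectra. Only `W` is needed for the last step, which is why the eigen-analysis can be carried out without ever forming `D` when the domain is large.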
Note that $W$ is a $|J| \times k$ dimensional matrix. For most practical applications $|J| \ll |\Omega|$. Therefore, analyzing $W$ using techniques like PCA is significantly easier. The following discourse outlines a PCA-based approach.

PCA of the covariance matrix $W^{T} W$ produces a set of eigenvectors $V_1, V_2, \ldots, V_k$, which in turn define a set of principal components $Y_1, Y_2, \ldots, Y_k$. Let $V_{j,(q)}$ be the $j$-th component of the $q$-th eigenvector of the matrix $W^{T} W$. Then

$$Y_q = \sum_{j=1}^{k} V_{j,(q)} f_j(\mathbf{x}) = \sum_{j=1}^{k} V_{j,(q)} \sum_{p} w_{p,(j)} \overline{\psi}_{p}(\mathbf{x}) = \sum_{p} w'_{p,(q)} \overline{\psi}_{p}(\mathbf{x})$$

where $w'_{p,(q)} = \sum_{j=1}^{k} V_{j,(q)} w_{p,(j)}$. The eigenvalue decomposition constructs a new representation of the underlying domain in which each feature corresponds to a column vector $Y_q$. Note that $\mathbf{w}'_{(q)}$ is a linear combination of a set of Fourier spectra and therefore it is also a Fourier spectrum. Also note that the $Y_q$-s are orthogonal.

The above analysis offers a way to construct the Fourier spectra of a set of functions that are orthogonal to each other. We can construct decision trees from each of these spectra using the tree construction technique developed elsewhere, and these trees will be mutually orthogonal. These trees can be used to create a less redundant and more efficient ensemble of classifiers. The following section concludes this paper.

6 Conclusions

This paper considers one of the central research issues in the field of distributed data mining: understanding, aggregating, and manipulating models generated by different types of algorithms from different data sites. It argues that the traditional ensemble-based approaches, which combine only the outputs of the base models, do not serve the purpose very well as far as distributed data mining is concerned. We need techniques that allow advanced meta-level analysis of models, like detecting the underlying redundancies, visualizing the evolution of the patterns, detecting the stability of the ensemble, and others. This paper proposed an approach based on linear representations of discrete structures. It particularly considered the Fourier representation of decision trees and showed that such representations can be very useful for visualizing, aggregating, and removing redundancies from an ensemble. Although the paper considers the Fourier representation, this is clearly not the only available linear representation.
Distributed data mining applications deal with many discrete structures, such as graphs and clusters, that can also benefit from appropriate linear decompositions. Eigenvectors and wavelets are other interesting choices for representing ensembles that need further investigation.

Acknowledgments

The author acknowledges support from the United States National Science Foundation CAREER award IIS [...], NASA (NRA) NAS2-3743, and TEDCO, Maryland Technology Development Center. The author would also like to thank Ligong Yang for producing Figure 4.

References

[1] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the ACM SIGMOD Conference on Management of Data, Dallas, Texas, May 2000. ACM Press.
[2] L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
[4] P. Chan and S. Stolfo. Toward parallel and distributed learning by meta-learning. In Working Notes of the AAAI Workshop on Knowledge Discovery in Databases. AAAI, 1993.
[5] R. Chen, S. Krishnamoorthy, and H. Kargupta. Distributed web mining using Bayesian networks from multiple data streams. In IEEE International Conference on Data Mining, CA, USA, 2001.
[6] W. Fan, S. Stolfo, and J. Zhang. The application of AdaBoost for distributed, scalable and on-line learning. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, 1999.
[7] G. Forman and B. Zhang. Distributed data clustering can be efficient and exact. In SIGKDD Explorations, volume 2 of 2. ACM, 2000.
[8] D. Hershberger and H. Kargupta. Distributed multivariate regression using wavelet-based collective data mining. Journal of Parallel and Distributed Computing, 61, 2001.
[9] H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 1933.
[10] E. Johnson and H. Kargupta. Collective, hierarchical clustering from distributed, heterogeneous data. In Lecture Notes in Computer Science, volume 1759. Springer-Verlag, 1999.
[11] M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In SIGMOD Workshop on DMKD, Madison, WI, June 2002.
[12] H. Kargupta, I. Hamzaoglu, and B. Stafford. Scalable, distributed data mining using an agent based architecture. In D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy, editors, Proceedings of Knowledge Discovery and Data Mining, pages 211-214, Menlo Park, CA, 1997. AAAI Press.
[13] H. Kargupta, W. Huang, K. S., and E. Johnson. Distributed clustering using collective principal component analysis. Knowledge and Information Systems Journal, Special Issue on Distributed and Parallel Knowledge Discovery, 3, 2001.
[14] H. Kargupta and B. Park. Mining time-critical data stream using the Fourier spectrum of decision trees. In Proceedings of the IEEE International Conference on Data Mining. IEEE Press, 2001.
[15] H. Kargupta, B. Park, D. Hershberger, and E. Johnson. Collective data mining: A new perspective towards distributed data mining. In Advances in Distributed and Parallel Knowledge Discovery, Eds: Kargupta, Hillol and Chan, Philip. AAAI/MIT Press.
[16] H. Kargupta, B. Park, S. Pittie, L. Liu, D. Kushraj, and K. Sarkar. MobiMine: Monitoring the stock market from a PDA. ACM SIGKDD Explorations, 3(2):37-46, January 2002.
[17] Y. Lindell and B. Pinkas. Privacy preserving data mining. In Advances in Cryptology - CRYPTO 2000, pages 36-54, August 2000.
[18] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, 40, 1993.
[19] C. J. Merz and M. J. Pazzani. A principal components approach to combining regression estimates. Machine Learning, 36(1-2):9-32, 1999.
[20] B. Park, A. R., and H. Kargupta. A Fourier analysis-based approach to learn classifier from distributed heterogeneous data. In Proceedings of the First SIAM International Conference on Data Mining, Chicago, US, 2001.
[21] B. H. Park and H. Kargupta. Constructing simpler decision trees from the Fourier spectrum of ensemble models: Theoretical issues and application in mining data streams. In communication (shorter version published in SIGMOD DMKD 02 Workshop), 2002.
[22] B. H. Park and H. Kargupta. Distributed data mining: Algorithms, systems, and applications. In Data Mining Handbook. To be published.
[23] S. Parthasarathy and M. Ogihara. Clustering distributed homogeneous datasets. In PKDD.
[24] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.
[25] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[26] J. R. Quinlan. Bagging, boosting and C4.5. In Proceedings of AAAI 96 National Conference on Artificial Intelligence, 1996.
[27] P. Smyth and D. Wolpert. Linearly combining density estimators via stacking. Machine Learning, 36(1-2):59-83, 1999.
[28] S. Stolfo et al. JAM: Java agents for meta-learning over distributed databases. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 74-81, Menlo Park, CA, 1997. AAAI Press.
[29] W. N. Street and Y. Kim. A streaming ensemble algorithm (SEA) for large-scale classification. In Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2001.
[30] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining partitionings. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI), July 2002, Edmonton, Canada. AAAI.
[31] K. Tumer and J. Ghosh. Robust order statistics based ensemble for distributed data mining. In Advances in Distributed and Parallel Knowledge Discovery, Eds: Kargupta, Hillol and Chan, Philip. MIT.
[32] J. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data. In The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, CA, July 2002.
[33] D. Wolpert. Stacked generalization. Neural Networks, 5:241-259, 1992.

Orthogonal Decision Trees

Hillol Kargupta, Byung-Hoon Park, Haimonti Dutta. Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore,