ONLINE ALGORITHMS FOR HANDLING DATA STREAMS


Seminar I
Luka Stopar
Supervisor: prof. dr. Dunja Mladenić
Approved by the supervisor: (signature)
Study programme: Information and Communication Technologies (ICT3). Doctoral degree...
Ljubljana, 2014

Abstract

In recent years the problem of mining data streams has gained much attention in the data mining community. A data stream is a continuous, time-changing flow of data. Examples of data streams include sensor networks, Twitter feeds, news articles, query logs and ATM transactions. As data streams are potentially infinite, they cannot simply be stored in memory and mined from there. Furthermore, their distribution can change over time, potentially rendering old data useless or even harmful. This phenomenon is known as concept drift. Algorithms for mining data streams must process data online, updating their model in real time. When concept drift occurs they must be able to adapt their model to the change. Many such methods have been proposed in the scientific literature, ranging from Decision Trees to Artificial Neural Networks. This paper provides an overview of the growing field of mining data streams. We review several methods and frameworks developed specifically for mining data streams. Furthermore, we give a critical judgment of the existing research and present some further research opportunities.

Keywords: data mining, stream mining, online algorithms, concept drift, regression, classification


Contents

1 Introduction
2 Problem Definition
2.1 Data Streams
2.2 Data Stream Mining
3 Related Work
3.1 Decision and Regression Trees for Stream Mining
3.2 Naïve Bayes for Stream Mining
3.3 Support Vector Machines for Stream Mining
3.4 Large Margin Classifiers for Stream Mining
3.5 Local Linear Models for Stream Mining
3.6 Artificial Neural Networks for Stream Mining
3.7 K-Means for Stream Mining
3.8 Online Hierarchical Clustering for Stream Mining
3.9 Ensembles for Stream Mining
3.10 Algorithm Output Granularity
4 Critical Judgment
5 Further Work
6 References


1 Introduction

In recent years, progress in technology, especially hardware, has made it possible for organizations to record and store massive data sets. Such data sets include web logs, Twitter feeds, news articles, phone conversations, ATM transactions and sensor network observations. They are characterized by their volume, velocity and variety, and are often called Big Data. Big Data is a collection of data sets so vast that they are difficult to process using traditional tools and techniques. The challenges when dealing with Big Data include storage, querying, sharing, visualization and analysis. As companies begin implementing Big Data solutions, many opportunities present themselves to the research community.

In many Big Data applications data arrives sequentially and continuously, and the system has no influence on the order or the arrival times. Furthermore, as time passes the distribution generating the data may change, making models inaccurate and often obsolete, a phenomenon known as concept drift. Such data sets are called data streams, and they require traditional methods to be adjusted or redesigned to cope with these constraints.

Traditional data mining techniques usually assume the data comes from a static source and can be stored in memory before processing. Many of them require multiple passes over the data set in order to build a static model, which they then apply to previously unseen instances. When the data set is a potentially infinite data stream, it becomes technically infeasible to load it into memory and operate on it from there. Traditional data mining techniques have to be redesigned to process the data online and update their model in real time. They can store only a small sample of the stream; the rest must be summarized and forgotten. Furthermore, they have to detect when concept drift occurs and adapt their model so that it does not become obsolete.

The remainder of this paper is structured as follows. In section 2 we provide a formal definition of the problem of mining data streams. In section 3 we survey the related work in the research field. In section 4 we give a critical judgment of the related work, and finally in section 5 we present some research opportunities and directions for further work.

2 Problem Definition

2.1 Data Streams

A data stream can be modelled as a sequence of data instances arriving continuously and sequentially in real time. The system has no control over the order or the frequency in which they arrive. Because the stream is potentially unbounded, the system can either discard or store an instance once it has been processed, but it can only store a small fraction of the entire data set; the rest must be forgotten.

More formally, a data stream is a sequence of pairs, each consisting of an instance and a real time interval; we write s for the sequence of instances. The elements of s are generated by a data source O according to a distribution D which may, or may not, change over time. The major constraints in this model are:

The length of the sequence is potentially unbounded, so it is impossible to store it. Only a small summary can be computed and stored, and the rest of the information must be discarded (volume).

The frequency of arrival is potentially very large and non-constant, so each instance must essentially be processed in real time (velocity).

The distribution generating the sequence may change over time, so past data may become irrelevant or even harmful (variety).

2.2 Data Stream Mining

Data mining is often defined as a set of methods that allow for pattern detection under uncertain conditions. Machine learning provides the technical basis of data mining. It is used to extract information from raw data, information that is expressed in a comprehensible form and can be used for a variety of purposes [1]. Machine learning methods include traditional algorithms, like k-means and decision trees, and statistical algorithms like Support Vector Machines, Artificial Neural Networks and Local Linear Models. These methods assume the data is sampled from a stationary distribution and stored in memory before processing. Most require multiple passes over the data and build a static model which is then used for pattern detection and prediction. Their goal is to summarize the data as simply, usefully and elegantly as possible. This summarization can be a mean function, a prediction or a count of how many times a certain event occurred.

In the data stream model it is not technically feasible to simply save the incoming instances into a database and operate on them from there. Furthermore, because the distribution generating the instances can change, the algorithms must be able to adapt. Under the constraints presented above, the main properties of an ideal model become: high accuracy, fast adaptation rate and low time and space complexity. Furthermore, such models must operate online and offer a prediction at any time.

Some basic data stream mining techniques include sampling, sketching, load shedding, synopsis data structures, aggregation, wavelets and sliding windows. We discuss them briefly below.

Sampling reduces the effective data rate. It makes a probabilistic choice of whether an instance will be processed or not. It is used as a universal method to reduce the running time of computations, as it allows the computation to be performed on a much smaller data set and the result to be scaled to compensate for the difference in size. Bounds on the error rate can be computed as a function of the sampling rate.

Sketching is the process of projecting the domain of the instances onto a significantly smaller domain using random functions. As with sampling, error bounds can be computed [2].

Load shedding refers to the process of dropping a fraction of the data stream during periods of system overload. It is desirable to shed the load in a way that minimizes the drop in accuracy.

Synopsis data structures hold summary information about the data stream. They embody the idea of small memory complexity and approximate solutions to massive data set problems. The complexity of constructing these structures cannot be more than O(n), but solutions which give results closer to O(log(n)) are needed.

Aggregation is a technique for summarizing the incoming data stream. Aggregation functions include mean, variance, maximum and minimum. This technique offers vast memory savings, but can fail if the stream is highly fluctuating.

Wavelets are a mathematical technique for representing signals as a weighted sum of simpler, fixed building waveforms at different scales and positions. They attempt to capture trends in numerical functions, decomposing a signal into a set of coefficients [3].

Sliding window is a technique where old instances are removed and replaced by new ones. The two types of sliding windows are count-based and time-based. A count-based sliding window stores the N most recent elements, while a time-based sliding window stores all the elements that arrived in the latest N units of time.
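The two kinds of sliding window can be illustrated with a minimal Python sketch. The class names `CountWindow` and `TimeWindow`, and the choice to treat the time window as the half-open interval (t - span, t], are assumptions made for this illustration and do not come from any of the cited works.

```python
from collections import deque


class CountWindow:
    """Count-based window: keeps the N most recent elements."""

    def __init__(self, n):
        # A deque with maxlen silently discards the oldest element when full.
        self.items = deque(maxlen=n)

    def add(self, x):
        self.items.append(x)

    def mean(self):
        return sum(self.items) / len(self.items)


class TimeWindow:
    """Time-based window: keeps elements from the latest `span` time units."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, t, x):
        self.items.append((t, x))
        # Evict everything that is no longer inside (t - span, t].
        while self.items and self.items[0][0] <= t - self.span:
            self.items.popleft()

    def values(self):
        return [x for _, x in self.items]


cw = CountWindow(3)
for x in [1, 2, 3, 4, 5]:
    cw.add(x)

tw = TimeWindow(10)
for t, x in [(0, 1), (4, 2), (9, 3), (12, 4)]:
    tw.add(t, x)
```

Both windows use memory proportional to their contents only: the count-based window is bounded by N, while the time-based window can grow with the arrival rate, which is why it is usually combined with sampling or load shedding.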
Other stream mining methods include algorithms developed specifically for processing data streams. These are usually traditional data mining algorithms modified to cope with the constraints presented above. They process instances sequentially, using only a limited amount of memory and update their model before a new instance arrives, usually producing approximate results. We will discuss some of these methods in the next section.

3 Related Work

Since the beginning of the new century, mining data streams has gained much attention in the data mining community. The requirement to process fast data streams has motivated the need for approximation algorithms that make use of small amounts of space and time. In this section we give an overview of the different methods and algorithms developed for mining data streams.

3.1 Decision and Regression Trees for Stream Mining

Domingos et al. [4] proposed a general approach for scaling up machine learning algorithms called Very Fast Machine Learning (VFML). Their approach requires the user to define a loss function L which penalizes the difference between the model and a hypothetical model built on infinitely many instances. The user then has to bound the loss function in terms of the number of instances the algorithm uses in each step. From this bound the user can compute the number of instances per step that minimizes the running time while respecting the bound on the loss.

In their later work [5] [6] they applied this methodology to mining data streams. They proposed a decision tree induction algorithm called Very Fast Decision Tree (VFDT) and later Concept-adapting Very Fast Decision Tree (CVFDT). When inducing a decision tree, the VFDT algorithm uses only a small sample of the available data instances when choosing the split attribute. This makes it suitable for processing data streams where the whole data set is not available or is too large to store in memory. To determine the number of examples needed to split a node, the algorithm uses the Hoeffding or Chernoff bound, which guarantees that, with a certain confidence, the attribute it has chosen is the correct one. When the system memory becomes low, VFDT reduces its memory requirements by temporarily deactivating learning in the unpromising nodes.
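The role of the Hoeffding bound in the split decision can be sketched as follows. The sketch uses the bound in its usual form, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split measure, delta is the allowed error probability and n is the number of instances seen at the node. The function names and the example numbers are hypothetical, and tie-breaking and the other refinements of VFDT are left out.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation that the observed mean stays within, with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


# Hypothetical node statistics: a two-class problem, so information gain
# has range R = log2(2) = 1.
eps_small_sample = hoeffding_bound(1.0, 1e-7, 100)
eps_large_sample = hoeffding_bound(1.0, 1e-7, 1000)
split_early = should_split(0.30, 0.15, 1.0, 1e-7, 100)
split_later = should_split(0.30, 0.15, 1.0, 1e-7, 1000)
```

With the same observed gains, the node is not split after 100 instances but is split after 1000, because epsilon shrinks as more instances are seen.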
CVFDT is an extension of VFDT which, in addition to inducing the decision tree incrementally, allows the underlying decision tree to adapt when concept drift occurs. When this happens, some attributes that previously passed the Hoeffding bound will no longer do so. In this case CVFDT starts to grow an alternative sub-tree and replaces the node when the new sub-tree becomes more accurate.

Since then online decision trees have gained much attention in the stream mining community. Rutkowski et al. [7] give a mathematical justification for online decision trees and present an algorithm they call Gaussian Decision Tree (GDT) induction, for which they propose a statistical test used to determine the best attribute to split a node. Ikonomovska, Gama and Džeroski [8] proposed an online regression and model tree induction algorithm which is able to incrementally build and maintain a model tree.

3.2 Naïve Bayes for Stream Mining

Godec et al. [9] used an online version of the Naïve Bayes (NB) classifier for multi-label classification and visual tracking. To adapt the classifier to mining data streams they approximated the probability distributions with equally binned histograms which were updated as new data arrived. To cope with concept drift they proposed exponential moving average filtering for each bin, where the exponential decay is an input parameter.
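The idea of histogram-based Naïve Bayes with exponential forgetting can be sketched in a few lines of Python. This is an illustration of the general technique only, not the method of [9]: the class name `EmaHistogramNB`, the shared feature range `[lo, hi]`, the uniform initialization and the small smoothing constant are all assumptions made here.

```python
import math


class EmaHistogramNB:
    """Naive Bayes over equal-width histograms with exponential forgetting."""

    def __init__(self, n_features, classes, lo, hi, n_bins=10, decay=0.05):
        self.lo, self.hi, self.n_bins, self.decay = lo, hi, n_bins, decay
        uniform = 1.0 / n_bins
        # hist[c][f][b] estimates P(feature f falls in bin b | class c).
        self.hist = {c: [[uniform] * n_bins for _ in range(n_features)]
                     for c in classes}
        self.prior = {c: 1.0 / len(classes) for c in classes}

    def _bin(self, x):
        pos = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
        return min(max(pos, 0), self.n_bins - 1)  # clamp out-of-range values

    def update(self, xs, label):
        a = self.decay
        # Move the class prior toward the observed label.
        for c in self.prior:
            self.prior[c] = (1 - a) * self.prior[c] + a * (c == label)
        # Decay every bin of the observed class, then add mass to the hit bin.
        for f, x in enumerate(xs):
            bins = self.hist[label][f]
            hit = self._bin(x)
            for b in range(self.n_bins):
                bins[b] = (1 - a) * bins[b] + a * (b == hit)

    def predict(self, xs):
        def score(c):
            s = math.log(self.prior[c] + 1e-12)
            for f, x in enumerate(xs):
                s += math.log(self.hist[c][f][self._bin(x)] + 1e-12)
            return s
        return max(self.prior, key=score)


model = EmaHistogramNB(n_features=1, classes=["a", "b"], lo=0.0, hi=1.0,
                       n_bins=10, decay=0.1)
for _ in range(50):
    model.update([0.15], "a")
    model.update([0.85], "b")
before_drift = model.predict([0.12])

# Simulated concept drift: the two classes swap their typical values.
for _ in range(50):
    model.update([0.85], "a")
    model.update([0.15], "b")
after_drift = model.predict([0.12])
```

Because every update multiplies the old bin masses by (1 - decay), each histogram stays normalized and old evidence fades geometrically. A larger decay adapts faster to drift but gives noisier estimates.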

3.3 Support Vector Machines for Stream Mining

Cauwenberghs and Poggio [10] introduced an incremental and decremental algorithm for learning the Support Vector Machine (SVM) classifier. The algorithm splits the instances into three sets S, E and R, where S contains all vectors strictly on the margin, E contains vectors exceeding the margin and R contains the (ignored) vectors within the margin. To satisfy the Karush-Kuhn-Tucker (KKT) conditions when a new instance arrives, the coefficients α_i are updated for all the vectors in the three sets, and some vectors change sets. They show that the algorithm computes the same solution as the batch SVM version. The asymptotic running time of the incremental algorithm is O(n), where n is the number of instances, which makes it infeasible to run the algorithm on infinitely many instances. However, because of the decremental algorithm, the SVM model can be maintained on a sliding window. Furthermore, the system can select the useless instances which it wishes to unlearn, freeing memory for the useful instances.

Diehl and Cauwenberghs [11] later extended this work by proposing an algorithm which is able to adapt the current model to changes in kernel and regularization parameters. Ma, Theiler and Perkins [12] presented a regression counterpart to the incremental SVM classifier by following the approach of Cauwenberghs and Poggio. They call their algorithm Accurate On-line Support Vector Regression (AOSVR).

3.4 Large Margin Classifiers for Stream Mining

Harrington et al. [13] propose an algorithm for learning large margin classifiers. The algorithm is an averaged-perceptron-like algorithm which achieves diversity in its models using a bagging-like technique. When a new instance arrives it is given to perceptron j with probability p, where p is a system parameter which controls the diversity of the perceptrons. The perceptrons are all independent and can be parallelized.
One of the disadvantages of this algorithm is the assumption that the instances can be linearly separated. This can, however, be overcome using the kernel trick, but at the cost of time and space complexity.

3.5 Local Linear Models for Stream Mining

Vijayakumar et al. [14] present an algorithm for incremental non-linear function approximation in high dimensional spaces. The algorithm assumes that the high dimensional space has locally low dimensional distributions, so only a small part of it needs to be filled with linear models. It incrementally learns the number of linear models, their coefficients and their region of validity, parameterized in a Gaussian kernel by a distance metric. To update the local models the algorithm uses partial least squares to compute orthogonal projections of the input space and performs linear regression on the projected directions. Predicting involves computing a weighted sum of the predictions of all the local models.

3.6 Artificial Neural Networks for Stream Mining

Artificial Neural Networks (ANN) [3] are a general function approximation method. A three-layer ANN can approximate any continuous function with arbitrary precision. In the traditional setting ANNs make several iterations over the training data, which makes them slow and often makes them overfit the data. However, because of the vast number of instances available in the stream mining setting, each instance can be processed only once. When the label becomes available, the error is back-propagated through the network and the weights are adjusted. Thus the network is able to process infinitely many instances and can update in real time.

3.7 K-Means for Stream Mining

Clustering is perhaps the most frequently used technique in exploratory data analysis. Domingos et al. [4] proposed an approximation algorithm for the k-means problem called Very Fast K-Means (VFKM), following the VFML approach. VFKM uses the Hoeffding bound to determine the number of instances needed in each iteration of the k-means algorithm. It runs a sequence of iterations where each iteration is executed on an increasing number of instances. The algorithm terminates when a statistical bound is reached.

McCutchen and Khuller [15] proposed a constant-factor approximation algorithm for the k-center problem. The k-center problem differs from k-means in minimizing the maximum distance from any instance to its centroid. To make the method robust to outliers, the algorithm accumulates the incoming data points until enough of them are close together before forming a new cluster. Once a cluster is created, any new points that fall into the cluster are forgotten. The algorithm thus holds a maximum of O(kz) points in memory, where z is the maximum number of outliers and k is the number of clusters. Once the outliers are accumulated it runs a 4-approximation offline algorithm for k-center clustering with outliers to attempt to cover the points with the remaining clusters. If the number of clusters ever becomes more than k, the algorithm has a mechanism to drop clusters and their support points.

3.8 Online Hierarchical Clustering for Stream Mining

Rodrigues et al. [16] proposed an online hierarchical clustering algorithm for clustering data streams.
They called the algorithm Online Divisive-Agglomerative Clustering (ODAC). It builds and maintains a tree-like hierarchical structure of clusters using a top-down approach. To split the nodes the algorithm continuously monitors the diameter of each leaf. It does this by computing Pearson's correlation coefficient between all the variables in the node and taking the minimum correlation. When a certain condition on the diameter holds, it splits the node, assigning each variable to one of the children. To determine the minimum number of instances needed to split a node, the system uses the Hoeffding bound, which guarantees that, with a certain confidence level, the split was correct. To cope with concept drift the algorithm continuously monitors and compares the diameter of each parent node to the diameter of its children. When a child's diameter, with a certain confidence level, becomes larger than its parent's, the system resets the parent node.

3.9 Ensembles for Stream Mining

When dealing with data streams, ensembles of models offer several advantages over single model methods. Ensembles are combinations of several models whose individual predictions are combined in some manner to form a final prediction. They are easy to scale and parallelize, and they can handle concept drift by pruning bad parts of the ensemble.

Park et al. [17] propose a framework for distributed multimedia stream mining systems. Their framework is a set of classifiers organized in a tree topology. The inner classifiers in the tree are used as filters, filtering data based on a semantic hierarchy of concepts, while the leaves classify the actual class of interest. Each classifier can access the information of the classifiers in its sub-tree and can form a so-called coalition. It can then select a strategy that maximizes the utility of the entire coalition. They used the framework to identify semantic concepts from a stream of sports images, where they set up a hierarchy of SVM classifiers trained on low level features, such as color histograms, and compared its performance to a centralized approach where the problem is modeled as an optimization problem.

Kolter and Maloof [18] presented a method for tackling concept drift based on the Weighted Majority algorithm. They called their method Dynamic Weighted Majority. It maintains a set of online learning algorithms called experts, each with an associated weight. When predicting, the algorithm combines the predictions of the experts and forms the final prediction as the class label with the highest accumulated weight. If an expert predicts incorrectly, its weight is multiplied by a constant factor β. If the global prediction is incorrect, the algorithm may create a new expert with weight 1. The algorithm also removes experts with weights lower than a predefined threshold.

In [19] Bifet et al. present a method of bagging using Adaptive-Size Hoeffding trees, which have a maximum number of split nodes and can remove nodes to reduce their size. The method limits the size of the n-th tree to twice the size of the (n−1)-th tree and gives each tree a weight proportional to the inverse of its squared error, computed as an exponential moving average.
The main idea behind this method is that smaller trees will be able to adapt to concept drift faster than larger ones, while the latter will perform better when the distribution of the stream remains stationary.

Chernov and Vovk [20] introduce a framework of prediction with expert advice, where they formulate the prediction problem as a game played by a learner and N experts. The goal of the learner is to perform better than, or at least not worse than, each expert. In order to compare the learner's loss with the experts', the learner's loss is computed separately against each expert (i.e. each expert evaluates its own performance and the performance of the learner according to its own criteria). They apply their framework with the defensive forecasting algorithm and to the framework of specialist experts, where an expert may abstain from making a prediction.

3.10 Algorithm Output Granularity

Algorithm Output Granularity (AOG) is an approach proposed by Gaber et al. [21]. It is the first resource-aware approach where the system is able to adapt to time and memory constraints as well as to the data stream rate. The main steps of the algorithm are the following:

1. Compute the algorithm threshold according to the incoming data rate and available memory.
2. Mine the incoming data stream using the algorithm threshold.
3. After a period of time, re-compute the algorithm threshold using linear regression.

The adaptation is achieved through the algorithm threshold, which is first estimated and then periodically recomputed to cope with variations in the data rate. They propose three lightweight algorithms that follow this approach: one for clustering, one for classification and one for counting frequent items. We will discuss only the first two.

The proposed clustering technique is a one-pass clustering technique called LWC and works as follows. The first instance is considered as the first centroid. As new instances arrive, the distance to the nearest centroid is computed. If this distance is greater than a threshold t (which is periodically adjusted), a new center is created. Otherwise the weight of the nearest centroid is increased by one. When the number of centers becomes k (k depends on the available memory) the algorithm starts updating the cluster vector. When memory becomes full the algorithm merges clusters.

The classification technique is an adaptation of the k-nearest neighbor method (KNN). Instead of storing all the instances in memory, the algorithm measures the distance of the new instance to the nearest one already stored in memory. If this distance is less than a threshold (which is periodically adjusted) and the classes are the same, the algorithm stores the average of the two with an increased weight. If the classes differ, the algorithm deletes both instances from memory. When classifying an instance, the number of nearest neighbors is also chosen according to the time constraints.
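A threshold-driven one-pass clustering loop in the spirit of LWC can be sketched as follows. This is a simplified illustration rather than the algorithm of [21]. It assumes that a new center is opened when an instance lies farther than the threshold from every existing centroid, keeps the threshold fixed instead of re-estimating it periodically, moves a centroid to the running mean of its members, and absorbs far-away instances into the nearest cluster once k centers exist. The function name `lwc` and all parameter values are hypothetical.

```python
import math


def lwc(stream, threshold, k):
    """One-pass clustering sketch; returns a list of [centroid, weight] pairs."""
    clusters = []
    for x in stream:
        if not clusters:
            clusters.append([list(x), 1])  # the first instance is the first centroid
            continue
        nearest = min(clusters, key=lambda c: math.dist(c[0], x))
        is_far = math.dist(nearest[0], x) > threshold
        if is_far and len(clusters) < k:
            clusters.append([list(x), 1])  # open a new center
        else:
            # Absorb the instance: increase the weight by one and move the
            # centroid to the running mean of the instances assigned to it.
            nearest[1] += 1
            w = nearest[1]
            nearest[0] = [m + (xi - m) / w for m, xi in zip(nearest[0], x)]
    return clusters


points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)]
result = lwc(points, threshold=1.0, k=3)
weights = [w for _, w in result]
```

Each instance is touched once and only the centroids and their weights are kept, so memory use is bounded by k regardless of the stream length. In a resource-aware setting the threshold would be raised when memory or time runs short, which produces fewer and coarser clusters.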

4 Critical Judgment

Our review shows that many algorithms have been developed for mining data streams. These are usually traditional data mining algorithms modified for the constraints of the data stream model. They generally follow one of two approaches: one-pass approximation or sliding window. The methods that follow the one-pass approach normally approximate traditional data mining algorithms and typically suffer from lower accuracy. Furthermore, many of these methods do not even address concept drift, and those that do usually suffer from higher time and space complexity. The methods that follow the sliding window approach, on the other hand, usually produce an exact model, but only on the most recent data.

In the real world, data streams come from different sources, e.g. sensor observations, and are typically noisy, contain redundant features and potentially missing values, so they must be preprocessed. When dealing with concept drift, adapting only the predictor may not be enough to maintain good accuracy, as the system may fail to detect changes in the raw data. The issue of adaptive preprocessing was raised in [22], where the authors identified scenarios in which adaptive preprocessing is beneficial, but much research is still needed.

Most of the methods presented in the previous section have only been evaluated on specific domains with only one type of data, such as text, images or sensor observations. A comparison of the methods on different kinds of data (variety) is much needed. Furthermore, a general framework which could cope with variety is needed.

Most of the methods found in the literature do not take into consideration the arrival rate of the incoming data stream. The user has to check whether the method can adapt faster than the instances arrive and has to manually retune the parameters to make the method responsive. One approach, Algorithm Output Granularity [21], offers a solution to this problem, but only basic algorithms that follow that methodology have been proposed.

With the vast amounts of data flowing into the system, overfitting becomes a big risk. While overfitting is mentioned in the literature, it is not addressed as an open issue that needs special consideration.

5 Further Work

Some ideas for future work include:

Context-aware mining: When predicting future events, contextual information may improve prediction accuracy substantially. For example, when predicting traffic congestion it is not enough to consider only the data coming from the sensor network; information like time of day, day of the week, weather forecast and information gained from social networks should also be taken into account. Most techniques presented in the previous sections do not consider context when building models and making predictions. As part of further work, context-aware techniques or frameworks may be investigated and developed.

Cost of updating vs. accuracy gain: Most of the algorithms proposed in the literature update their model continuously as instances arrive, stressing the underlying hardware and making it unavailable for other tasks. Future work may include investigating the tradeoff between the cost of updating the model and the accuracy gained by updating. Furthermore, a methodology which takes cost into consideration and only updates the model when high gains are expected may be developed.

Addressing variety: In real-world applications data streams come in several forms. Examples include sensor network observations, news articles and images. The proposed methods may be tested on a variety of different types of data sources, and a thorough comparison given. Furthermore, methods and frameworks that deal with various types of data may be developed.

Development of new methods: Many methods and techniques have been developed specifically for mining data streams. There is, however, still room for improvement. A thorough investigation of the current techniques can be performed, and new one-pass algorithms designed specifically for mining data streams can be developed.

Overfitting: When dealing with large amounts of data, overfitting becomes a serious risk. A study of this phenomenon in data stream mining may be performed and the issue addressed thoroughly.

Change detection: When mining data streams the distribution of the data can change, potentially rendering models obsolete. Future work may include the development of change detection techniques and methods for change adaptation.

6 References

[1] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques. Elsevier.
[2] F. Rusu and A. Dobra, Sketching sampled data streams, 2009 IEEE 25th Int. Conf. Data Eng., Mar.
[3] J. Gama and M. Gaber, Learning from Data Streams. Springer.
[4] P. Domingos and G. Hulten, A general method for scaling up machine learning algorithms and its application to clustering, ICML.
[5] P. Domingos and G. Hulten, Mining high-speed data streams, Proc. Sixth ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD '00.
[6] G. Hulten, L. Spencer, and P. Domingos, Mining time-changing data streams, Discov. Data Min., vol. 18, pp. 1-10.
[7] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, Decision Trees for Mining Data Streams Based on the Gaussian Approximation, IEEE Trans. Knowl. Data Eng., vol. 26, no. 1, Jan.
[8] E. Ikonomovska, J. Gama, and S. Džeroski, Learning model trees from evolving data streams, Data Min. Knowl. Discov., vol. 23, no. 1, Oct.
[9] M. Godec, C. Leistner, A. Saffari, and H. Bischof, On-Line Random Naive Bayes for Tracking, Int. Conf. Pattern Recognit., no. 2, Aug.
[10] G. Cauwenberghs and T. Poggio, Incremental and decremental support vector machine learning, Adv. Neural Inf.
[11] C. P. Diehl and G. Cauwenberghs, SVM incremental learning, adaptation and optimization, Proc. Int. Jt. Conf. Neural Networks, 2003, vol. 4, no. x.
[12] J. Ma, J. Theiler, and S. Perkins, Accurate on-line support vector regression, Neural Comput., vol. 15, no. 11, Dec.
[13] E. Harrington, R. Herbrich, and J. Kivinen, Online Bayes point machines, Proc. Seventh Pacific-Asia Conf. Knowl. Discov. Data Min.
[14] S. Vijayakumar, A. D'Souza, and S. Schaal, Incremental online learning in high dimensions, Neural Comput., vol. 17, no. 12, Dec.
[15] R. M. McCutchen and S. Khuller, Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity, in Approximation, Randomization and Combinatorial Optimization, Springer, 2008.
[16] P. Rodrigues, Hierarchical clustering of time-series data streams, Knowl. Data, vol. X, no. X, pp. 1-12.
[17] H. Park and D. Turaga, A framework for distributed multimedia stream mining systems using coalition-based foresighted strategies, Speech Signal.
[18] J. Z. Kolter and M. A. Maloof, Dynamic weighted majority: a new ensemble method for tracking concept drift, Third IEEE Int. Conf. Data Min.
[19] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà, New ensemble methods for evolving data streams, Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD '09, p. 139.
[20] A. Chernov and V. Vovk, Prediction with expert evaluators' advice, in Algorithmic Learning Theory, 2009.
[21] M. Gaber, Adaptive mining techniques for data streams using algorithm output granularity, Australas. Data Min.
[22] B. Gabrys, Adaptive Preprocessing for Streaming Data, IEEE Trans. Knowl. Data Eng., vol. 26, no. 2, 2014.
