Incremental Classification of Nonstationary Data Streams
|
|
- Philip Stewart
- 5 years ago
- Views:
Transcription
1 Incremental Classification of Nonstationary Data Streams Lior Cohen, Gil Avrahami, Mark Last Ben-Gurion University of the Negev Department of Information Systems Engineering Beer-Sheva 84105, Israel bgu.ac.il Abraham Kandel Dept. of Computer Science and Engineering, University of South- Florida, Tampa, Florida 33620, USA Oscar Kipersztok Boeing Phantom Works Mathematics & Computing Technology, P.O. Box 3707 MC 7L- 44, Seattle, WA USA Abstract. Traditional data mining algorithms used in the process of knowledge discovery are based on the assumption that the training data is a random sample drawn from a stationary distribution. In contrast, when dealing with non-stationary data sets, most of the processes generating time-stamped data may change drastically over time. Therefore, the results of a data-mining algorithm, which was run in the past, will be hardly relevant to the future data. In this paper, we present two new incremental approaches to real-time classification of continuous non-stationary data. The proposed approaches apply the data-mining algorithm repeatedly to a sliding window of examples in order to update the existing model, replace it by a former model, or construct a new model if a major concept drift is detected. The algorithms are evaluated on multi-year real-world streams of traffic data vs. the CVFDT online learning algorithm. Keywords Real-time data mining, data streams, incremental learning, online learning, concept drift, information networks. 1. Introduction In recent years, there is an ongoing demand for systems, which will be capable to process massive and continuous streams of information. The use of such systems can be in the field of temperature monitoring, precision agriculture, urban traffic control, classification of stock data, etc. The most important characteristic of such system is to deal with noise, uncertainty, and asynchrony of the real-world data [2]. The complex nature of real world data has increased the difficulties and challenges of data mining in terms of data processing, data storage, and model storage requirements [7]. One of the main difficulties in mining non-stationary continuous data streams is to cope with the changing data concept. The fundamental processes generating most real-time data streams may change over years, months and even seconds, at times drastically. This change, also known as concept drift [3], causes the data-mining model generated from past data, to become less accurate in the classification of new data. Algorithms and methods, which extract patterns from continuous data streams, are known as online real time learning [10]. Real-time data mining of high-speed and non-stationary data streams has large potential in fields such as efficient operation of machinery and vehicles, exploring ecosystem dynamics, and intrusion detection in computer networks. This paper continues our previous work on real-time data mining [8], [9], where we have presented the Basic Incremental On Line Information Network (IOLIN) algorithm, leaving the issue of a better trade-off between accuracy and run time open for the future research. Here we suggest two alternative algorithms for online incremental learning of non-stationary data streams: the Multiple-Model IOLIN and the Advanced IOLIN, both aimed at maximizing the average classification rate (in records per second) while preserving the accuracy rate of the regenerative approach. The two proposed algorithms are evaluated in comparison to the On-Line Information Network (OLIN) algorithm presented by Last in [11] and the more recent Incremental On Line Information Network (IOLIN) algorithm [8], [9]. All these incremental methods are based on the batch information-theoretic algorithm, named Information Network (IN), which
2 was introduced by Last & Maimon [12], [13]. The general idea of the new incremental algorithms is to keep updating an existing model as long as no major concept drift is detected in the arriving data. While most of the existing online algorithms (including OLIN [11]), which deal with the problem of concept drift, build a new model from every new window of training examples, the proposed incremental techniques save the expensive CPU time by performing minimal changes in the structure of the current classification model. We also compare the performance of our incremental methods to the CVFDT algorithm [4], which is designed at mining non-stationary data streams as well. 2. Related work In the case of non-stationary data streams, the optimal situation is to have KDD systems that operate continuously, constantly processing the data received so that potentially valuable information is never lost. In order to achieve this goal, several methods for extracting patterns from non-stationary streams of data have been developed, all under the general title of online (incremental) learning methods. The pure incremental learning methods take into account every new instance that arrives. These algorithms may be irrelevant when dealing with high-speed data streams. Widmer & Kubat [5] have described a series of purely incremental learning algorithms that flexibly react to concept drift and can take advantage of situations where context repeats itself. Their series of algorithms is based on a framework called FLORA. FLORA maintains a dynamically adjustable window during the learning process and whenever a concept drift seems to occur (a drop in predictive accuracy) the window shrinks (forgets old instances), and when the concept seems to be stable the window is kept fixed. Otherwise, the window keeps growing until the concept seems to be stable. FLORA is a computationally expensive methodology, since it updates the classification model with every example added to or removed from the training window. Domingos & Hulten [14] have proposed the VFDT (Very Fast Decision Trees learner) system in order to overcome the longer training time issue of the pure incremental algorithms. The VFDT system is based on a decision tree learning method, which builds the trees based on sub-sampling of a stationary data stream. To deal with non-stationary data streams, Hulten et al. [4] have proposed an improvement to the VFDT algorithm which is the CVFDT (Conceptadapting Very Fast Decision Tree learner). CVFDT applies the VFDT algorithm to a sliding window of a fixed size and builds the model in an incremental manner instead of building it from scratch whenever a new set of examples arrives. CVFDT increases the computational overload vs. VFDT by growing alternate sub-trees at its internal nodes. The model is modified when the alternate sub-tree becomes more accurate than the original one. In [1] the authors claim that methods such as [4], [14] which use the one-pass approach on the data in order create mining models, are limited. The problem with the one-pass approach is that it disables the recognition of changes in the mining model during the data stream process. Aggarwal et al. [1], suggested a framework for the classification of non stationary data streams. The methodology proposed in their paper uses the notion of micro-clusters that are created only from the training data. Each micro-cluster is associated with a set of points, all of which belong to the same class. The maintenance procedure of the micro clusters derives from the nearest neighbor and k-means algorithms and if needed, operations of merging multiple clusters into one cluster are made. In the classification process, the test record is classified to a certain micro-cluster according to its similarity to the cluster s centroid. Last in [11] describes an online classification system that uses an information network (IN) as a classification model. The system called OLIN (On Line Information Network) gets a continuous stream of non-stationary data and builds a network based on a sliding window of the latest examples. OLIN detects a concept drift (an unexpected rise in the classification error rate) and dynamically adjusts the size of the training window and accordingly, the frequency of the model reconstruction. The calculations of the window size in OLIN are based on information-theoretic and statistical measures. The experimental results of [11] show that in non-stationary data streams, dynamic windowing generates more accurate models than the static (fixed size) windowing approach used by CVFDT. Cohen et al [8], [9] extended the ideas suggested in [11] in order to enhance the performance of the OLIN system. The new developed approach called IOLIN (Incremental Online Information Network) also deals with a continuous non-stationary data stream, but this time instead of creating a new model for each sliding window of examples, the system updates a current model according to the new data. In the experiments of [8] and [9], the IOLIN version significantly outperformed the OLIN with regarding to the run time at the expense of a minor drop in the accuracy rate. 3. The Info Fuzzy Algorithm Many learning methods use the information theory to induce classification models. One of the methods, developed by Last & Maimon [12] [13] is the Info-Fuzzy Network algorithm (also known as Information Network-IN). IN, is an 2
3 oblivious tree-like classification model, which is designed to minimize the total number of predicting attributes. The underlying principle of the IN-based methodology is to construct a multi-layered network in order to test the Mutual Information (MI) between the input and target attributes. Each hidden layer is related to a specific input attribute and represents the interaction between this input attribute and those associated with previous layers. The IN algorithm is using a pre-pruning strategy: a node is split if this procedure brings about a statistically significant decrease in the conditional entropy of the target attribute (equal to an increase in the mutual information). If none of the remaining input attributes provides a statistically significant increase in the mutual information, the network construction stops. The output of this algorithm is a network, which can be used as a decision tree to predict the values of the target attribute. Fig. 1 illustrates a sample structure of an information network Layer No. 0 (the root node) Layer No. 1 (First input attribute) Fig. 1. Info-fuzzy Network Two Layered Structure [13] 1 1,1 1,2 3,1 3,2 Layer No. 2 (Second input attribute) Connection Weights Target Layer 4. Incremental On-Line Information Network The OLIN online classification algorithm [11] deals with the potential concept drift in a non-stationary data stream simply by generating a new model for every new sliding window. On one hand, this regenerative approach ensures accurate and relevant models over time and therefore an increase in the classification accuracy. On the other hand, OLIN's major shortcoming is the high computational cost of generating new models. In this section, we present three approaches for incremental learning, which are aimed to overcome the high computational cost of generating new models. As shown in the evaluation section below, the incremental methods achieve almost the same and sometimes even higher accuracy rates than the regenerative algorithm and are significantly cheaper, since there is no need in producing a new model for every new window of examples Basic Incremental Approach We start this section with an overview of the Basic IOLIN algorithm introduced by us in [8]. In general, the basic incremental method updates a current model as long as the concept is stable. If a concept drift has been detected, the algorithm creates a totally new network. The operations for updating the network include checking split validity, examining the replacement of the last layer, and adding new layers as necessary. Each of the update operations is aimed at reducing the error rate of the already existing model, when using it for classifying new instances. The first procedure, check splits validity, suggests eliminating nodes, which are not relevant for the model according to the current training window. The split validity checks start from the root and go downwards the network. The general idea is to ensure that the current split of a specific node actually contributes to the mutual information calculated from the current training set. Elimination of non-relevant splits should decrease the error rate. From experiments conducted on the regenerative OLIN [11], we discovered that when there is no major concept drift between two adjacent windows, the main differences between the newly constructed model and the current one are in the last hidden layer of the information network. For this reason, the second procedure check the replacement of the last layer is made in order to determine which attribute is more appropriate to correspond to the last (final) hidden layer. The alternatives considered are: the attribute that is already associated with the last layer or the second best attribute for the last layer of the previous model. The attribute selected for the last layer of the new model will be the one with the highest conditional mutual information based on the current training window. The last procedure add new layers as necessary attempts to split the nodes of the last layer on attributes, which are not yet participating in the updated network. The basic intuition behind the incremental approach is to update the current classification model with the current training window concept as long as no major concept drift has been detected by a statistically significant drop in predictive accuracy and to build a new model in case of a major concept drift. Actually, the regenerative approach can be viewed as a special case of the incremental approach. If a major concept drift is detected at each arrival of new data, the incremental approach is identical to the regenerative one. So the incremental approach becomes useful only when the concept stays stable between at least one pair of adjacent windows. This condition can be expected to exist in most 3
4 data streams, especially those where the size of the training window is relatively small. Following is the pseudo-code outline of the Basic Incremental OLIN (Basic IOLIN) algorithm: Basic IOLIN Input: Training window, current network model Output: Updated or re-generated network model For each new training window If concept drift is detected Create new network using the Info-Fuzzy algorithm Else Update existing network as follows: 1. Eliminate non-relevant nodes. 2. Replace the last layer with a better one. 3. Add new layers if possible. Determining the size of the training window is extensively discussed in [11]. In general, the size of the window is expanded if the concept appears to be stable and reduced if a concept drift has been detected. The detection of a concept drift is made whenever there is a sudden fall in the accuracy rate. The assumption is that if the concept is stable, the examples in the training window and in the subsequent validation interval conform to the same distribution and there is no statistically significant difference between the model s training and the validation error rates. Another assumption is that the true classification of the incoming records becomes available after a short while. This true classification can then be used to start the model update or the construction process for creating a new predictive model. If however, the true classification is not available from the monitored system, we can keep the current network. Another alternative is to use unsupervised learning methods in order to detect changes in the distribution of unlabelled data and update the current network accordingly. Zeira et al. [6] present a methodology for change detection and segmentation based on a set of statistical estimators. This methodology detects statistically significant changes in nonstationary continuous data streams. The time complexity of the suggested basic incremental algorithm depends on the number of detected concept drifts. As long as no concept drift has been detected, the major computational cost is to add a new layer to the existing network. Checking the split validity of the nodes in the network is relatively cheap (O(n) where n is the number of nodes in the network). In case of concept drift, a new network should be constructed from scratch, and the cost of the network construction procedure is linear in the number of records in a training window, linear in the number of distinct attribute values, and quadratic in the number of candidate input attributes [12] Multiple-Model IOLIN The first new incremental approach introduced in this paper also updates a current model as long as the concept is stable. However, if a concept drift has been detected for the first time, the algorithm searches for the best model representing the current data from all the past networks. The idea is to utilize potential periodicity in the data by reusing previously created models rather than generating a new model with every concept drift as suggested by the Basic IOLIN approach. The search is made as follows: when a concept drift is being detected, the algorithm calculates the target attribute s entropy and compares it to the target s entropy values of each previous training window, where a completely new network was constructed. The previous network to be chosen is the one, which was built from the training window having the closest target entropy to the current target s entropy. The chosen network is used for the classification of the examples arriving in the next validation session. In the case of re-occurrence of a concept drift, a new network will be created like under the Basic IOLIN approach. Following is the pseudo-code outline of the Multiple-Model IOLIN algorithm: Multiple Model IOLIN Input: Training window T, current network model, past network models Output: Updated / Former / New network model For each new training window If concept drift is detected Search for the best network from the past by: 1. Comparing the value of the target s entropy (based on the current window s data) to entropy values in all windows where a totally new network was built. 2. Choose Network with Min E current (T)- E former (T). 4
5 3. Add new layers if possible. If a concept drift is detected again with the chosen network Create totally new network using the Info-Fuzzy algorithm Else update existing network (see below) Else Update existing network as follows: 1. Eliminate non-relevant nodes. 2. Replace the last layer with a better one. 3. Add new layers if possible. In addition to the use of former models in the presence of a concept drift, we also checked another variation for the multi model approach, which included the use of former models even in the case when the concept is stable. This means that when the concept is stable, we will use a former model instead of updating the current model. With this approach, we expect to save additional run time because an appropriate model will probably be found in the adjacent former windows. This approach will be referred to as the pure multiple model approach Advanced IOLIN In the second proposed approach, we enhance the update operations of the incremental algorithms by maintaining the information-theoretic quality of the current model disregarding its predictive accuracy on the new data. We recalculate the conditional mutual information of each layer using the top-down approach: we start with the first layer, and replace the input attributes in that layer and all subsequent layers in case of a significant decrease in the layer s conditional mutual information with the target attribute. A reduction in the mutual information of the first layer will trigger a complete re-construction of the model. The current model will be retained only if there is no significant decrease in mutual information at any layer. In that case, we will try to add new layers representing attributes, which are not yet participating in the network. This approach does not relate directly to the issue of the concept drift detection. The initial network is constantly updated and the presence of a concept drift can be concluded if all the network layers have been replaced. For each session, the conditional mutual information (MI) value of each layer is saved. In the model s update process, a comparison is made between the former MI value of the i th layer and the current MI value. If the current value is nearly as high as the former one (up to 5% difference), the i th layer is kept as is. If the current MI value is significantly lower, the i th layer is replaced with a new one. Following is the pseudo-code outline of the advanced IOLIN algorithm: Advanced IOLIN Input: Training window, current network model, conditional mutual information of each network layer i (Former_Conditional_MI(i)) Output: Updated network model For each new training window For each Layer i in existing network Calculate Cond_MI(i) If Cond_MI(i) Former_Conditional_MI(i)*95% Keep i th layer as is and move to the next layer Else Continue the network construction by adding new layers Save value: Former_Conditional_MI(i) = Conditional_MI(i) If reached the last layer Try adding new layers to the network The time complexity of the advanced approach depends on the number of layers that have been replaced. In the case of a layer replacement, the algorithm searches for the best input attribute from the input candidates. This search cost O(n) where n is the number of candidate input attributes. In the replacement of all layers, which is actually the construction of a totally new network, the cost will be O(n*l) where l is the number of the network layers. 5
6 5. Evaluation The two proposed incremental algorithms were applied to a real-world data stream from the traffic control domain. We have also applied the Basic IOLIN [8] [9], the pure multi-model approach (see above), and the CVFDT algorithm [15] to the same data. The experiments objective was to simulate real-time prediction of the traffic volume at all lanes of a signaled urban junction based on several input attributes such as day hour, weekday, previous volume on the same weekday last week, etc. We have checked two aspects: first, we have compared the predictive accuracy of the algorithms and secondly we have compared their processing rate (in records per second). From the simulation results and the classification speed (as presented in the results sub-section), we can safely assume that the incremental algorithms are capable to simultaneously process data from a large number of sources. Furthermore, the predicted traffic volumes at each direction can be used to improve the traffic plan for the monitored intersection and consequently reduce the average waiting time of drivers at the junction Traffic Data Streams The data acquisition and preprocessing of this dataset are extensively discussed in [8], [9] so here we describe it only in brief. The data streams we used include traffic flow information from under-road sensors at a signaled three-way junction where vehicles can cross the intersection in five different directions. The resulting five data streams related to these directions have included the hourly incoming traffic volumes for 24 hours a day, seven days a week during a period of more than 3 years. Each original data stream has been converted into a relational data set, where a record contains twelve candidate attributes representing the exact time (date, hour, day in week, etc.) when the traffic volume was measured and traffic volumes at earlier points of time (the previous hour, the same hour of the previous day, etc.). The target attribute represents the volume of traffic during a given hour. Due to the fact that the target attribute is continuous we have manually discretized it to four equal-frequency intervals of traffic volume. As mentioned, the traffic data was partitioned into five separate data tables for the five directions. Each table corresponding to a given direction contained about 30,000 hourly records Results The runs of the algorithms were carried out on a Pentium IV processor with 256 MB of RAM. In the experiments, the online learning of the traffic data starts after inducing the initial model from the first 500 records, which leaves the system to work with about 30,000 records for each direction in every year. Figures 2, 3 and 4 present the results of the error rates (training and testing) and the processing time after applying the IOLIN-based algorithms and the CVFDT algorithm to the traffic data sets. Figure 2 shows that the IOLIN-based algorithms outperform the training accuracy rate of the CVFDT algorithm. As expected, the Advanced IOLIN algorithm gets the most accurate results due to the network update process, which updates each layer in the network if necessary. CVFDT Basic Incremental Multi Modal Pure Multi Modal Error Rate Advanced Incremental D5 D4 D3 D2 D1 Data Source Fig. 2. Training Error Rate 6
7 Figure 3 shows the results on the testing data. Again, in most cases the IOLIN-based algorithms outperform the accuracy rate of the CVFDT algorithm with Advanced IOLIN being the most accurate algorithm on all five data streams. CVFDT Basic Incremental Multi Modal Pure Multi Modal Advanced Incremental D5 D4 D3 Data Source D2 D Error Rate Fig. 3. Testing Error Rate Figure 4 presents the run time results. The CVFDT algorithm consumes the least run time for processing the incoming examples. We assume that the reason for that is the one-pass on the data, which significantly reduces the run time. In all IOLIN-based algorithms, the training windows are overlapping so each example may be used several times. Figure 4 also shows that the fastest IOLIN-based algorithm is the multiple-models IOLIN. One can also notice that the Advanced IOLIN requires much longer run time than the other incremental approaches. The reason for that is that the update process checks if changes should be made in each layer. On one hand, this process makes the model more accurate with respect to the training window but on the other hand, this process consumes longer processing time because of the checking process in each layer CVFDT 1200 Basic Incremental Multi Modal Pure Multi Modal Advanced Incremental D5 D4 D3 Data Source D2 D Run Time (Sec) Fig. 4. Run Time 6. Conclusions and Future Work This paper has presented two new incremental approaches to real-time classification of continuous non-stationary data streams. The proposed approaches apply the classification algorithm repeatedly to a sliding window of examples, in order to update the existing model (in the case of the Advanced IOLIN approach) or to replace it by a former model (in the case of the Multiple Model approach). The efficiency of the proposed algorithms has been validated by experiments on real-world data streams. The experimental data in use included a set of five data streams recorded by traffic sensors for more than 3 years. We used the CVFDT algorithm of Domingos & Hulton (2001) for the purpose of performance comparison. From the experimental results, we can conclude that in this dataset, the CVFDT algorithm is preferable in terms of the run time 7
8 but its accuracy rates are significantly lower. In order to improve the run time we suggest as future work to consider the one pass approach on the data when the concept appears to be stable. We assume that this will significantly reduce the processing time. In addition, the future work includes evaluating incremental classifiers on additional real-world data streams from other domains. Acknowledgements. We would like to thank the Traffic Control Center of Jerusalem for granting us the permission to use their traffic database. This work was partially supported under a research contract from the Israel Ministry of Defense and by the National Institute for Systems Test and Productivity at University of South Florida under the USA Space and Naval Warfare Systems Command Grant No. N References [1] C. Aggarwal, J. Han, J. Wang, and P. S. Yu, On Demand Classification of Data Streams, Proc Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug [2] D. E. Culler and W. Hong, Wireless sensor networks: Introduction, Communications of the ACM, Vol 47, No. 6, pp , [3] D.P Helmbold and P.M. Long, Tracking Drifting Concepts by Minimizing Disagreements, Machine Learning, No. 14, pp , [4] G. Hulten, L, Spencer, and P. Domingos, Mining Time-Changing Data Streams, Proc. of KDD 2001, pp , ACM Press, [5] G. Widmer and M. Kubat, Learning in the Presence of Concept Drift and Hidden Contexts, Machine Learning, Vol. 23, No. 1, pp , [6] G. Zeira, O. Maimon, M. Last, L. Rokach, Change Detection in Classification Models Induced from Time Series Data, in Data Mining in Time Series Databases, M. Last, A. Kandel, and H. Bunke (Editors), World Scientific, Series in Machine Perception and Artificial Intelligence, Vol. 57, pp , [7] H. Kargupta, Distributed Data Mining for Sensor Networks, Tutorial, ECML / PKDD [8] L. Cohen, M. Last, G. Avrahami, Incremental Info-Fuzzy Algorithm for Real Time Data Mining of Non-Stationary Data Streams, TDM Workshop, Brighton UK, [9] L. Cohen, G. Avrahami, M. Last, A. Kandel, and O. Kipersztok Real-Time Data Mining of Non-Stationary Data Streams from Sensor Networks, to appear in Information Fusion Journal, Special Issue on Information Fusion in Distributed Sensor Networks.. [10] M. Black and R. J. Hickey, Maintaining the Performance of a Learned Classifier under Concept Drift, Intelligent Data Analysis, No. 3, pp , [11] M. Last, Online Classification of Nonstationary Data Streams, Intelligent Data Analysis, Vol. 6, No. 2, pp , [12] M. Last and O. Maimon, A Compact and Accurate Model for Classification, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 2, pp , February [13] O. Maimon and M. Last, Knowledge Discovery and Data Mining - The Info-Fuzzy Network (IFN) Methodology, Kluwer Academic Publishers, December [14] P. Domingos and G. Hulten, Mining High-Speed Data Streams, Proc. of KDD 2000, pp , [15] G. Hulten, and P. Domingos, "VFML -- A toolkit for mining high-speed time-changing data streams"
SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER
SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER P.Radhabai Mrs.M.Priya Packialatha Dr.G.Geetha PG Student Assistant Professor Professor Dept of Computer Science and Engg Dept
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationNesnelerin İnternetinde Veri Analizi
Nesnelerin İnternetinde Veri Analizi Bölüm 3. Classification in Data Streams w3.gazi.edu.tr/~suatozdemir Supervised vs. Unsupervised Learning (1) Supervised learning (classification) Supervision: The training
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More informationA CSP Search Algorithm with Reduced Branching Factor
A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il
More informationLecture 7. Data Stream Mining. Building decision trees
1 / 26 Lecture 7. Data Stream Mining. Building decision trees Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 26 1 Data Stream Mining 2 Decision Tree Learning Data Stream Mining 3
More informationPreprocessing of Stream Data using Attribute Selection based on Survival of the Fittest
Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological
More informationAn Apriori-like algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents
An Apriori-lie algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents Guy Danon Department of Information Systems Engineering Ben-Gurion University of the Negev Beer-Sheva
More informationDENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE
DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE Sinu T S 1, Mr.Joseph George 1,2 Computer Science and Engineering, Adi Shankara Institute of Engineering
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationMining Frequent Itemsets for data streams over Weighted Sliding Windows
Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology
More information[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632
More informationUsing Association Rules for Better Treatment of Missing Values
Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University
More informationIncremental Learning Algorithm for Dynamic Data Streams
338 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.9, September 2008 Incremental Learning Algorithm for Dynamic Data Streams Venu Madhav Kuthadi, Professor,Vardhaman College
More informationExtending the Growing Neural Gas Classifier for Context Recognition
Extending the Classifier for Context Recognition Eurocast 2007, Workshop on Heuristic Problem Solving, Paper #9.24 14. February 2007, 10:30 Rene Mayrhofer Lancaster University, UK Johannes Kepler University
More informationA Composite Graph Model for Web Document and the MCS Technique
A Composite Graph Model for Web Document and the MCS Technique Kaushik K. Phukon Department of Computer Science, Gauhati University, Guwahati-14,Assam, India kaushikphukon@gmail.com Abstract It has been
More informationIncremental Rule Learning based on Example Nearness from Numerical Data Streams
2005 ACM Symposium on Applied Computing Incremental Rule Learning based on Example Nearness from Numerical Data Streams Francisco Ferrer Troyano ferrer@lsi.us.es Jesus S. Aguilar Ruiz aguilar@lsi.us.es
More informationWeb page recommendation using a stochastic process model
Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,
More informationDatasets Size: Effect on Clustering Results
1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}
More informationAn Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data
An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University
More informationMaintenance of the Prelarge Trees for Record Deletion
12th WSEAS Int. Conf. on APPLIED MATHEMATICS, Cairo, Egypt, December 29-31, 2007 105 Maintenance of the Prelarge Trees for Record Deletion Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu Department of
More informationIntroducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values
Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationOnline Pattern Recognition in Multivariate Data Streams using Unsupervised Learning
Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning
More informationAnomaly Detection on Data Streams with High Dimensional Data Environment
Anomaly Detection on Data Streams with High Dimensional Data Environment Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant
More informationA New Online Clustering Approach for Data in Arbitrary Shaped Clusters
A New Online Clustering Approach for Data in Arbitrary Shaped Clusters Richard Hyde, Plamen Angelov Data Science Group, School of Computing and Communications Lancaster University Lancaster, LA1 4WA, UK
More informationData Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification
Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationCLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES
CLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES K. R. Suneetha, R. Krishnamoorthi Bharathidasan Institute of Technology, Anna University krs_mangalore@hotmail.com rkrish_26@hotmail.com
More informationUncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique
Research Paper Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique C. Sudarsana Reddy 1 S. Aquter Babu 2 Dr. V. Vasu 3 Department
More informationRETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu
More informationClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream Karishma Nadhe 1, Prof. P. M. Chawan 2 1Student, Dept of CS & IT, VJTI Mumbai, Maharashtra, India 2Professor, Dept of CS & IT, VJTI Mumbai, Maharashtra, India Abstract:
More informationEfficient Mining Algorithms for Large-scale Graphs
Efficient Mining Algorithms for Large-scale Graphs Yasunari Kishimoto, Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka Abstract This article describes efficient graph mining algorithms designed
More informationFeature Based Data Stream Classification (FBDC) and Novel Class Detection
RESEARCH ARTICLE OPEN ACCESS Feature Based Data Stream Classification (FBDC) and Novel Class Detection Sminu N.R, Jemimah Simon 1 Currently pursuing M.E (Software Engineering) in Vins christian college
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationSemi supervised clustering for Text Clustering
Semi supervised clustering for Text Clustering N.Saranya 1 Assistant Professor, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore 1 ABSTRACT: Based on clustering
More informationK-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection
K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection Zhenghui Ma School of Computer Science The University of Birmingham Edgbaston, B15 2TT Birmingham, UK Ata Kaban School of Computer
More informationIncremental K-means Clustering Algorithms: A Review
Incremental K-means Clustering Algorithms: A Review Amit Yadav Department of Computer Science Engineering Prof. Gambhir Singh H.R.Institute of Engineering and Technology, Ghaziabad Abstract: Clustering
More informationOn Multiple Query Optimization in Data Mining
On Multiple Query Optimization in Data Mining Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,mzakrz}@cs.put.poznan.pl
More informationIteration Reduction K Means Clustering Algorithm
Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationData mining techniques for data streams mining
REVIEW OF COMPUTER ENGINEERING STUDIES ISSN: 2369-0755 (Print), 2369-0763 (Online) Vol. 4, No. 1, March, 2017, pp. 31-35 DOI: 10.18280/rces.040106 Licensed under CC BY-NC 4.0 A publication of IIETA http://www.iieta.org/journals/rces
More informationInternational Journal of Computer Engineering and Applications, Volume XII, Special Issue, March 18, ISSN
International Journal of Computer Engineering and Applications, Volume XII, Special Issue, March 18, www.ijcea.com ISSN 2321-3469 COMBINING GENETIC ALGORITHM WITH OTHER MACHINE LEARNING ALGORITHM FOR CHARACTER
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationExtended R-Tree Indexing Structure for Ensemble Stream Data Classification
Extended R-Tree Indexing Structure for Ensemble Stream Data Classification P. Sravanthi M.Tech Student, Department of CSE KMM Institute of Technology and Sciences Tirupati, India J. S. Ananda Kumar Assistant
More informationData Stream Mining. Tore Risch Dept. of information technology Uppsala University Sweden
Data Stream Mining Tore Risch Dept. of information technology Uppsala University Sweden 2016-02-25 Enormous data growth Read landmark article in Economist 2010-02-27: http://www.economist.com/node/15557443/
More informationA semi-incremental recognition method for on-line handwritten Japanese text
2013 12th International Conference on Document Analysis and Recognition A semi-incremental recognition method for on-line handwritten Japanese text Cuong Tuan Nguyen, Bilan Zhu and Masaki Nakagawa Department
More informationAn Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm
Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy
More informationClustering Large Dynamic Datasets Using Exemplar Points
Clustering Large Dynamic Datasets Using Exemplar Points William Sia, Mihai M. Lazarescu Department of Computer Science, Curtin University, GPO Box U1987, Perth 61, W.A. Email: {siaw, lazaresc}@cs.curtin.edu.au
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationClassification of Concept Drifting Data Streams Using Adaptive Novel-Class Detection
Volume 3, Issue 9, September-2016, pp. 514-520 ISSN (O): 2349-7084 International Journal of Computer Engineering In Research Trends Available online at: www.ijcert.org Classification of Concept Drifting
More informationContext-based Navigational Support in Hypermedia
Context-based Navigational Support in Hypermedia Sebastian Stober and Andreas Nürnberger Institut für Wissens- und Sprachverarbeitung, Fakultät für Informatik, Otto-von-Guericke-Universität Magdeburg,
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REAL TIME DATA SEARCH OPTIMIZATION: AN OVERVIEW MS. DEEPASHRI S. KHAWASE 1, PROF.
More informationDetection of Anomalies using Online Oversampling PCA
Detection of Anomalies using Online Oversampling PCA Miss Supriya A. Bagane, Prof. Sonali Patil Abstract Anomaly detection is the process of identifying unexpected behavior and it is an important research
More informationRole of big data in classification and novel class detection in data streams
DOI 10.1186/s40537-016-0040-9 METHODOLOGY Open Access Role of big data in classification and novel class detection in data streams M. B. Chandak * *Correspondence: hodcs@rknec.edu; chandakmb@gmail.com
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More informationData Preprocessing. Data Preprocessing
Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should
More informationDynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering
Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of
More informationInternational Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani
LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models
More informationPattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42
Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth
More informationA hybrid method to categorize HTML documents
Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationStreaming Data Classification with the K-associated Graph
X Congresso Brasileiro de Inteligência Computacional (CBIC 2), 8 a de Novembro de 2, Fortaleza, Ceará Streaming Data Classification with the K-associated Graph João R. Bertini Jr.; Alneu Lopes and Liang
More informationFast Fuzzy Clustering of Infrared Images. 2. brfcm
Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.
More informationCity, University of London Institutional Repository
City Research Online City, University of London Institutional Repository Citation: Andrienko, N., Andrienko, G., Fuchs, G., Rinzivillo, S. & Betz, H-D. (2015). Real Time Detection and Tracking of Spatial
More informationThe k-means Algorithm and Genetic Algorithm
The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationA Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition
Special Session: Intelligent Knowledge Management A Novel Template Matching Approach To Speaker-Independent Arabic Spoken Digit Recognition Jiping Sun 1, Jeremy Sun 1, Kacem Abida 2, and Fakhri Karray
More informationA Naïve Soft Computing based Approach for Gene Expression Data Analysis
Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2124 2128 International Conference on Modeling Optimization and Computing (ICMOC-2012) A Naïve Soft Computing based Approach for
More informationData Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy
Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department
More informationAgglomerative clustering on vertically partitioned data
Agglomerative clustering on vertically partitioned data R.Senkamalavalli Research Scholar, Department of Computer Science and Engg., SCSVMV University, Enathur, Kanchipuram 631 561 sengu_cool@yahoo.com
More informationComparative Study of Subspace Clustering Algorithms
Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that
More informationDS-Means: Distributed Data Stream Clustering
DS-Means: Distributed Data Stream Clustering Alessio Guerrieri and Alberto Montresor University of Trento, Italy Abstract. This paper proposes DS-means, a novel algorithm for clustering distributed data
More informationClustering from Data Streams
Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting
More informationMore Efficient Classification of Web Content Using Graph Sampling
More Efficient Classification of Web Content Using Graph Sampling Chris Bennett Department of Computer Science University of Georgia Athens, Georgia, USA 30602 bennett@cs.uga.edu Abstract In mining information
More informationA Self-Adaptive Insert Strategy for Content-Based Multidimensional Database Storage
A Self-Adaptive Insert Strategy for Content-Based Multidimensional Database Storage Sebastian Leuoth, Wolfgang Benn Department of Computer Science Chemnitz University of Technology 09107 Chemnitz, Germany
More informationA Prototype-driven Framework for Change Detection in Data Stream Classification
Prototype-driven Framework for hange Detection in Data Stream lassification Hamed Valizadegan omputer Science and Engineering Michigan State University valizade@cse.msu.edu bstract- This paper presents
More informationAnalyzing Dshield Logs Using Fully Automatic Cross-Associations
Analyzing Dshield Logs Using Fully Automatic Cross-Associations Anh Le 1 1 Donald Bren School of Information and Computer Sciences University of California, Irvine Irvine, CA, 92697, USA anh.le@uci.edu
More informationMining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.
DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors
More informationOn Reduct Construction Algorithms
1 On Reduct Construction Algorithms Yiyu Yao 1, Yan Zhao 1 and Jue Wang 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {yyao, yanzhao}@cs.uregina.ca 2 Laboratory
More informationUniversity of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka
Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should
More informationKeywords hierarchic clustering, distance-determination, adaptation of quality threshold algorithm, depth-search, the best first search.
Volume 4, Issue 3, March 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Distance-based
More informationDOI:: /ijarcsse/V7I1/0111
Volume 7, Issue 1, January 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey on
More informationAssociation Rule Mining and Clustering
Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:
More information9. Conclusions. 9.1 Definition KDD
9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]
More informationIJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):
IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online): 2321-0613 A Study on Handling Missing Values and Noisy Data using WEKA Tool R. Vinodhini 1 A. Rajalakshmi
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More information6. Dicretization methods 6.1 The purpose of discretization
6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many
More informationDomain Specific Search Engine for Students
Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationClassification with Diffuse or Incomplete Information
Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication
More informationK-means based data stream clustering algorithm extended with no. of cluster estimation method
K-means based data stream clustering algorithm extended with no. of cluster estimation method Makadia Dipti 1, Prof. Tejal Patel 2 1 Information and Technology Department, G.H.Patel Engineering College,
More informationA Graph Theoretic Approach to Image Database Retrieval
A Graph Theoretic Approach to Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington, Seattle, WA 98195-2500
More informationKernel-based online machine learning and support vector reduction
Kernel-based online machine learning and support vector reduction Sumeet Agarwal 1, V. Vijaya Saradhi 2 andharishkarnick 2 1- IBM India Research Lab, New Delhi, India. 2- Department of Computer Science
More informationNoval Stream Data Mining Framework under the Background of Big Data
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 5 Special Issue on Application of Advanced Computing and Simulation in Information Systems Sofia 2016 Print ISSN: 1311-9702;
More informationA Comparative study of Clustering Algorithms using MapReduce in Hadoop
A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More information