Predicting and Monitoring Changes in Scoring Data
|
|
- Aileen Walker
- 5 years ago
- Views:
Transcription
1 Knowledge Management & Discovery Predicting and Monitoring Changes in Scoring Data Edinburgh, 27th of August 2015 Vera Hofer Dep. Statistics & Operations Res. University Graz, Austria Georg Krempl Business Information Systems Group University Magdeburg, Germany
2 Outline 2 / 42
3 Outline 1. Motivating challenges 3 / 42
4 Outline 1. Motivating challenges 2. Our approach: Unsupervised adaptation to class prior changes Anticipative classification by density drift extrapolation Monitoring of goodness-of-fit of current & suggested models Controlled adaptation process 4 / 42
5 Outline 1. Motivating challenges 2. Our approach: Unsupervised adaptation to class prior changes Anticipative classification by density drift extrapolation Monitoring of goodness-of-fit of current & suggested models Controlled adaptation process 3. Preliminary results & lessons learnt 5 / 42
6 Motivation: Challenges Stream Classification Data arrives continuously Classification model is used over long periods of time 6 / 42
7 Motivation: Challenges Stream Classification Data arrives continuously Classification model is used over long periods of time Non-Stationarity If distributions change (drift) over time: Can be sudden/gradual, singular/recurring, systematic/unsystematic Static model s performance deteriorates Adaptation is required Ex-post adaptation to concept drift 7 / 42
8 Motivation: Challenges Stream Classification Data arrives continuously Classification model is used over long periods of time Non-Stationarity If distributions change (drift) over time: Can be sudden/gradual, singular/recurring, systematic/unsystematic Static model s performance deteriorates Adaptation is required Ex-post adaptation to concept drift Label Paucity of Delay Labels arrive with delay (verification latency, [Kuncheva, 2008, Marrs et al., 2010]) Is waiting for sufficient labelled information affordable? 8 / 42
9 Context / Related Work Change Detection Well established [Sebastião and Gama, 2009], e.g. Statistical Process Control [Gama et al., 2004] ADaptive WINdowing [Bifet and Gavaldá, 2007] Nonparametric Monitoring [Ross et al., 2011] Focus on performance or on location/scale of parameters Mostly assume available recent labelled data (some ideas on usage of unlabelled data: [Zliobaitė, 2010]) Aim is detecting change points: focus on sudden shift 9 / 42
10 Context / Related Work Change Detection Well established [Sebastião and Gama, 2009], e.g. Statistical Process Control [Gama et al., 2004] ADaptive WINdowing [Bifet and Gavaldá, 2007] Nonparametric Monitoring [Ross et al., 2011] Focus on performance or on location/scale of parameters Mostly assume available recent labelled data (some ideas on usage of unlabelled data: [Zliobaitė, 2010]) Aim is detecting change points: focus on sudden shift Change Diagnosis Spatio-temporal density estimation Velocity Density Estimation [Aggarwal, 2005] Ex-post analysis of changes in distributions 10 / 42
11 Change Mining & Anticipative Classification Drift Mining Understanding how distributions change, identifying patterns (drift models [Hofer and Krempl, 2013]) Extrapolation of trends: Anticipative decision trees [Böttcher et al., 2009] Estimating new prior from unlabelled data: Global drift models [Hofer and Krempl, 2013] 11 / 42
12 Our Approach in a Nutshell 12 / 42
13 Contributions in a Nutshell Approach Use unlabelled data for detecting class prior changes: Unsupervised prior estimation Anticipate changes due to gradual, continuous drift: Temporal density extrapolation Monitor goodness-of-fit of current & suggested models Kullback-Leibler-Divergence calculation on (un)labelled data Controlled adaptation process Suggest best fitting variant according to available data 13 / 42
14 Contributions in a Nutshell Approach Use unlabelled data for detecting class prior changes: Unsupervised prior estimation Anticipate changes due to gradual, continuous drift: Temporal density extrapolation Monitor goodness-of-fit of current & suggested models Kullback-Leibler-Divergence calculation on (un)labelled data Controlled adaptation process Suggest best fitting variant according to available data Results Some preliminary experimental results (with NB Classifier): Density extrapolation is (sometimes) useful Extrapolation/Prior-Update without monitoring is (sometimes) misleading Monitoring KL-Divergence on unlabelled data helps identifying best variant Each variable might behave differently 14 / 42
15 The Building Blocks in Detail 15 / 42
16 Details: Unsupervised prior estimation Unsupervised prior estimation Use unlabelled data for detecting class prior changes Estimate the new class prior p 1 (y = 1) = δ 1 p 0 (y = 1) with x Ω δ 1 = (f 1(x) f 0 (x y = 0)) (f 0 (x y = 1) f 0 (x y = 0)) w(x) p 0 (y = 1) x Ω (f 0(x y = 1) f 0 (x y = 0)) 2 w(x) where previous (old) distribution f 0 and class prior p 0 current (new) distribution f 1 and class prior p 1 weighted SSQ criterion w(x) = f 1 (x) For more details, see [Hofer and Krempl, 2013] 16 / 42
17 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 17 / 42
18 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights P(X) 0 feature X X density 0 P(X) 1? X 1 Obs. Density P(X) t 2? Unknown Future Density P(X) 2 time For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 18 / 42
19 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights P(X) 0 feature X X density w 1 P 1 P(X) time Obs. Density P(X) t Kernel of Pseudo Point 1 For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 19 / 42
20 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights w 3 w 4 P 4 feature X X density w 1 P 1 w 2 P 2 P 3 P(X) Obs. Density P(X) t 2 Kernels of Pseudo Points time For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 20 / 42
21 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights w1 w 2 w 3 w 4 feature X w 3 w 4 X density 0 w 1 w 2 1 Obs. Density P(X) t 2 Kernels of Pseudo Points time For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 21 / 42
22 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights w 2 w 1 w 1 w 2 w 3 w 3 w 4 feature X w 4 X density X 0 1 Weight Extrapolation 2 time Extrapolated Kernel Weights For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 22 / 42
23 Details: Temporal density extrapolation Temporal density extrapolation Extrapolate trend patterns in drift Kernel-Density-Extrapolation approach: Spatio-temporal kernel density estimation Pseudo-points with extrapolated, time-varying weights w 1 w 1 w 2 w 3 w 3 w 4 feature X w 3 w 4 w 4 X density X 0 w 2 1 w 1 w 2 Observed Density 2 time Extrapolated Density Extrapolated Pseudopoint Weight For more details, see [Krempl, 2015] Related: Density forecasting [Tay and Wallis, 2002] (unimodal) 23 / 42
24 Results Experiment Setup Public available dataset: Bank Marketing UCI machine learning repository [Asuncion and Newman, 2015] Mix of numerical and categorical attributes instances collected over 30 months Chunk-based processing (100 chunks, window size 3 chunks) Naive Bayes Classifier Kernel-Density-Estimation Kernel-Density-Extrapolation Implemented in Octave/MATLAB 24 / 42
25 Results (1) Preliminary Results Some improvements over KDE w.r.t. ˆf 1 (X, Y ): 25 / 42
26 Results (1) Preliminary Results Some improvements over KDE w.r.t. ˆf 1 (X, Y ): KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me 26 / 42
27 Results (1) Preliminary Results Some improvements over KDE w.r.t. ˆf 1 (X, Y ): However, sometimes (on other features) it is not beneficial KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me 27 / 42
28 Results (1) Preliminary Results Some improvements over KDE w.r.t. ˆf 1 (X, Y ): However, sometimes (on other features) it is not beneficial KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me 28 / 42
29 Change & Adaptation Monitoring Goodness-of-Fit Monitoring Available information: f 0(X, Y ) and its derivatives f 1(X ) estimates for ˆf 1(X, Y ) and its derivatives 29 / 42
30 Change & Adaptation Monitoring Goodness-of-Fit Monitoring Available information: f 0(X, Y ) and its derivatives f 1(X ) estimates for ˆf 1(X, Y ) and its derivatives Estimate ˆf 1 (X ) according to model 30 / 42
31 Change & Adaptation Monitoring Goodness-of-Fit Monitoring Available information: f 0(X, Y ) and its derivatives f 1(X ) estimates for ˆf 1(X, Y ) and its derivatives Estimate ˆf 1 (X ) according to model Compute its goodness-of-fit compared to f 1 (X ) using Kullback-Leibler Divergence 31 / 42
32 Change & Adaptation Monitoring Goodness-of-Fit Monitoring Available information: f 0(X, Y ) and its derivatives f 1(X ) estimates for ˆf 1(X, Y ) and its derivatives Estimate ˆf 1 (X ) according to model Compute its goodness-of-fit compared to f 1 (X ) using Kullback-Leibler Divergence Perform the divergence-minimising update 32 / 42
33 Change & Adaptation Monitoring Goodness-of-Fit Monitoring Available information: f 0(X, Y ) and its derivatives f 1(X ) estimates for ˆf 1(X, Y ) and its derivatives Estimate ˆf 1 (X ) according to model Compute its goodness-of-fit compared to f 1 (X ) using Kullback-Leibler Divergence Perform the divergence-minimising update Is this a suitable indicator? 33 / 42
34 Results (2) Kullback-Leibler Divergence between f 1 (X ) and ˆf 1 (X ) KL- Di v on P( X), opt i mi zed KL- Di v on P( X), no ext r apol at i on, pr i or updat e KL- Di v on P( X), ext r apol at i on, pr i or updat e KL- Di v. t i me KL- Di v on P( X), opt i mi zed KL- Di v on P( X), no ext r apol at i on, pr i or updat e KL- Di v on P( X), ext r apol at i on, pr i or updat e KL- Di v. t i me 34 / 42
35 Results (3) Kullback-Leibler Divergence between f 1 (X, Y ) and ˆf 1 (X, Y ) KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me KL- Di v on P( X, Y), opt i mi zed KL- Di v on P( X, Y), no ext r apol at i on, pr i or updat e KL- Di v on P( X, Y), ext r apol at i on, pr i or updat e KL- Di v. t i me 35 / 42
36 Summary Approach Combines: Unsupervised prior estimation Temporal density extrapolation Monitor goodness-of-fit for controlled adaptation 36 / 42
37 Summary Approach Combines: Unsupervised prior estimation Temporal density extrapolation Monitor goodness-of-fit for controlled adaptation Preliminary Results: Density extrapolation is (sometimes) useful, but Extrapolation/Prior-Update (sometimes) misleading Monitoring goodness-of-fit is required KL-Divergence on unlabelled data helps identifying best variant Each variable might behave differently 37 / 42
38 Summary Approach Combines: Unsupervised prior estimation Temporal density extrapolation Monitor goodness-of-fit for controlled adaptation Preliminary Results: Density extrapolation is (sometimes) useful, but Extrapolation/Prior-Update (sometimes) misleading Monitoring goodness-of-fit is required KL-Divergence on unlabelled data helps identifying best variant Each variable might behave differently Outlook / Further Challenges: KL-Div considers whole feature space, Focus rather on regions around decision boundaries [Vinciotti and Hand, 2003]? Modelling seasonal trends (recurring context) Prediction-Induced Drift [Krempl et al., 2015] Self-Fulfilling / Self-Defeating prophecies 38 / 42
39 Bibliography I Aggarwal, C. C. (2005). On change diagnosis in evolving data streams. IEEE Transactions on Knowledge and Data Engineering, 17(5): Anagnostopoulos, C., Adams, N. M., Hand, D. J., and Leslie, D. (2013). Handling the risk of obsolete information is there a one-size-fits-all strategy? In Credit Scoring and Credit Control XIII. Asuncion, A. and Newman, D. J. (2015). UCI machine learning repository. Bifet, A. and Gavaldá, R. (2007). Learning from time-changing data with adaptive windowing. In Proceedings of the Seventh SIAM International Conference on Data Mining, SDM 2007, April 26-28, 2007, Minneapolis, Minnesota, USA. SIAM. Böttcher, M., Spott, M., and Kruse, R. (2009). An algorithm for anticipating future decision trees from concept-drifting data. In Bramer, M., Coenen, F., and Petridis, M., editors, Research And Development In Intelligent Systems 25, number 7 in Proceedings of AI-2008, pages , London. Springer. 39 / 42
40 Bibliography II Gama, J., Medas, P., Castillo, G., and Rodríguez, P. (2004). Learning with drift detection. In Advances in Artificial Intelligence, Proceedings of the 17th Brazilian Symposium on Artificial Intelligence (SBIA 2004), volume 3171 of Lecture Notes in Computer Science, pages Hofer, V. and Krempl, G. (2013). Drift mining in data: A framework for addressing drift in classification. Computational Statistics and Data Analysis, 57(1): Krempl, G. (2015). Temporal density extrapolation. In Douzal-Chouakria, A., Vilar, J. A., Marteau, P.-F., Maharaj, A., Alonso, A. M., Otranto, E., and Nicolae, M.-I., editors, Proc. of the 1st Int. Workshop on Advanced Analytics and Learning on Temporal Data (AALTD) co-located with ECML PKDD 2015, volume CEUR Workshop Proceedings. Krempl, G., Bodnar, D., and Hrubos, A. (2015). When learning indeed changes the world: Diagnosing prediction-induced drift. In De Bie, T. and Fromont, E., editors, Advances in Intelligent Data Analysis XIV - 14th Int. Symposium, IDA 2015, St. Etienne, France, volume to appear of Lecture Notes in Computer Science. Springer. 40 / 42
41 Bibliography III Kuncheva, L. I. (2008). Classifier ensembles for detecting concept change in streaming data: Overview and perspectives. In Okun, O. and Valentini, G., editors, Proceedings of the second workshop on supervised and unsupervised ensemble methods and their applications (SUEMA2008), volume 245 of Studies in Computational Intelligence, pages Springer. Marrs, G., Hickey, R., and Black, M. (2010). The impact of latency on online classification learning with concept drift. In Bi, Y. and Williams, M.-A., editors, Knowledge Science, Engineering and Management, volume 6291 of Lecture Notes in Computer Science, pages Springer. Ross, G. J., Tasoulis, D. K., and Adams, N. M. (2011). Nonparametric monitoring of data streams for changes in location and scale. Technometrics, 53(4): Sebastião, R. and Gama, J. (2009). A study on change detection methods. In Proceedings of the 4th Portuguese Conf. on Artificial Intelligence, Lisbon. Tay, A. S. and Wallis, K. F. (2002). Density forecasting: A survey. Companion to Economic Forecasting, pages / 42
42 Bibliography IV Vinciotti, V. and Hand, D. J. (2003). Local versus global models for classification problems: Fitting models where it matters. The American Statistician, 57(2): Zliobaitė, I. (2010). Change with delayed labeling: When is it detectable? In IEEE International Conference on Data Mining Workshops (ICDMW 2010), pages / 42
SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER
SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER P.Radhabai Mrs.M.Priya Packialatha Dr.G.Geetha PG Student Assistant Professor Professor Dept of Computer Science and Engg Dept
More informationHellinger Distance Based Drift Detection for Nonstationary Environments
Hellinger Distance Based Drift Detection for Nonstationary Environments Gregory Ditzler and Robi Polikar Dept. of Electrical & Computer Engineering Rowan University Glassboro, NJ, USA gregory.ditzer@gmail.com,
More informationPatterns that Matter
Patterns that Matter Describing Structure in Data Matthijs van Leeuwen Leiden Institute of Advanced Computer Science 17 November 2015 Big Data: A Game Changer in the retail sector Predicting trends Forecasting
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationMemory Models for Incremental Learning Architectures. Viktor Losing, Heiko Wersing and Barbara Hammer
Memory Models for Incremental Learning Architectures Viktor Losing, Heiko Wersing and Barbara Hammer Outline Motivation Case study: Personalized Maneuver Prediction at Intersections Handling of Heterogeneous
More informationCD-MOA: Change Detection Framework for Massive Online Analysis
CD-MOA: Change Detection Framework for Massive Online Analysis Albert Bifet 1, Jesse Read 2, Bernhard Pfahringer 3, Geoff Holmes 3, and Indrė Žliobaitė4 1 Yahoo! Research Barcelona, Spain abifet@yahoo-inc.com
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationPAPER An Efficient Concept Drift Detection Method for Streaming Data under Limited Labeling
IEICE TRANS. INF. & SYST., VOL.E100 D, NO.10 OCTOBER 2017 2537 PAPER An Efficient Concept Drift Detection Method for Streaming Data under Limited Labeling Youngin KIM a), Nonmember and Cheong Hee PARK
More informationInternational Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at
Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,
More informationHyperspectral Image Change Detection Using Hopfield Neural Network
Hyperspectral Image Change Detection Using Hopfield Neural Network Kalaiarasi G.T. 1, Dr. M.K.Chandrasekaran 2 1 Computer science & Engineering, Angel College of Engineering & Technology, Tirupur, Tamilnadu-641665,India
More informationNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
International Refereed Journal of Engineering and Science (IRJES) ISSN (Online) 2319-183X, (Print) 2319-1821 Volume 4, Issue 2 (February 2015), PP.01-07 Novel Class Detection Using RBF SVM Kernel from
More informationHigh-Dimensional Incremental Divisive Clustering under Population Drift
High-Dimensional Incremental Divisive Clustering under Population Drift Nicos Pavlidis Inference for Change-Point and Related Processes joint work with David Hofmeyr and Idris Eckley Clustering Clustering:
More informationChange Detection in Categorical Evolving Data Streams
Change Detection in Categorical Evolving Data Streams Dino Ienco IRSTEA, Montpellier, France LIRMM, Montpellier, France dino.ienco@teledetection.fr Albert Bifet Yahoo! Research Barcelona, Catalonia, Spain
More informationLecture 7. Data Stream Mining. Building decision trees
1 / 26 Lecture 7. Data Stream Mining. Building decision trees Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 26 1 Data Stream Mining 2 Decision Tree Learning Data Stream Mining 3
More informationAn Adaptive Framework for Multistream Classification
An Adaptive Framework for Multistream Classification Swarup Chandra, Ahsanul Haque, Latifur Khan and Charu Aggarwal* University of Texas at Dallas *IBM Research This material is based upon work supported
More informationDetecting Concept Drift in Classification Over Streaming Graphs
Detecting Concept Drift in Classification Over Streaming Graphs Yibo Yao, Lawrence B. Holder School of Electrical Engineering and Computer Science Washington State University, Pullman, WA 99164 {yibo.yao,
More informationMs. Ritu Dr. Bhawna Suri Dr. P. S. Kulkarni (Assistant Prof.) (Associate Prof. ) (Assistant Prof.) BPIT, Delhi BPIT, Delhi COER, Roorkee
Journal Homepage: NOVEL FRAMEWORK FOR DATA STREAMS CLASSIFICATION APPROACH BY DETECTING RECURRING FEATURE CHANGE IN FEATURE EVOLUTION AND FEATURE S CONTRIBUTION IN CONCEPT DRIFT Ms. Ritu Dr. Bhawna Suri
More informationFeature Based Data Stream Classification (FBDC) and Novel Class Detection
RESEARCH ARTICLE OPEN ACCESS Feature Based Data Stream Classification (FBDC) and Novel Class Detection Sminu N.R, Jemimah Simon 1 Currently pursuing M.E (Software Engineering) in Vins christian college
More informationSAND: Semi-Supervised Adaptive Novel Class Detection and Classification over Data Stream
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) SAND: Semi-Supervised Adaptive Novel Class Detection and Classification over Data Stream Ahsanul Haque and Latifur Khan
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationInternational Journal of Computer Engineering and Applications, Volume XII, Special Issue, March 18, ISSN
International Journal of Computer Engineering and Applications, Volume XII, Special Issue, March 18, www.ijcea.com ISSN 2321-3469 COMBINING GENETIC ALGORITHM WITH OTHER MACHINE LEARNING ALGORITHM FOR CHARACTER
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Privacy Preservation Data Mining Using GSlicing Approach Mr. Ghanshyam P. Dhomse
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationDENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE
DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE Sinu T S 1, Mr.Joseph George 1,2 Computer Science and Engineering, Adi Shankara Institute of Engineering
More information1 INTRODUCTION 2 RELATED WORK. Usha.B.P ¹, Sushmitha.J², Dr Prashanth C M³
International Journal of Scientific & Engineering Research, Volume 7, Issue 5, May-2016 45 Classification of Big Data Stream usingensemble Classifier Usha.B.P ¹, Sushmitha.J², Dr Prashanth C M³ Abstract-
More informationAn Experimental Analysis of Outliers Detection on Static Exaustive Datasets.
International Journal Latest Trends in Engineering and Technology Vol.(7)Issue(3), pp. 319-325 DOI: http://dx.doi.org/10.21172/1.73.544 e ISSN:2278 621X An Experimental Analysis Outliers Detection on Static
More informationClassification of Concept Drifting Data Streams Using Adaptive Novel-Class Detection
Volume 3, Issue 9, September-2016, pp. 514-520 ISSN (O): 2349-7084 International Journal of Computer Engineering In Research Trends Available online at: www.ijcert.org Classification of Concept Drifting
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationDomain Selection and Adaptation in Smart Homes
and Adaptation in Smart Homes Parisa Rashidi and Diane J. Cook Washington State University prashidi@wsu.edu 2 Washington State University cook@eecs.wsu.edu Abstract. Recently researchers have proposed
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà Laboratory for Relational Algorithmics, Complexity and Learning LARCA UPC-Barcelona Tech, Catalonia
More informationEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive indows ABSTRACT Albert Bifet Yahoo! Research Barcelona Barcelona, Catalonia, Spain abifet@yahoo-inc.com Bernhard Pfahringer Dept. of Computer
More informationAccurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking
JMLR: Workshop and Conference Proceedings 1: xxx-xxx ACML2010 Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking Albert Bifet Eibe Frank Geoffrey Holmes Bernhard Pfahringer
More informationDiscretizing Continuous Attributes Using Information Theory
Discretizing Continuous Attributes Using Information Theory Chang-Hwan Lee Department of Information and Communications, DongGuk University, Seoul, Korea 100-715 chlee@dgu.ac.kr Abstract. Many classification
More informationClassification of evolving data streams with infinitely delayed labels
Universidade de São Paulo Biblioteca Digital da Produção Intelectual - BDPI Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC 21-12 Classification of evolving data streams
More informationSocial Behavior Prediction Through Reality Mining
Social Behavior Prediction Through Reality Mining Charlie Dagli, William Campbell, Clifford Weinstein Human Language Technology Group MIT Lincoln Laboratory This work was sponsored by the DDR&E / RRTO
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REAL TIME DATA SEARCH OPTIMIZATION: AN OVERVIEW MS. DEEPASHRI S. KHAWASE 1, PROF.
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationFlexible-Hybrid Sequential Floating Search in Statistical Feature Selection
Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection Petr Somol 1,2, Jana Novovičová 1,2, and Pavel Pudil 2,1 1 Dept. of Pattern Recognition, Institute of Information Theory and
More informationRole of big data in classification and novel class detection in data streams
DOI 10.1186/s40537-016-0040-9 METHODOLOGY Open Access Role of big data in classification and novel class detection in data streams M. B. Chandak * *Correspondence: hodcs@rknec.edu; chandakmb@gmail.com
More informationDetection and Deletion of Outliers from Large Datasets
Detection and Deletion of Outliers from Large Datasets Nithya.Jayaprakash 1, Ms. Caroline Mary 2 M. tech Student, Dept of Computer Science, Mohandas College of Engineering and Technology, India 1 Assistant
More informationClassification of Log Files with Limited Labeled Data
Classification of Log Files with Limited Labeled Data Stefan Hommes, Radu State, Thomas Engel University of Luxembourg 15.10.2013 1 Motivation Firewall log files store all accepted and dropped connections.
More informationOverview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8
Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationUsing Association Rules for Better Treatment of Missing Values
Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University
More informationKnowledge-Defined Networking: Towards Self-Driving Networks
Knowledge-Defined Networking: Towards Self-Driving Networks Albert Cabellos (UPC/BarcelonaTech, Spain) albert.cabellos@gmail.com 2nd IFIP/IEEE International Workshop on Analytics for Network and Service
More informationEfficient Case Based Feature Construction
Efficient Case Based Feature Construction Ingo Mierswa and Michael Wurst Artificial Intelligence Unit,Department of Computer Science, University of Dortmund, Germany {mierswa, wurst}@ls8.cs.uni-dortmund.de
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More informationA Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis
Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 1131-1140 Research India Publications http://www.ripublication.com A Monotonic Sequence and Subsequence Approach
More informationTRANSDUCTIVE TRANSFER LEARNING BASED ON KL-DIVERGENCE. Received January 2013; revised May 2013
International Journal of Innovative Computing, Information and Control ICIC International c 2014 ISSN 1349-4198 Volume 10, Number 1, February 2014 pp. 303 313 TRANSDUCTIVE TRANSFER LEARNING BASED ON KL-DIVERGENCE
More informationOnline Non-Stationary Boosting
Online Non-Stationary Boosting Adam Pocock, Paraskevas Yiapanis, Jeremy Singer, Mikel Luján, and Gavin Brown School of Computer Science, University of Manchester, UK {apocock,pyiapanis,jsinger,mlujan,gbrown}@cs.manchester.ac.uk
More informationIncremental Rule Learning based on Example Nearness from Numerical Data Streams
2005 ACM Symposium on Applied Computing Incremental Rule Learning based on Example Nearness from Numerical Data Streams Francisco Ferrer Troyano ferrer@lsi.us.es Jesus S. Aguilar Ruiz aguilar@lsi.us.es
More informationDensity estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate
Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,
More informationTechnique For Clustering Uncertain Data Based On Probability Distribution Similarity
Technique For Clustering Uncertain Data Based On Probability Distribution Similarity Vandana Dubey 1, Mrs A A Nikose 2 Vandana Dubey PBCOE, Nagpur,Maharashtra, India Mrs A A Nikose Assistant Professor
More informationStreaming Data Classification with the K-associated Graph
X Congresso Brasileiro de Inteligência Computacional (CBIC 2), 8 a de Novembro de 2, Fortaleza, Ceará Streaming Data Classification with the K-associated Graph João R. Bertini Jr.; Alneu Lopes and Liang
More informationA Patent Retrieval Method Using a Hierarchy of Clusters at TUT
A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan
More informationA ProM Operational Support Provider for Predictive Monitoring of Business Processes
A ProM Operational Support Provider for Predictive Monitoring of Business Processes Marco Federici 1,2, Williams Rizzi 1,2, Chiara Di Francescomarino 1, Marlon Dumas 3, Chiara Ghidini 1, Fabrizio Maria
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More informationFault Identification from Web Log Files by Pattern Discovery
ABSTRACT International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 2 ISSN : 2456-3307 Fault Identification from Web Log Files
More informationIncremental Classification of Nonstationary Data Streams
Incremental Classification of Nonstationary Data Streams Lior Cohen, Gil Avrahami, Mark Last Ben-Gurion University of the Negev Department of Information Systems Engineering Beer-Sheva 84105, Israel Email:{clior,gilav,mlast}@
More informationEstimation of Bilateral Connections in a Network: Copula vs. Maximum Entropy
Estimation of Bilateral Connections in a Network: Copula vs. Maximum Entropy Pallavi Baral and Jose Pedro Fique Department of Economics Indiana University at Bloomington 1st Annual CIRANO Workshop on Networks
More informationFeature Construction and δ-free Sets in 0/1 Samples
Feature Construction and δ-free Sets in 0/1 Samples Nazha Selmaoui 1, Claire Leschi 2, Dominique Gay 1, and Jean-François Boulicaut 2 1 ERIM, University of New Caledonia {selmaoui, gay}@univ-nc.nc 2 INSA
More informationA NEW HYBRID APPROACH FOR NETWORK TRAFFIC CLASSIFICATION USING SVM AND NAÏVE BAYES ALGORITHM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web
More informationA Performance Assessment on Various Data mining Tool Using Support Vector Machine
SCITECH Volume 6, Issue 1 RESEARCH ORGANISATION November 28, 2016 Journal of Information Sciences and Computing Technologies www.scitecresearch.com/journals A Performance Assessment on Various Data mining
More informationData Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.
Data Mining Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011 Important Note! This presentation was obtained from Dr. Vijay Raghavan
More informationMobility Data Management & Exploration
Mobility Data Management & Exploration Ch. 07. Mobility Data Mining and Knowledge Discovery Nikos Pelekis & Yannis Theodoridis InfoLab University of Piraeus Greece infolab.cs.unipi.gr v.2014.05 Chapter
More informationVisualization and text mining of patent and non-patent data
of patent and non-patent data Anton Heijs Information Solutions Delft, The Netherlands http://www.treparel.com/ ICIC conference, Nice, France, 2008 Outline Introduction Applications on patent and non-patent
More informationEstimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees
Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,
More informationONLINE PARAMETRIC AND NON PARAMETRIC DRIFT DETECTION TECHNIQUES FOR STREAMING DATA IN MACHINE LEARNING: A REVIEW P.V.N. Rajeswari 1, Dr. M.
ONLINE PARAMETRIC AND NON PARAMETRIC DRIFT DETECTION TECHNIQUES FOR STREAMING DATA IN MACHINE LEARNING: A REVIEW P.V.N. Rajeswari 1, Dr. M. Shashi 2 1 Assoc. Professor, Dept of Comp. Sc. & Engg., Visvodaya
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationInformation theory methods for feature selection
Information theory methods for feature selection Zuzana Reitermanová Department of Computer Science Faculty of Mathematics and Physics Charles University in Prague, Czech Republic Diplomový a doktorandský
More informationUVA CS 4501: Machine Learning. Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia
UVA CS 4501: Machine Learning Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff Dr. Yanjun Qi University of Virginia Department of Computer Science 1 Where are we? è Five major secfons
More informationClustering Large Dynamic Datasets Using Exemplar Points
Clustering Large Dynamic Datasets Using Exemplar Points William Sia, Mihai M. Lazarescu Department of Computer Science, Curtin University, GPO Box U1987, Perth 61, W.A. Email: {siaw, lazaresc}@cs.curtin.edu.au
More informationMining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.
DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationTowards Real-Time Feature Tracking Technique using Adaptive Micro-Clusters
Towards Real-Time Feature Tracking Technique using Adaptive Micro-Clusters Mahmood Shakir Hammoodi, Frederic Stahl, Mark Tennant, Atta Badii Abstract Data streams are unbounded, sequential data instances
More informationMULTI-TEMPORAL SAR CHANGE DETECTION AND MONITORING
MULTI-TEMPORAL SAR CHANGE DETECTION AND MONITORING S. Hachicha, F. Chaabane Carthage University, Sup Com, COSIM laboratory, Route de Raoued, 3.5 Km, Elghazala Tunisia. ferdaous.chaabene@supcom.rnu.tn KEY
More informationOptimization of Query Processing in XML Document Using Association and Path Based Indexing
Optimization of Query Processing in XML Document Using Association and Path Based Indexing D.Karthiga 1, S.Gunasekaran 2 Student,Dept. of CSE, V.S.B Engineering College, TamilNadu, India 1 Assistant Professor,Dept.
More informationRelational Classification for Personalized Tag Recommendation
Relational Classification for Personalized Tag Recommendation Leandro Balby Marinho, Christine Preisach, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Samelsonplatz 1, University
More informationProcessing Missing Values with Self-Organized Maps
Processing Missing Values with Self-Organized Maps David Sommer, Tobias Grimm, Martin Golz University of Applied Sciences Schmalkalden Department of Computer Science D-98574 Schmalkalden, Germany Phone:
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationAnnouncements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14
Announcements Computer Vision I CSE 152 Lecture 14 Homework 3 is due May 18, 11:59 PM Reading: Chapter 15: Learning to Classify Chapter 16: Classifying Images Chapter 17: Detecting Objects in Images Given
More informationMachine Learning / Jan 27, 2010
Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,
More informationFrom Ensemble Methods to Comprehensible Models
From Ensemble Methods to Comprehensible Models Cèsar Ferri, José Hernández-Orallo, M.José Ramírez-Quintana {cferri, jorallo, mramirez}@dsic.upv.es Dep. de Sistemes Informàtics i Computació, Universitat
More informationFeature-weighted k-nearest Neighbor Classifier
Proceedings of the 27 IEEE Symposium on Foundations of Computational Intelligence (FOCI 27) Feature-weighted k-nearest Neighbor Classifier Diego P. Vivencio vivencio@comp.uf scar.br Estevam R. Hruschka
More informationEFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA Saranya Vani.M 1, Dr. S. Uma 2,
More informationBatch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data
Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data Jesse Read 1, Albert Bifet 2, Bernhard Pfahringer 2, Geoff Holmes 2 1 Department of Signal Theory and Communications Universidad
More informationAdaptive Parameter-free Learning from Evolving Data Streams
Adaptive Parameter-free Learning from Evolving Data Streams Albert Bifet Ricard Gavaldà Universitat Politècnica de Catalunya { abifet, gavalda }@lsi.up.edu Abstract We propose and illustrate a method for
More informationSSV Criterion Based Discretization for Naive Bayes Classifiers
SSV Criterion Based Discretization for Naive Bayes Classifiers Krzysztof Grąbczewski kgrabcze@phys.uni.torun.pl Department of Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100 Toruń,
More informationAnomaly Detection on Data Streams with High Dimensional Data Environment
Anomaly Detection on Data Streams with High Dimensional Data Environment Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant
More informationContents. Foreword to Second Edition. Acknowledgments About the Authors
Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1
More informationOn the Impact of Class Imbalance in GP Streaming Classification with Label Budgets
On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets Sara Khanchi, Malcolm I. Heywood, and A. Nur Zincir-Heywood Faculty of Computer Science, Dalhousie University, Halifax,
More informationOverview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer
Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What
More informationReview on Gaussian Estimation Based Decision Trees for Data Streams Mining Miss. Poonam M Jagdale 1, Asst. Prof. Devendra P Gadekar 2
Review on Gaussian Estimation Based Decision Trees for Data Streams Mining Miss. Poonam M Jagdale 1, Asst. Prof. Devendra P Gadekar 2 1,2 Pune University, Pune Abstract In recent year, mining data streams
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 4, April 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Discovering Knowledge
More informationA Novel Approach for Weighted Clustering
A Novel Approach for Weighted Clustering CHANDRA B. Indian Institute of Technology, Delhi Hauz Khas, New Delhi, India 110 016. Email: bchandra104@yahoo.co.in Abstract: - In majority of the real life datasets,
More information