A Privacy Preserving Data Mining Methodology for Dynamically Predicting Emerging Human Threats
- Christina Reeves
Transcription
1 DETC: A Privacy Preserving Data Mining Methodology for Dynamically Predicting Emerging Human Threats. Tuesday, August 6th, 2013. Gautam Manohar & Conrad S. Tucker {gautam.atulya@gmail.com, ctucker4@psu.edu}. Introduction. Manohar, Tucker
2 Presentation Overview: Research Motivation and Background; Methodology; The Knowledge Discovery Process; Data Acquisition and Storage; Data Mining Predictive Model Construction; Result Interpretation and Output; Application Case Study; Results and Discussion; Conclusion and Path Forward
3 RESEARCH MOTIVATION
4 Motivation
5 Capturing Emergence: tracking sample; tracking video
6 Motivation and Background: Existing systems are passive and more useful for post-incident analysis. Privacy issues with most existing systems hinder their public use (i.e., the need to protect Personally Identifiable Information (PII)).
7 Why Individual Body Movement Data? BODY LANGUAGE: "The most important thing in communication is to hear what isn't being said." Peter F. Drucker
8 RESEARCH METHODOLOGY
9 Proposed Methodology
10 Step 1: Data Acquisition. The data acquisition hardware is a sensor system with an RGB video camera and an infrared depth sensor. Output from the sensors is used to create a virtual skeleton of the subject with 20 nodes. Each node collects: 3D spatial coordinates (X, Y, Z); timestamp; velocity of the node. High Fidelity Data, Privacy Preserving.
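The per-node record described on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `SkeletonFrame` and `node_velocities` are hypothetical, and the velocity is approximated here by a finite difference between two consecutive frames, which the slides do not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_NODES = 20  # the slide states a 20-node virtual skeleton
Vec3 = Tuple[float, float, float]


@dataclass
class SkeletonFrame:
    """One sensor frame: a timestamp plus (X, Y, Z) for every node (hypothetical layout)."""
    timestamp: float   # seconds
    nodes: List[Vec3]  # NUM_NODES entries


def node_velocities(prev: SkeletonFrame, curr: SkeletonFrame) -> List[Vec3]:
    """Approximate each node's velocity by a finite difference (an assumption)."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("frames must be in increasing time order")
    return [
        tuple((c - p) / dt for p, c in zip(prev_node, curr_node))
        for prev_node, curr_node in zip(prev.nodes, curr.nodes)
    ]
```

Only joint coordinates appear in this structure; no RGB image is carried along, which matches the slide's "Privacy Preserving" label as far as the transcript allows one to tell.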
11 Large Scale Data Base
12 Proposed Methodology
13 Step 2: Data Transfer and Storage. The data is stored in a structured relational database with fields for the following measures: timestamp; Euclidean coordinates; velocity of each node; Boolean threat class, defining whether the data collected during training was for a threat action or not.
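A relational layout with these fields might look like the following sketch. It uses Python's standard `sqlite3` module purely for illustration; the table name `node_sample`, the column names, and the choice of one row per node per timestamp are assumptions, not the schema used in the study.

```python
import sqlite3


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical table holding one row per node per timestamp."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS node_sample (
            ts      REAL    NOT NULL,               -- timestamp
            node_id INTEGER NOT NULL CHECK (node_id BETWEEN 0 AND 19),
            x REAL, y REAL, z REAL,                 -- Euclidean coordinates
            vx REAL, vy REAL, vz REAL,              -- node velocity
            threat  INTEGER NOT NULL CHECK (threat IN (0, 1)),  -- Boolean threat class
            PRIMARY KEY (ts, node_id)
        )
        """
    )
    return conn


def insert_sample(conn, ts, node_id, pos, vel, threat):
    """Store one labeled training sample for a single node."""
    conn.execute(
        "INSERT INTO node_sample VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (ts, node_id, *pos, *vel, int(threat)),
    )
```

A real deployment would likely also key rows by subject or recording session; that is left out here to keep the sketch close to the four fields the slide names.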
14 Step 2: Data Transfer and Storage. The data is stored in a structured relational database with fields for the following measures.
15 Proposed Methodology
16 Step 3: Data Mining/Knowledge Discovery
17 Knowledge Discovery in Databases: Supervised Learning; Unsupervised Learning
18 Supervised vs. Unsupervised Learning
Supervised: $y = f(x)$: true function. $D$: labeled training set, $D = \{(x_i, f(x_i))\}$. Learn $G(x)$: a model trained to predict the labels in $D$. Goal: $E[(f(x) - G(x))^2] \to 0$. Well-defined criteria: accuracy, RMSE, ...
Unsupervised: Generator: true model. $D$: unlabeled data sample, $D = \{x_i\}$. Learn: the underlying data structure. Goal: find natural patterns. Well-defined criteria: varies.
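The supervised goal on this slide, driving the expected squared difference between the true function and the model toward zero, is usually estimated on a finite labeled sample. A minimal sketch of that estimate and the RMSE criterion the slide mentions follows; the function names are illustrative and not taken from the presentation.

```python
from math import sqrt


def mean_squared_error(true_values, predictions):
    """Empirical estimate of E[(f(x) - G(x))^2] over a labeled sample."""
    if not true_values or len(true_values) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(true_values, predictions)) / len(true_values)


def rmse(true_values, predictions):
    """Root mean squared error, one of the criteria named on the slide."""
    return sqrt(mean_squared_error(true_values, predictions))
```

For Boolean labels encoded as 0 and 1, the mean squared error equals the misclassification rate, so it ties directly to the accuracy criterion.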
19 Capturing Threat Emergence: Time $t_1$, ..., Time $t_n$, Time $t_{n+1}$, with corresponding models Model($t_1$), ..., Model($t_n$), Model($t_{n+1}$)
20 Data Mining: Decision Tree Induction. Given a time-stamped data set (t) with columns Feature 1, Feature 2, ..., Feature N and Class, where row $m$ holds the attribute values $A_{1,m}, A_{2,m}, \dots, A_{N,m}$ and the class $C_{j,m}$ for $m = 1, \dots, M$:
$$\mathrm{Entropy}(T) = -\sum_{j} p(C_j \mid T)\,\log_2 p(C_j \mid T)$$
$$\mathrm{Gain}(X) = \mathrm{Entropy}(T) - \sum_{i=1}^{k} \frac{|T_i|}{|T|}\,\mathrm{Entropy}(T_i)$$
$$\mathrm{Gain\ ratio}(X) = \frac{\mathrm{Gain}(X)}{-\sum_{i=1}^{k} \frac{|T_i|}{|T|}\,\log_2 \frac{|T_i|}{|T|}}$$
Tucker C., H.M. Kim, "Trend Mining for Predictive Product Design", Transactions of ASME: Journal of Mechanical Design, Vol. 133, No. 11.
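The three quantities on this slide can be computed directly for a categorical feature. The sketch below assumes the feature has already been discretized into a finite set of values (continuous joint coordinates would need binning first, which the slides do not describe); the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(T) = -sum_j p(C_j|T) * log2 p(C_j|T)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain(X) divided by the split information of feature X."""
    n = len(labels)
    subsets = {}
    for v, c in zip(values, labels):
        subsets.setdefault(v, []).append(c)  # T_i: rows sharing feature value i
    gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * log2(len(s) / n) for s in subsets.values())
    # A feature with a single value carries no split information.
    return gain / split_info if split_info > 0 else 0.0
```

A feature whose values separate threat from non-threat rows perfectly, and splits the data in half, gets a gain ratio of 1; a feature unrelated to the class gets 0.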
21 Features: Time Series Gain Ratio and Prediction. Table: gain ratio per feature at t1, ..., t12, with a predicted column t13_predict; rows X_Elbow Joint, Y_Hip_Joint, X_Shoulder, X_Accel_Arm, Y_Accel_Hip, Z_Arm_Joint. Figure: Feature Gain Ratio Plot Over Time (x-axis: Time; y-axis: Gain Ratio).
22 Features: Time Series Gain Ratio and Prediction. Same table and figure as slide 21.
23 Flowchart: n time-stamped data sets → compute IM(Feature(i), Data Set(t)), i = i+1 → decision: Data Set(t) = n? (Yes/No) → predict IM(Feature(i)), i = i+1 → split Data Sets 1, ..., n based on the maximum predicted IM over Feature(1), ..., Feature(k) → for each subset, decision on P(Class 1) (Yes/No) → End TREE; classify irrelevant features
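The core step of the flowchart, predicting each feature's IM one time step ahead and splitting on the feature with the largest predicted value, can be sketched as follows. Two assumptions are made loudly here: IM is taken to be a per-feature score such as the gain ratio from slide 20, and a plain least-squares linear trend stands in for the forecaster, which is not necessarily the method the authors used. The function names and the sample values are hypothetical.

```python
def forecast_next(series):
    """One-step-ahead forecast from a least-squares linear trend (stand-in forecaster)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return mean_y + slope * (n - mean_x)  # value of the fitted line at x = n


def select_split_feature(im_history):
    """Pick the feature with the largest predicted IM.

    im_history maps a feature name to its IM values at t_1, ..., t_n.
    """
    predicted = {name: forecast_next(values) for name, values in im_history.items()}
    best = max(predicted, key=predicted.get)
    return best, predicted


# Hypothetical IM values, for illustration only.
history = {
    "X_Elbow Joint": [0.1, 0.2, 0.3],
    "Y_Hip_Joint": [0.5, 0.4, 0.3],
}
best_feature, predictions = select_split_feature(history)
```

In this example both features have the same IM at the last observed time step, but the rising trend of X_Elbow Joint makes it the predicted best split, which is the point of forecasting the measure rather than using its latest value.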
25 Holt-Winters Forecasting. The $k$-step-ahead forecasting model is defined as:
$$\hat{y}_t(k) = L_t + k\,T_t + I_{t-s+k}$$
where:
Level $L_t$ (the level component): $L_t = \alpha\,(y_t - I_{t-s}) + (1-\alpha)\,(L_{t-1} + T_{t-1})$
Trend $T_t$ (the slope component): $T_t = \gamma\,(L_t - L_{t-1}) + (1-\gamma)\,T_{t-1}$
Season $I_t$ (the seasonal component): $I_t = \delta\,(y_t - L_t) + (1-\delta)\,I_{t-s}$
The smoothing parameters $\alpha$, $\gamma$, $\delta$ lie in the interval $[0, 1]$, and $s$ is the length of the seasonal period.
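The recursions above translate almost line for line into code. The sketch below follows the additive form given on the slide; the initialisation of level, trend, and seasonal terms from the first two seasons is an assumption (the slide does not state one), and the function name is illustrative.

```python
def holt_winters_additive(y, s, alpha, gamma, delta, k=1):
    """Return the k-step-ahead forecast L_t + k*T_t + I_{t-s+k} for 1 <= k <= s."""
    if len(y) < 2 * s:
        raise ValueError("need at least two full seasons of data")
    if not 1 <= k <= s:
        raise ValueError("k must be between 1 and s")

    # Initialisation (assumed): first-season mean, average per-step change
    # between the first two seasons, and first-season deviations.
    first_mean = sum(y[:s]) / s
    second_mean = sum(y[s:2 * s]) / s
    level = first_mean
    trend = (second_mean - first_mean) / s
    season = [v - first_mean for v in y[:s]]  # I_0 .. I_{s-1}

    for t in range(s, len(y)):
        prev_level, prev_trend = level, trend
        old_season = season[t - s]                                        # I_{t-s}
        level = alpha * (y[t] - old_season) + (1 - alpha) * (prev_level + prev_trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * prev_trend
        season.append(delta * (y[t] - level) + (1 - delta) * old_season)  # I_t

    t = len(y) - 1
    return level + k * trend + season[t - s + k]
```

Applied to a gain-ratio series such as t1, ..., t12 on the neighbouring slides, a call with `k=1` would give the t13 prediction, provided a seasonal period `s` is chosen; the slides do not say which value was used.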
26 Features: Time Series Gain Ratio and Prediction. Table: gain ratio per feature at t1, ..., t12, with a predicted column t13_predict; rows X_Elbow Joint, Y_Hip_Joint, X_Shoulder, X_Accel_Arm, Y_Accel_Hip, Z_Arm_Joint. Figure: Feature Gain Ratio Plot Over Time (x-axis: Time; y-axis: Gain Ratio).
28 Split Data Sets $(t_1, \dots, t_n)$: Max IM. From Time $t_1$ to Time $t_n$, split data sets $(1, \dots, n)$ based on the $k$ mutually exclusive feature values $A_{i,1}, \dots, A_{i,k}$ of Feature $A_i$.
31 Data Mining Predictive Model: Time $t_1$, ..., Time $t_n$; Threat
32 Proposed Methodology
33 Step 4: Decision Support. The Early Warning System (EWS) is a graphical user interface (GUI) that displays the percentage probability of a threat/violent action being committed.
34 APPLICATION CASE STUDY
35 Possible Threat Scenario. BBC UK (2008)
36 CASE STUDY: TEST DATA. Voluntary participants from the University community were invited to enact the threat and non-threat actions. The scenario was recreated in an indoor space, similar to a high-profile speech. The data collected is then used to train the predictive models. The study was approved by the IRB and the ORP at the Pennsylvania State University, University Park campus, under the title "A Dynamic Pattern Recognition Framework for Mining and Predicting Emerging Threats" and is filed as IRB #. Study: 24 subjects spanning 2 months.
37 THREAT PREDICTION RESULTS: Low level threat prediction; High level threat prediction
38 RESULTS
Accuracy of Ensemble Methods: 86.8%
Confusion matrices for REPTree and Naive Bayes (classes FALSE and TRUE).
Accuracy measures:
Classifier | Accuracy | Precision | Recall | F-Measure | PRC Area | ROC Area
REPTree | 95.3% | 96.9% | 97.7% | 97.3% | 99.1% | 96.9%
Naive Bayes | 82.7% | 87.3% | 93.1% | 90.1% | 90.9% | 71.8%
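The accuracy, precision, recall, and F-measure columns all derive from the four cells of a two-class confusion matrix. The sketch below shows the standard definitions; since the matrix cells are missing from this transcript, any counts passed to it are hypothetical, and the function names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard measures from a two-class confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives; tp + fp and tp + fn must be non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }
```

As a consistency check on the reported figures, `f_measure(0.969, 0.977)` rounds to 0.973 and `f_measure(0.873, 0.931)` rounds to 0.901, matching the F-Measure values listed for REPTree and Naive Bayes.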
39 CONCLUSION AND FUTURE WORK
40 Conclusion and Future Work. The most common surveillance systems today are reactive in nature and cannot actively predict the emergence of a threat by analyzing previously collected data. Privacy-preserving data mining methodology: this methodology takes a first step towards addressing these issues while providing promising results. Future work: expand the definition of threat.
41 ACKNOWLEDGEMENTS. Contributors: Dr. Conrad S. Tucker, D.A.T.A. Lab members, Research Participants from PSU.
International Journal of Electronics Engineering Research. ISSN 0975-6450 Volume 9, Number 1 (2017) pp. 141-150 Research India Publications http://www.ripublication.com Change Detection in Remotely Sensed
More informationInternational Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN
International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 2321-3469 PERFORMANCE ANALYSIS OF CLASSIFICATION ALGORITHMS IN DATA MINING Srikanth Bethu
More informationIndoor Object Recognition of 3D Kinect Dataset with RNNs
Indoor Object Recognition of 3D Kinect Dataset with RNNs Thiraphat Charoensripongsa, Yue Chen, Brian Cheng 1. Introduction Recent work at Stanford in the area of scene understanding has involved using
More informationk-nearest Neighbor (knn) Sept Youn-Hee Han
k-nearest Neighbor (knn) Sept. 2015 Youn-Hee Han http://link.koreatech.ac.kr ²Eager Learners Eager vs. Lazy Learning when given a set of training data, it will construct a generalization model before receiving
More informationScalable Object Classification using Range Images
Scalable Object Classification using Range Images Eunyoung Kim and Gerard Medioni Institute for Robotics and Intelligent Systems University of Southern California 1 What is a Range Image? Depth measurement
More informationPESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore
Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic
More informationImpact of Encryption Techniques on Classification Algorithm for Privacy Preservation of Data
Impact of Encryption Techniques on Classification Algorithm for Privacy Preservation of Data Jharna Chopra 1, Sampada Satav 2 M.E. Scholar, CTA, SSGI, Bhilai, Chhattisgarh, India 1 Asst.Prof, CSE, SSGI,
More informationPerformance Evaluation of Various Classification Algorithms
Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------
More informationData Mining Technology Based on Bayesian Network Structure Applied in Learning
, pp.67-71 http://dx.doi.org/10.14257/astl.2016.137.12 Data Mining Technology Based on Bayesian Network Structure Applied in Learning Chunhua Wang, Dong Han College of Information Engineering, Huanghuai
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationECLT 5810 Evaluation of Classification Quality
ECLT 5810 Evaluation of Classification Quality Reference: Data Mining Practical Machine Learning Tools and Techniques, by I. Witten, E. Frank, and M. Hall, Morgan Kaufmann Testing and Error Error rate:
More informationHybrid Fuzzy C-Means Clustering Technique for Gene Expression Data
Hybrid Fuzzy C-Means Clustering Technique for Gene Expression Data 1 P. Valarmathie, 2 Dr MV Srinath, 3 Dr T. Ravichandran, 4 K. Dinakaran 1 Dept. of Computer Science and Engineering, Dr. MGR University,
More informationMIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA
Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on
More informationData Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationAn Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks
An Based on the Temporal-spatial Correlation in Wireless Sensor Networks 1 Department of Computer Science & Technology, Harbin Institute of Technology at Weihai,Weihai, 264209, China E-mail: Liuyang322@hit.edu.cn
More informationK-means clustering based filter feature selection on high dimensional data
International Journal of Advances in Intelligent Informatics ISSN: 2442-6571 Vol 2, No 1, March 2016, pp. 38-45 38 K-means clustering based filter feature selection on high dimensional data Dewi Pramudi
More information