Behavior Query Discovery in System-Generated Temporal Graphs
1 Behavior Query Discovery in System-Generated Temporal Graphs. Bo Zong, Xusheng Xiao, Zhichun Li, Zhenyu Wu, Zhiyun Qian, Xifeng Yan, Ambuj K. Singh, Guofei Jiang. UC Santa Barbara; NEC Labs America; UC Riverside.
2 Scaling Management in Complex Systems. Large complex systems are part of daily life (computer security, power plants, IoT systems). It is impossible to monitor their states manually, so we have to resort to automation. Big system monitoring data has large volume (40 GB/day for a medium-size cloud), high heterogeneity (diverse data sources and data formats), and high dynamics (new data arriving over time). It is impossible to investigate such data manually. Automated methods are desired, but how do we fuse such monitoring data for data analytics? [1] Y. Zheng, H. Zhang, and Y. Yu. Detecting collective anomalies from multiple spatio-temporal datasets across different domains. SIGSPATIAL'15. [2] W. R. Cheswick, S. M. Bellovin, and A. D. Rubin. Firewalls and Internet Security: Repelling the Wily Hacker. 2003.
3 System Monitoring in Cybersecurity. Monitored data: syscall logs [1]. Each log line records who does what at which time. [Figure: an example syscall log from a computer system, in which processes create other processes, open, write to, and close a file, and open, listen on, and close a socket.] [1] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda. Panorama: capturing system-wide information flow for malware detection and analysis. CCS'07.
4 Fusing Monitor Data by Temporal Graphs. Nodes: basic system entities. Edges: interactions between nodes. Timestamps: the times at which interactions happened. [Figure: a syscall log and the temporal graph built from it; node labels include /bin/sh, /bin/run, /bin/run-parts, /bin/dash, /usr/bin/env, /etc/update-motd.d, a Unix socket, and a pipe.] On temporal graph data, users may issue security queries, e.g., are there any suspicious activities on the HR department servers?
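The fusion step on this slide can be illustrated with a minimal sketch. It assumes each log record has already been parsed into a (source id, source label, target id, target label, timestamp) tuple; the `TemporalGraph` class, the `build_graph` helper, the node ids, and the integer timestamps are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class TemporalGraph:
    labels: dict = field(default_factory=dict)  # node id -> label
    edges: list = field(default_factory=list)   # (source, target, timestamp)

    def add_interaction(self, src, src_label, dst, dst_label, timestamp):
        # Register both entities (first label wins) and record the timed edge.
        self.labels.setdefault(src, src_label)
        self.labels.setdefault(dst, dst_label)
        self.edges.append((src, dst, timestamp))


def build_graph(events):
    """Fuse parsed log records into one temporal graph, in time order."""
    graph = TemporalGraph()
    for event in sorted(events, key=lambda e: e[4]):
        graph.add_interaction(*event)
    return graph


# Hypothetical parsed records; timestamps are made-up integers.
events = [
    ("p1", "/bin/dash", "f1", "/bin/sh", 2),
    ("p1", "/bin/dash", "s1", "Unix socket", 1),
    ("p1", "/bin/dash", "f2", "/usr/bin/env", 3),
]
graph = build_graph(events)
print(graph.labels)
print(graph.edges)
```

Keeping node labels separate from node ids lets several entities share a label (for example two processes running the same binary), which is what the label-based matching on later slides relies on.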
5 Analyzing Systems through Behavior Queries. Security questions are composed of system behaviors [1]. System behaviors are high-level events, each formed by a set of system entity interactions. Example security query from a user: someone remotely accessed HR's desktop by ssh, compressed some files, and then sent the data to a remote server. This corresponds to a chain of behavior queries: sshd-login, compressing files, sending data to a remote server. If we can formulate behavior queries, then we can answer security queries. But how? [1] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel. A view on current malware behaviors. LEET'09.
6 Query Formulation: Handcrafted vs. Automated. Handcrafting is infeasible: users have to know which basic entity interactions make up a behavior and in which order they execute, and they have to repeat this process for hundreds of behaviors on different operating systems. Why not use raw logs as queries? Low recall: raw logs include too much noise generated by other behaviors. Low query efficiency: raw logs can include thousands of interactions. [Figure: raw syscall logs generated by executing sshd-login on a server, contrasted with the behavior query for sshd-login.]
7 Discriminative Patterns as Behavior Queries. Behavior queries for sshd-login. [Figure: a discriminative pattern over the nodes P: /bin/uname, P: /bin/dash, P: /bin/uname, F: /bin/uname, and a pipe, with four time-ordered edges.] Messy raw logs contain thousands of edges and have poor usability. The discriminative pattern has 5 nodes and 4 edges and achieves high precision and recall.
8 Query Discovery by Discriminative Patterns. Positive set: raw logs obtained by executing the target behavior multiple times. Negative set: raw logs obtained by executing other behaviors multiple times. Discriminative patterns appear frequently in the positive set but rarely in the negative set. [Figure: sshd-login runs on a server produce the positive set and other behaviors produce the negative set; the discriminative patterns mined from the two sets form the behavior query for sshd-login.] Our focus: enabling fast discriminative temporal graph pattern mining.
9 Outline. Introduction; Mining Discriminative Temporal Graph Patterns; TGMiner: Enabling Fast Pattern Mining; Experimental Results; Conclusion.
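10 Temporal Graph and Subgraph
Temporal graph $G = (V, E, T)$: $V$ is a set of labeled nodes, $T$ is a set of timestamps, and $E \subseteq V \times V \times T$ is a set of edges with timestamps.
Temporal subgraph: let $G' = (V', E', T')$. $G' \subseteq_t G$ if there are two injective functions $f : V' \to V$ and $\tau : T' \to T$ such that
(i) $\forall u' \in V'$, $u'$ and $f(u')$ have the same label (node mapping);
(ii) $\forall (u', v', t') \in E'$, $(f(u'), f(v'), \tau(t')) \in E$ (edge mapping);
(iii) $\forall (u'_1, v'_1, t'_1), (u'_2, v'_2, t'_2) \in E'$, $\mathrm{sign}(t'_1 - t'_2) = \mathrm{sign}(\tau(t'_1) - \tau(t'_2))$ (time order preserved).
[Figure: example temporal graphs $G'$ and $G$ with $G' \subseteq_t G$.]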
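To make the temporal subgraph definition concrete, here is a brute-force sketch. It assumes the smaller graph has pairwise distinct timestamps, in which case matching its edges in time order to the earliest later edge of the larger graph is enough to find an order-preserving timestamp mapping. The function name and the toy graphs are hypothetical, and the exhaustive search over node mappings is for illustration only, not the matching procedure used in the talk.

```python
from itertools import permutations


def is_temporal_subgraph(p_labels, p_edges, g_labels, g_edges):
    """Check whether the small graph P is a temporal subgraph of G.

    p_labels / g_labels: dict node id -> label
    p_edges / g_edges:   list of (source, target, timestamp)
    Assumes all timestamps in p_edges are distinct.
    """
    p_nodes = list(p_labels)
    p_sorted = sorted(p_edges, key=lambda e: e[2])
    # Try every injective node mapping f: V' -> V.
    for image in permutations(g_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if any(p_labels[u] != g_labels[f[u]] for u in p_nodes):
            continue  # node mapping must preserve labels
        last = float("-inf")
        matched = True
        for u, v, _ in p_sorted:
            # Earliest host edge between the mapped endpoints after `last`.
            later = [t for a, b, t in g_edges
                     if a == f[u] and b == f[v] and t > last]
            if not later:
                matched = False
                break
            last = min(later)
        if matched:
            return True
    return False


g_labels = {"a": "A", "b": "B", "c": "C"}
g_edges = [("a", "b", 4), ("b", "c", 7)]
p_labels = {"x": "A", "y": "B", "z": "C"}

in_order = is_temporal_subgraph(p_labels, [("x", "y", 1), ("y", "z", 2)], g_labels, g_edges)
reversed_order = is_temporal_subgraph(p_labels, [("x", "y", 2), ("y", "z", 1)], g_labels, g_edges)
print(in_order, reversed_order)
```

The two toy queries share the same static structure and differ only in edge order, so the second one fails purely because of condition (iii).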
11 Temporal Graph Pattern
Temporal graph pattern $g = (V, E, T)$:
$\forall t \in T$, $1 \le t \le |E|$ (relative time order);
for any two distinct edges $(u, v, t), (u', v', t') \in E$, $t \neq t'$ (edges are totally ordered);
T-connected: $\forall (u, v, t) \in E$, all the edges with timestamps smaller than $t$ form a connected graph.
[Figure: example temporal graphs that are not patterns (no relative time order, not totally ordered, or not T-connected) and a temporal graph pattern $g$.]
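The three pattern conditions can be checked with a short sketch. Two readings here are assumptions rather than statements from the slide: relative time order is taken to mean that the timestamps are exactly 1 through |E|, and T-connectivity is taken to include the edge itself, so that every time-ordered prefix of edges is connected. The function name `is_temporal_pattern` is hypothetical.

```python
def is_temporal_pattern(edges):
    """edges: list of (source, target, timestamp) tuples."""
    stamps = sorted(t for _, _, t in edges)
    # Relative time order and total order: timestamps are exactly 1..|E|.
    if stamps != list(range(1, len(edges) + 1)):
        return False
    # Every prefix is connected iff each new edge touches an earlier node.
    seen = set()
    for u, v, _ in sorted(edges, key=lambda e: e[2]):
        if seen and u not in seen and v not in seen:
            return False
        seen.update((u, v))
    return True


print(is_temporal_pattern([("A", "B", 1), ("B", "C", 2)]))                  # valid pattern
print(is_temporal_pattern([("A", "B", 4), ("B", "C", 7)]))                  # raw timestamps
print(is_temporal_pattern([("A", "B", 1), ("B", "C", 1)]))                  # concurrent edges
print(is_temporal_pattern([("A", "B", 1), ("C", "D", 2), ("B", "C", 3)]))   # disconnected prefix
```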
12 Problem Statement
Input: a positive graph set $\mathbb{G}_p$ and a negative graph set $\mathbb{G}_n$.
Output: discriminative patterns $g$ that maximize $F(\mathrm{freq}(\mathbb{G}_p, g), \mathrm{freq}(\mathbb{G}_n, g))$.
$\mathrm{freq}(\mathbb{G}, g) = \frac{|\{G \mid g \subseteq_t G \wedge G \in \mathbb{G}\}|}{|\mathbb{G}|}$, where the numerator is the number of graphs in $\mathbb{G}$ containing $g$.
$F(x, y)$ is a discriminative function that rewards high $x$ but penalizes high $y$, such as $F(x, y) = \log(x / (y + \epsilon))$ with $\epsilon = 10^{-6}$ [1].
We prefer patterns that appear frequently in $\mathbb{G}_p$ and rarely in $\mathbb{G}_n$.
TGMiner: fast discriminative pattern mining for temporal graphs.
[1] N. Jin, C. Young, and W. Wang. GAIA: Graph Classification using Evolutionary Computation. SIGMOD'10.
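A small numeric sketch of the objective may help. It assumes freq is the fraction of graphs in a set that contain the pattern and uses a small epsilon of 1e-6 as an assumed value. Graphs are stood in for by plain sets of edge labels with subset testing as the containment check, which is a deliberate simplification of temporal subgraph matching. The helper names are hypothetical.

```python
import math


def freq(graphs, contains_pattern):
    """Fraction of graphs for which contains_pattern(graph) is true."""
    if not graphs:
        return 0.0
    return sum(1 for g in graphs if contains_pattern(g)) / len(graphs)


def discriminative_score(pos_freq, neg_freq, eps=1e-6):
    """F(x, y) = log(x / (y + eps)); eps is an assumed small constant."""
    if pos_freq == 0:
        return float("-inf")  # log(0) is undefined; treat as worst score
    return math.log(pos_freq / (neg_freq + eps))


# Toy stand-in: each "graph" is a set of edge labels.
positive = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}]
negative = [{"a"}, {"c", "d"}, {"a", "d"}]


def score(pattern):
    contains = lambda g: pattern <= g
    return discriminative_score(freq(positive, contains), freq(negative, contains))


score_ab = score({"a", "b"})  # in every positive graph, in no negative graph
score_a = score({"a"})        # in every positive graph, in two negative graphs
print(score_ab, score_a)
```

The pattern {"a", "b"} scores far higher than {"a"} because both are equally frequent in the positive set but only the latter also shows up in the negative set.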
14 TGMiner Overview. Key components: pattern growth, which guides the search in pattern space, and pruning, which skips unpromising search branches. Technical challenges: for pattern growth, how to avoid repeated search and achieve full coverage with small overhead; for pruning, how to discover unpromising search branches with small overhead.
15 Pattern Growth: Temporal Graph Isomorphism. Temporal graph isomorphism: $G' =_t G \iff G' \subseteq_t G \wedge G \subseteq_t G'$. Static graph isomorphism: a combinatorial number of growth paths [1, 2], a costly graph isomorphism check, and therefore a costly pattern growth algorithm. Temporal graph isomorphism: a unique growth path (growing in increasing temporal order), and one can decide temporal graph isomorphism in linear time. Implication: repeated search can be avoided with small overhead. [1] X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. ICDM'02. [2] X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. SIGMOD'08.
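One way to see why isomorphism becomes cheap for totally ordered patterns is the sketch below. Because edges have distinct timestamps, an isomorphism must map the i-th edge of one pattern to the i-th edge of the other, so renaming nodes by order of first appearance yields a canonical form that can be compared directly. This is an illustrative construction under that assumption, not necessarily the procedure used by TGMiner, and the function names are hypothetical. The sort is included for convenience; with edges already stored in time order the pass is linear.

```python
def canonical_form(labels, edges):
    """labels: dict node id -> label; edges: (source, target, timestamp)."""
    ids = {}
    sequence = []
    for u, v, _ in sorted(edges, key=lambda e: e[2]):
        for node in (u, v):
            ids.setdefault(node, len(ids))  # number nodes by first appearance
        sequence.append((ids[u], labels[u], ids[v], labels[v]))
    return tuple(sequence)


def is_isomorphic(labels_a, edges_a, labels_b, edges_b):
    return canonical_form(labels_a, edges_a) == canonical_form(labels_b, edges_b)


a_labels = {"x": "A", "y": "B", "z": "C"}
a_edges = [("x", "y", 1), ("y", "z", 2)]
b_labels = {1: "A", 2: "B", 3: "C"}
b_edges = [(2, 3, 2), (1, 2, 1)]   # same pattern, different ids and list order
c_edges = [(1, 2, 2), (2, 3, 1)]   # same static structure, reversed time order

same = is_isomorphic(a_labels, a_edges, b_labels, b_edges)
different = is_isomorphic(a_labels, a_edges, b_labels, c_edges)
print(same, different)
```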
16 Pattern Growth: Algorithms. Principle 1: grow patterns in increasing time order. Principle 2: there are three ways to keep a pattern T-connected: forward, backward, and inward growth. Result: following Principles 1 and 2, one can perform pattern growth with full coverage.
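The two growth principles can be sketched as a one-step candidate generator. The slide does not define the three growth types, so the reading used here is an assumption: forward growth adds an edge from an existing node to a new node, backward growth adds an edge from a new node to an existing node, and inward growth adds an edge between two existing nodes. Self-loops are excluded for simplicity, and the `grow` function and its arguments are hypothetical.

```python
def grow(nodes, edges, label_alphabet):
    """Enumerate one-step extensions of a non-empty pattern.

    nodes: dict of integer node id -> label
    edges: list of (source, target, timestamp) with timestamps 1..len(edges)
    """
    next_time = len(edges) + 1   # Principle 1: the new edge is the latest one
    new_node = max(nodes) + 1
    candidates = []
    for u in nodes:
        for label in label_alphabet:
            grown = {**nodes, new_node: label}
            candidates.append(("forward", grown, edges + [(u, new_node, next_time)]))
            candidates.append(("backward", grown, edges + [(new_node, u, next_time)]))
        for v in nodes:
            if u != v:
                candidates.append(("inward", nodes, edges + [(u, v, next_time)]))
    return candidates


pattern_nodes = {0: "A", 1: "B"}
pattern_edges = [(0, 1, 1)]
children = grow(pattern_nodes, pattern_edges, ["A", "B"])
for kind, child_nodes, child_edges in children:
    print(kind, child_nodes, child_edges)
```

Every candidate attaches its new edge to at least one existing node, so T-connectivity is preserved by construction, and giving the new edge the largest timestamp means each pattern is reached along a single growth path.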
17 Skip Unpromising Branches: Subgraph Pruning Intuition g g has been discovered. If g t g, can we prune branches of g? g The sub-tree under g has already been explored. When g t g, we can prune g if the following conditions hold Node labels in g \ g are never used in g s growth There is only one node mapping between g and g The edge of largest timestamp in g can only match the edge of largest timestamp in g These conditions can be checked in linear time 7
18 Skip Unpromising Branches: Supergraph Pruning Intuition... g has been discovered. If g t g, can we prune branches of g? g g The sub-tree under g has already been explored. 4 4 When g t g, we can prune g if the following conditions hold g is not a descendent of g, and they have same number of nodes There is only one node mapping between g and g g and g have same number of matches in G p and G n These conditions can be checked in linear time 4 8
20 Experiment Setup. Temporal graphs are generated from real-life system behaviors. [Table: columns Behavior, Avg. #nodes, Avg. #edges, Total #labels, #graphs; rows grouped as Small, Medium, and Large.] Positive behaviors: bzip-decompress, gzip-decompress, wget-download, ftp-download, scp-download, gcc-compile, g++-compile, ftpd-login, ssh-login, sshd-login, apt-get-update, apt-get-install. Negative: background.
21 Behavior Query Accuracy
Behavior queries can be formulated by: a keyword set, such as process name, file name, etc. (Keyword); a discriminative non-temporal graph pattern (Non-Temporal); a discriminative temporal graph pattern (Temporal, our method).
                   Precision                           Recall
Behavior           Keyword   Non-Temporal   Temporal   Keyword   Non-Temporal   Temporal
bzip               00%       00%            00%        00%       00             00%
wget-download      96.5%     00%            00%        9.6%      9.4%           9.4%
scp-download       .8%       59.4%          00%        .%        9.%            9.%
g++-compile        7.4%      9.%            95.%       84.5%     85.%           85.%
ftpd-login         76.6%     8.8%           94.%       00%       89.7%          86.8%
sshd-login         4.4%      59.6%          99.9%      99.8%     99.9%          99.9%
apt-get-install    68.%      8.7%           95.7%      5.6%      86.%           8.9%
Average            68.5%     8.%            97.4%      78.4%     9.9%           9.%
Remark: Discriminative temporal graph patterns significantly outperform baseline methods on precision, with comparable recall.
22 Response Time. Baseline algorithms: SubPrune (only uses subgraph pruning); SupPrune (only uses supergraph pruning); PruneVF, PruneGI, and LinearScan (subgraph checking by VF2 [1], graph indexing [2], and linear scan, respectively). [Figure: response time in seconds of the mining algorithms TGMiner, PruneGI, SubPrune, LinearScan, PruneVF, and SupPrune, in three panels: small behaviors, medium behaviors, and large behaviors.] Remark: TGMiner is up to 0 times faster than all the baselines across all cases. [1] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. TPAMI'04. [2] B. Zong, R. Raghavendra, M. Srivatsa, X. Yan, A. K. Singh, and K.-W. Lee. Cloud service placement via subgraph matching. ICDE'14.
24 Conclusion Behavior queries are critical in system analytics Impractical to manually form such queries Our idea: Automating query formulation by discriminative temporal graph pattern mining TGMiner: Enabling fast pattern mining Utilizing intrinsic properties in temporal graphs Fast pattern space exploration Effective pattern space pruning TGMiner is up to 0 times faster than baselines Behavior queries formulated by discovered patterns achieve high precision (97%) and recall (9%) 4
25 Acknowledgement This research is supported by NEC Labs, America Army Research Lab under cooperative agreement W9NF (NS-CTA) NSF IIS-954 and IIS
26 Behavior Query Discovery in System-Generated Temporal Graphs. Speaker: Ambuj K. Singh. Bo Zong, Xusheng Xiao, Zhichun Li, Zhenyu Wu, Zhiyun Qian, Xifeng Yan, Ambuj K. Singh, Guofei Jiang. UC Santa Barbara; NEC Labs America; UC Riverside.
27 Handling Concurrent Edges in Patterns. Concurrent edges: edges with the same timestamp in the same pattern. Approach: TGMiner + gSpan [1], which treats a pattern as a sequence of concurrent subgraphs. Adding an edge with a larger timestamp uses TGMiner growth; adding an edge with the same timestamp uses gSpan growth. This incurs larger overhead on pruning. [Figure: pattern space, covering static patterns (gSpan), temporal patterns with concurrent edges (TGMiner + gSpan), and temporal patterns (TGMiner).] [1] X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. ICDM'02.
More informationMobility Data Management and Exploration: Theory and Practice
Mobility Data Management and Exploration: Theory and Practice Chapter 4 -Mobility data management at the physical level Nikos Pelekis & Yannis Theodoridis InfoLab, University of Piraeus, Greece infolab.cs.unipi.gr
More informationDependency-Preserving Data Compaction for Scalable Forensic Analysis 1
Intro Reductions Optimizations Evaluation Summary Dependency-Preserving Data Compaction for Scalable Forensic Analysis 1 Md Nahid Hossain, Junao Wang, R. Sekar, and Scott D. Stoller 1 This work was supported
More informationDetect Cyber Threats with Securonix Proxy Traffic Analyzer
Detect Cyber Threats with Securonix Proxy Traffic Analyzer Introduction Many organizations encounter an extremely high volume of proxy data on a daily basis. The volume of proxy data can range from 100
More informationMining Minimal Contrast Subgraph Patterns
Mining Minimal Contrast Subgraph Patterns Roger Ming Hieng Ting James Bailey Abstract In this paper, we introduce a new type of contrast pattern, the minimal contrast subgraph. It is able to capture structural
More informationPrivacy Challenges in Big Data and Industry 4.0
Privacy Challenges in Big Data and Industry 4.0 Jiannong Cao Internet & Mobile Computing Lab Department of Computing Hong Kong Polytechnic University Email: csjcao@comp.polyu.edu.hk http://www.comp.polyu.edu.hk/~csjcao/
More informationPositive and Unlabeled Learning for Graph Classification
Positive and Unlabeled Learning for Graph Classification Yuchen Zhao Department of Computer Science University of Illinois at Chicago Chicago, IL Email: yzhao@cs.uic.edu Xiangnan Kong Department of Computer
More informationAutomation of URL Discovery and Flattering Mechanism in Live Forum Threads
Automation of URL Discovery and Flattering Mechanism in Live Forum Threads T.Nagajothi 1, M.S.Thanabal 2 PG Student, Department of CSE, P.S.N.A College of Engineering and Technology, Tamilnadu, India 1
More informationDiscovering Paths Traversed by Visitors in Web Server Access Logs
Discovering Paths Traversed by Visitors in Web Server Access Logs Alper Tugay Mızrak Department of Computer Engineering Bilkent University 06533 Ankara, TURKEY E-mail: mizrak@cs.bilkent.edu.tr Abstract
More informationDesign and Implementation of Music Recommendation System Based on Hadoop
Design and Implementation of Music Recommendation System Based on Hadoop Zhao Yufeng School of Computer Science and Engineering Xi'an University of Technology Shaanxi, Xi an, China e-mail: zyfzy99@163.com
More informationData Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems
Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationResearch and Design of Data Storage Scheme for Electric Power Big Data
3rd International Conference on Management, Education, Information and Control (MEICI 2015) Research and Design of Data Storage Scheme for Electric Power Big Data Wenfeng Song 1,a, Wanqing Yang 2,b*, Jingzhao
More informationAn Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment
An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment Navneet Goyal, Poonam Goyal, K Venkatramaiah, Deepak P C, and Sanoop P S Department of Computer Science & Information
More information1. Inroduction to Data Mininig
1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the
More informationStackMine Performance Debugging in the Large via Mining Millions of Stack Traces
StackMine Performance Debugging in the Large via Mining Millions of Stack Traces Shi Han 1, Yingnong Dang 1, Song Ge 1, Dongmei Zhang 1, and Tao Xie 2 Software Analytics Group, Microsoft Research Asia
More informationMining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports
Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,
More informationAvailable online at ScienceDirect. Procedia Computer Science 45 (2015 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 45 (2015 ) 101 110 International Conference on Advanced Computing Technologies and Applications (ICACTA- 2015) An optimized
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationHOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery
HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery Ninh D. Pham, Quang Loc Le, Tran Khanh Dang Faculty of Computer Science and Engineering, HCM University of Technology,
More informationEfficient Mining of Generalized Negative Association Rules
2010 IEEE International Conference on Granular Computing Efficient Mining of Generalized egative Association Rules Li-Min Tsai, Shu-Jing Lin, and Don-Lin Yang Dept. of Information Engineering and Computer
More informationMULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ)
MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ) Application of Machine Learning to Testing in Finance, Cyber, and Software Innovation center, Washington, D.C. THE SCIENCE OF TEST WORKSHOP 2017 AGENDA
More informationMining Interesting Itemsets in Graph Datasets
Mining Interesting Itemsets in Graph Datasets Boris Cule Bart Goethals Tayena Hendrickx Department of Mathematics and Computer Science University of Antwerp firstname.lastname@ua.ac.be Abstract. Traditionally,
More information