Presented by: Dardan Xhymshiti Spring 2016
Authors: Sean Chester*, Darius Sidlauskas†, Ira Assent*, Kenneth S. Bøgh*. *Data-Intensive Systems Group, Aarhus University, Denmark. †Data-Intensive Applications and Systems Laboratory, EPFL, Switzerland. Publication: ICDE 2015. Type: Research Paper.
The skyline is expensive to compute, especially on large datasets. Recent multi-core skyline algorithms do not effectively reduce the number of dominance tests, so state-of-the-art sequential skyline algorithms outperform them. Most multi-core skyline algorithms use a divide-and-conquer approach, which has two drawbacks: if the number of local skyline points is large, the merging step is expensive; and as the cardinality of the dataset increases, the computation becomes expensive.
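To make the notion of a dominance test concrete, here is a minimal Python sketch of the skyline operator. It is not taken from the paper: it assumes that smaller values are better in every dimension, and the function names and the hotel data (price, distance to the beach) are hypothetical. The naive version compares every point with every other point, which is the quadratic cost that the algorithms discussed here try to avoid.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere.

    Assumption for this sketch: smaller is better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(points: List[Point]) -> Tuple[List[Point], int]:
    """Return the skyline and the number of dominance tests performed."""
    tests = 0
    skyline = []
    for i, p in enumerate(points):
        dominated = False
        for j, q in enumerate(points):
            if i == j:
                continue
            tests += 1
            if dominates(q, p):
                dominated = True
                break
        if not dominated:
            skyline.append(p)
    return skyline, tests


# Hypothetical data: (price, distance to the beach)
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 9)]
skyline, tests = naive_skyline(hotels)
print(skyline, tests)
```

The returned test counter gives a rough way to compare how many dominance tests different strategies need on the same input.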
To come up with a new multi-core algorithm that eliminates as many dominance tests as possible.
Provide an overview of the skyline operator. Introduce the new skyline algorithm Hybrid. Provide experiments showing that this algorithm outperforms multi-core and sequential algorithms.
How to increase the performance of skyline algorithms: implementation on GPUs; implementation on multi-core CPUs; implementation in distributed environments such as MapReduce. The authors have developed an algorithm called Hybrid. They chose multi-core CPUs for the implementation because shared data structures are cheaper and parallel work need not be isolated.
All the skyline points are maintained in a shared global data structure. This data structure is updated regularly and is read by all threads. The skyline points in the data structure are ordered so that dominance relationships are detected quickly. Points are processed in blocks, which guarantees that each point is compared to at most α more points than in a sequential algorithm.
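The block-wise idea can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' Hybrid implementation: points are pre-sorted by coordinate sum (so a point can only be dominated by an earlier one), smaller is better, and `block_skyline` and its parameter `alpha` are hypothetical names. Python threads are used only to show which phase is independent per point; they do not give real speed-up for this CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Smaller is better: p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def block_skyline(points: List[Point], alpha: int) -> List[Point]:
    """Process sorted points in blocks of size alpha (alpha >= 1)."""
    # A dominating point always has a smaller coordinate sum, so after sorting
    # each point only needs to be tested against points that precede it.
    ordered = sorted(points, key=sum)
    skyline: List[Point] = []  # shared, global result
    with ThreadPoolExecutor() as pool:
        for start in range(0, len(ordered), alpha):
            block = ordered[start:start + alpha]
            snapshot = tuple(skyline)
            # Phase 1: every block point is tested against the shared skyline.
            # The tests are independent per point, so they can run in parallel.
            flags = list(pool.map(
                lambda p: not any(dominates(s, p) for s in snapshot), block))
            survivors = [p for p, ok in zip(block, flags) if ok]
            # Phase 2: survivors are tested against earlier survivors of the
            # same block, which costs at most alpha - 1 extra tests per point.
            confirmed: List[Point] = []
            for p in survivors:
                if not any(dominates(q, p) for q in confirmed):
                    confirmed.append(p)
            skyline.extend(confirmed)
    return skyline


hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 9)]
print(block_skyline(hotels, alpha=2))
```

Because the shared skyline is only read during phase 1 and only extended between blocks, the sketch needs no locking; a real multi-core implementation would have to make the same guarantee explicitly.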
Sort-based algorithms (quickly detect dominance relationships): SFS (Sort Filter Skyline), LESS, SaLSa (Sort and Limit Skyline algorithm). Object-based space partitioning (quickly detects incomparability): Object-Space partitioning, BSky-Tree-P.
Presented by: Dardan Xhymshiti Spring 2016
Authors: Michael Shekelyan, Gregor Jossé, Matthias Schubert. Institute of Informatics, Ludwig-Maximilians-University Munich. Publication: ICDE 2015. Type: Research Paper.
In many application areas, data is organized as a network or graph. Important task: compute a cost-optimal path between a start node and a target node. Examples: road networks (cost criteria: travel time, travel distance, energy consumption, etc.); computer networks (cost criteria: bandwidth and latency between routers). Cost vector: when more than one criterion is considered at a time, the cost of a complete path is called a cost vector, for example (cost criterion 1, cost criterion 2, cost criterion 3).
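As a small illustration of a cost vector, the sketch below sums per-edge cost vectors along a path. It assumes additive criteria such as time, distance and energy (a criterion like bandwidth would need a minimum instead of a sum), and the graph, the units and the name `path_cost` are hypothetical.

```python
from typing import Dict, List, Tuple

CostVector = Tuple[int, ...]


def path_cost(path: List[str],
              edge_costs: Dict[Tuple[str, str], CostVector]) -> CostVector:
    """Sum the cost vectors of consecutive edges; the path needs at least one edge."""
    edges = list(zip(path, path[1:]))
    if not edges:
        raise ValueError("path must contain at least two nodes")
    total = edge_costs[edges[0]]
    for edge in edges[1:]:
        total = tuple(a + b for a, b in zip(total, edge_costs[edge]))
    return total


# Hypothetical edge costs: (travel time in min, distance in km, energy in Wh)
edge_costs = {
    ("A", "B"): (5, 3, 400),
    ("B", "C"): (7, 6, 900),
}
print(path_cost(["A", "B", "C"], edge_costs))
```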
How to decide whether a path is optimal?
1. Map the cost vector to a single value by applying a monotonic combination function, and then sort the paths. The top n paths are the optimal ones. Problems: a) it is hard to find a suitable function; b) different types of cost might have different scales.
2. Compute the Pareto-optimal cost vectors (Pareto optimality is the mathematical definition of the skyline). This is also known as the conventional path skyline. Problems: a) the number of Pareto-optimal paths might increase exponentially as a function of the distance and the number of cost criteria considered; b) showing the user a large number of results is not very helpful.
Solutions for computing linear path skylines already exist, but they are restricted to the special case of just two cost criteria. Problem: they cannot be generalized to more criteria. The number of skyline paths increases exponentially with the distance between the locations and the number of cost criteria, so the result set might be too big.
Come up with a new approach to computing the result set of path skylines that provides better and faster results.
Recall: the conventional path skyline computes all potentially optimal paths, but the result is too big. Idea: reduce the result set to show only the paths that are optimal under a weighted sum function, that is, a linear combination of the cost criteria. Intuitively, the user weights each type of cost with a percentage describing its importance. How to compute the linear path skyline? Naive approach: compute the conventional path skyline and then compute the convex hull of the resulting cost vectors (inefficient).
What is a convex hull? Definition: in mathematics, the convex hull or convex envelope of a set X of points in the Euclidean plane or Euclidean space is the smallest convex set that contains X.
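The naive approach (conventional skyline first, convex hull second) can be sketched for the two-criteria case. This is an illustration only, not the LSCH algorithm: it assumes both costs are minimized, uses a standard monotone-chain lower hull, and the names `skyline_2d`, `linear_skyline_2d` and `best_under_weight` as well as the cost vectors are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def skyline_2d(vectors: List[Vec]) -> List[Vec]:
    """Pareto-optimal vectors (both costs minimized), sorted by first cost."""
    result: List[Vec] = []
    best_second = float("inf")
    for v in sorted(vectors):
        if v[1] < best_second:  # strictly better second cost than all before
            result.append(v)
            best_second = v[1]
    return result


def cross(o: Vec, a: Vec, b: Vec) -> float:
    """Positive if o -> a -> b is a left (counter-clockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def linear_skyline_2d(vectors: List[Vec]) -> List[Vec]:
    """Skyline vectors on the lower-left convex hull.

    These are the vectors that minimize w*c1 + (1-w)*c2 for some weight w.
    """
    hull: List[Vec] = []
    for p in skyline_2d(vectors):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()  # the middle point is never optimal under a weighted sum
        hull.append(p)
    return hull


def best_under_weight(vectors: List[Vec], w: float) -> Vec:
    """Vector with the smallest weighted sum for a given user weight w."""
    return min(vectors, key=lambda v: w * v[0] + (1 - w) * v[1])


# Hypothetical path cost vectors: (travel time, distance)
costs = [(10, 50), (20, 45), (30, 20), (40, 18), (50, 10), (35, 40)]
print(skyline_2d(costs))
print(linear_skyline_2d(costs))
```

In this example the conventional skyline keeps five vectors while the linear skyline keeps three, which illustrates why restricting the result to weighted-sum optima shrinks the result set.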
The authors propose an algorithm called LSCH (Linear Skyline Convex Hull), which constructs a linear path skyline successively, adding only new paths that are members of the result set. Implementation overview:
1. To add a new cost vector, a single search is performed which combines the cost vectors based on the normal vectors of the hyperplanes currently limiting the linear path skyline.
2. To identify the areas of the linear skyline where additional results might still exist, the algorithm applies multidimensional convex hull computation.
Experiments are run on two different types of networks:
1. The Munich road network with five cost criteria.
2. Artificial lattice graphs, which make it possible to simulate different problem instances and parameter settings.
Route skyline computation algorithms. Convex hull algorithms.
Presented by: Shahab Helmi Spring 2016
Authors: Publication: ICDE 2015. Type: Research Paper.
A recent study suggests that the routes provided by a leading navigation service often fail to agree with the routes chosen by local drivers. Why? Navigation services consider only a limited number of travel costs, e.g., distance or travel time. With the rapid development and continuing use of vehicle tracking technologies, it is possible to learn and update individual drivers' driving preferences from their trajectories.
The paper proposes a novel problem: personalized route recommendation based on big trajectory data. It proposes techniques to model and update driving preferences from drivers' trajectories. The proposed driving preference model supports an arbitrary number of travel costs of interest and arbitrary distributions of cost ratios. Comprehensive experiments were conducted on a substantial, real trajectory data set to show efficiency and effectiveness.
1. Indexing the road network.
2. Modeling drivers' preferences from their trajectories, considering multiple travel costs.
3. Selecting sub-trajectories according to source, destination, departure time and the driver's preferences.
4. Constructing a small graph (Zohreh) with appropriate edge weights reflecting how the driver would like to use the edges, based on the selected trajectories.
5. Returning the shortest route in the small graph as the personalized route for the driver.
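The last step, returning the shortest route in the small graph, can be sketched with Dijkstra's algorithm. The slides do not say which shortest-path algorithm the authors use, so this is an assumption; the graph, its edge weights and the name `shortest_route` are hypothetical, and weights are assumed to be non-negative.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]


def shortest_route(graph: Graph, source: str,
                   target: str) -> Optional[Tuple[List[str], float]]:
    """Dijkstra's algorithm; returns (route, total weight) or None if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    route.reverse()
    return route, dist[target]


# Hypothetical small graph; a lower weight means the driver prefers the edge more.
small_graph: Graph = {
    "s": {"a": 2, "b": 5},
    "a": {"b": 1, "t": 6},
    "b": {"t": 2},
    "t": {},
}
print(shortest_route(small_graph, "s", "t"))
```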
Route planning. Route planning using trajectories (no driver modeling): most popular route (MPR), time period-based most popular route (TPMPR), top-k popular routes (TKPR). Personalized route planning: TRIP is the closest work; it was tested on a smaller dataset and can only model travel time.
GPS records: 52,211 taxis in Beijing. during to One GPS record is collected every 5 seconds or less. Road network: 6 th street in Beijing; 28,342 vertices and 38,577 edges; 60 km × 60 km square region.
Trajectories: 32,379,248 trajectories. A trajectory starts when a passenger gets in the taxi and ends when the passenger gets out. Travel costs: travel distance, travel time, fuel consumption.
Presented by: Shahab Helmi Spring 2016
Authors: Publication: ICDE 2015. Type: Research Paper.
An episode (serial episode) is usually defined as a totally ordered set of events that occur relatively close to each other. Frequency of an episode: how frequently it occurs in a sequence. Frequency count methods: window-based, non-overlapped occurrences, non-interleaved occurrences. Total frequency can capture the most intense correlation between events.
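Two of the frequency count methods listed above can be sketched in a few lines. This is a simplified illustration, not the definition used in the paper: the sequence is assumed to be a time-ordered list of (integer timestamp, event) pairs, the window-based count considers every window [t, t + width) that overlaps the sequence, the non-overlapped count ignores any time constraint between events, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type)


def occurs_in(events: Sequence[str], episode: Sequence[str]) -> bool:
    """True if the serial episode appears as an ordered subsequence of events."""
    it = iter(events)
    return all(e in it for e in episode)  # each 'in' consumes the iterator


def window_frequency(seq: List[Event], episode: Sequence[str], width: int) -> int:
    """Number of sliding windows [t, t + width) that contain the episode."""
    if not seq:
        return 0
    first, last = seq[0][0], seq[-1][0]
    count = 0
    for t in range(first - width + 1, last + 1):
        window = [e for ts, e in seq if t <= ts < t + width]
        if occurs_in(window, episode):
            count += 1
    return count


def non_overlapped_frequency(seq: List[Event], episode: Sequence[str]) -> int:
    """Greedy count of occurrences where each one ends before the next begins."""
    count = 0
    position = 0  # index of the next episode event we are waiting for
    for _, e in seq:
        if e == episode[position]:
            position += 1
            if position == len(episode):
                count += 1
                position = 0
    return count


seq = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B"),
       (6, "C"), (7, "A"), (8, "B"), (9, "C")]
episode = ("A", "B", "C")
print(window_frequency(seq, episode, width=4))
print(non_overlapped_frequency(seq, episode))
```

The two counts differ on the same data because a single occurrence can fall into several overlapping windows, while the non-overlapped method counts it at most once.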
Previous studies on frequent episode mining (FEM) mostly process the data in offline mode. Shortcomings: this process may take hours or days; testing whether an episode occurs in a sequence is an NP-complete problem. Real-world application challenges: fast-growing data. Recency effect: only the freshest patterns from recent events are of interest (in high-frequency trading, a trader holds a stock for only 22 seconds on average). Time-critical analysis: a delay may lead to drastic loss (predictive maintenance of data centers).
Online FEM algorithm challenges:
1. Events that are infrequent at the current moment could become frequent in the future, so they cannot be discarded.
2. A compact and effective data structure is therefore required to store all episode occurrences.
3. Mining all minimal occurrences of episodes also becomes a big challenge over the growing sequence.
For #1, a data structure named episode trie is proposed. It is a prefix tree similar to the one used in FP-Growth. For #3, an algorithm named mining frequent serial episode via last occurrence (MESELO) is proposed. It is claimed to be the first online FEM algorithm. The last occurrence concept is defined for the first time: the last time at which an episode occurred. Using the last occurrence information, MESELO can generate new minimal frequent episodes after receiving new data. Complexity analysis: O(M W ); it is claimed that in most real-world applications W is small, so the algorithm works well.
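A prefix tree over episodes can be sketched as below. This is a generic illustration of the idea and not the episode trie from the paper: the class names, the `record`, `find` and `frequent` methods, and the choice to store a support count plus the most recent occurrence window per node are all assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class EpisodeTrieNode:
    """One node per event type; the path from the root spells a serial episode."""

    def __init__(self) -> None:
        self.children: Dict[str, "EpisodeTrieNode"] = {}
        self.count = 0  # occurrences recorded for this episode
        self.last_occurrence: Optional[Tuple[int, int]] = None  # (start, end)


class EpisodeTrie:
    def __init__(self) -> None:
        self.root = EpisodeTrieNode()

    def record(self, episode: Sequence[str], start: int, end: int) -> None:
        """Store one occurrence; episodes with a common prefix share nodes."""
        node = self.root
        for event in episode:
            node = node.children.setdefault(event, EpisodeTrieNode())
        node.count += 1
        node.last_occurrence = (start, end)

    def find(self, episode: Sequence[str]) -> Optional[EpisodeTrieNode]:
        node = self.root
        for event in episode:
            if event not in node.children:
                return None
            node = node.children[event]
        return node

    def frequent(self, min_support: int) -> List[Tuple[str, ...]]:
        """All stored episodes with at least min_support occurrences."""
        result: List[Tuple[str, ...]] = []
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            if prefix and node.count >= min_support:
                result.append(prefix)
            for event, child in node.children.items():
                stack.append((prefix + (event,), child))
        return sorted(result)


trie = EpisodeTrie()
trie.record(("A", "B"), start=1, end=2)
trie.record(("A", "C"), start=1, end=4)
trie.record(("A", "B"), start=3, end=5)
print(trie.frequent(min_support=2))
```

Keeping infrequent episodes such as ("A", "C") in the structure reflects challenge 1: they cannot be discarded because later data may make them frequent.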
There is no previous work on online FEM. Frequent episode mining on event sequences (the main difference is the frequency count method): alarm sequences in telecom networks; web navigation logs; timestamped fault reports in car manufacturing plants; sales transactions and stock data; news. Breadth-first approaches (Apriori-based). Depth-first approaches (prefix tree).
Online frequent itemset mining (approximate and exact methods): MOMENT, Can-Tree, SWIM, FP-Stream, FDPM. They do not apply to episode mining, since in frequent itemset mining the time information is not important, while episodes are ordered according to their occurrence time. Hence, keeping the tree up to date is hard. Using the last occurrence concept makes it more efficient.
Server 1 (for algorithm): Intel Xeon E GHz processor, 32 GB memory, Windows Server 2008. Server 2 (for MySQL database): Intel Xeon E GHz processor, 16 GB memory, Linux.
1. Online mode: MESELO-BS is similar to MESELO but does not use the concept of last episode occurrence.
2. Offline mode: the performance of MESELO is compared to baseline algorithms. The x-axis shows the window size; MinSupport is 10.
[Plot axis labels: MinSupport; Window size]
Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3 1 School of Computer Science and Software Engineering,
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationName: Lirong TAN 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G.
1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G. A shortest s-t path is a path from vertex to vertex, whose sum of edge weights is minimized. (b) Give the pseudocode
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationData Centric Computing
Piyush Chaudhary HPC Solutions Development Data Centric Computing SPXXL/SCICOMP Summer 2011 Agenda What is Data Centric Computing? What is Driving Data Centric Computing? Puzzle vs.
More informationMining User Steps An innovative Approach to faster Crash Resolution
Mining User Steps An innovative Approach to faster Crash Resolution Tanvi Dharmarha, Quality Engineering Manager Banani Ghosh, Software Engineer Rupak Chakraborty, Member of Technical Staff Adobe Systems
More informationEECS150 - Digital Design Lecture 09 - Parallelism
EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization
More informationInternational Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015)
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Improved Apriori Algorithm was Applied in the System of Elective Courses in Colleges and Universities
More informationHigh performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli
High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering
More informationMarket baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged.
Frequent itemset Association&decision rule mining University of Szeged What frequent itemsets could be used for? Features/observations frequently co-occurring in some database can gain us useful insights
More informationDynamic Skyline Queries in Metric Spaces
Dynamic Skyline Queries in Metric Spaces Lei Chen and Xiang Lian Department of Computer Science and Engineering Hong Kong University of Science and Technology Clear Water Bay, Kowloon Hong Kong, China
More informationCut-And-Sew: A Distributed Autonomous Localization Algorithm for 3D Surface Sensor Networks
Cut-And-Sew: A Distributed Autonomous Localization Algorithm for 3D Surface Sensor Networks Yao Zhao, Hongyi Wu, Miao Jin, Yang Yang, Hongyu Zhou, and Su Xia Presenter: Hongyi Wu The Center for Advanced
More informationAN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA K-MEANS CLUSTERING ON HADOOP SYSTEM. Mengzhao Yang, Haibin Mei and Dongmei Huang
International Journal of Innovative Computing, Information and Control ICIC International c 2017 ISSN 1349-4198 Volume 13, Number 3, June 2017 pp. 1037 1046 AN EFFECTIVE DETECTION OF SATELLITE IMAGES VIA
More informationLinear Regression Optimization
Gradient Descent Linear Regression Optimization Goal: Find w that minimizes f(w) f(w) = Xw y 2 2 Closed form solution exists Gradient Descent is iterative (Intuition: go downhill!) n w * w Scalar objective:
More informationCOMPLETE AND SCALABLE MULTI-ROBOT PLANNING IN TUNNEL ENVIRONMENTS. Mike Peasgood John McPhee Christopher Clark
COMPLETE AND SCALABLE MULTI-ROBOT PLANNING IN TUNNEL ENVIRONMENTS Mike Peasgood John McPhee Christopher Clark Lab for Intelligent and Autonomous Robotics, Department of Mechanical Engineering, University
More informationEfficient Tridiagonal Solvers for ADI methods and Fluid Simulation
Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular
More informationA Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition
A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition S.Vigneswaran 1, M.Yashothai 2 1 Research Scholar (SRF), Anna University, Chennai.
More informationQuantifying FTK 3.0 Performance with Respect to Hardware Selection
Quantifying FTK 3.0 Performance with Respect to Hardware Selection Background A wide variety of hardware platforms and associated individual component choices exist that can be utilized by the Forensic
More informationGPU Computation Strategies & Tricks. Ian Buck NVIDIA
GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit
More informationCHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS
CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS 8.1 Introduction The recognition systems developed so far were for simple characters comprising of consonants and vowels. But there is one
More information