Adaptive Parallel Compressed Event Matching

Mohammad Sadoghi (1, 2) and Hans-Arno Jacobsen (2)
(1) IBM T.J. Watson Research Center
(2) Middleware Systems Research Group, University of Toronto
April 2014
Mohammad Sadoghi (IBM T.J. Watson), Parallel Matching, April 2014

Outline
1. Event Matching
2. BE-Tree (Boolean Expression-Tree) Background
3. Parallel BE-Tree
4. Experimental Analysis
5. Conclusions & Future Work

Computational Advertising (A Billion-dollar Industry)
Advertisers (e.g., Sears, Sony, Amazon) run advertising campaigns. Their subscriptions are modeled as Boolean expressions, for example:
(age < 32) wt=0.2 ∧ (credit-score > 630) wt=0.6 ∧ (num-visits > 4) wt=0.1 ∧ (price = 150) wt=0.1
Events come from online users, both as user profiles, e.g., (num-visits=13) wt=0.5 ∧ (age=25) wt=0.1 ∧ (price<235) wt=0.5 ∧ (credit-score=647) wt=0.2, and as clickstreams, e.g., car=bmw, model=x3, year=2008.
The indexing kernel matches events against subscriptions and returns the relevant ads. It must scale to millions of subscriptions (queries) over hundreds of dimensions and support up to millions of events per second.
Example: Rocket Fuel processes 19 billion bid requests a day, and each ad is served in 100 ms.

Application Scenarios
1. Push-based query processing (data analytics)
2. Computational advertising (targeted advertising)
3. Computational finance (algorithmic trading)
4. Approximate string matching (data quality and data cleaning)
5. Intrusion detection (deep packet inspection)
6. Declarative data-centric workflows (business process management)
Problem statement: continuously evaluate a set of patterns/specifications (subscriptions) over an incoming event stream.

Matching Problem Challenges
1. Handle subscriptions with a high degree of overlap
2. Scale to millions of subscriptions over thousands of dimensions
3. Sustain high matching rates in the presence of frequent subscription changes
4. Adapt to skewed workload distributions (self-adjusting mechanism)
5. Retrieve only the most relevant subscriptions for a given event
6. Exploit parallelism and minimize iterations over the matching structure
7. Enable matching over re-ordered and compressed event streams

Section 2: BE-Tree (Boolean Expression-Tree) Background

Language and Data Model
Subscriptions and events are defined as Boolean expressions (conjunctions of predicates).
A predicate $P^{attr,opt,val,wt}(x)$ is a quadruple consisting of an attribute (in an n-dimensional attribute space), an operator, a range of values, and a weight.
A predicate $P(x)$ either accepts or rejects an input $x$, i.e., $P : x \rightarrow \{True, False\}$, where $x \in Dom(P^{attr})$.
Each predicate supports relational operators ($<, \leq, =, \neq, \geq, >$), set operators ($\in, \notin$), or the SQL BETWEEN operator.
Boolean expression: $P_1^{attr,opt,val,wt}(x) \wedge \dots \wedge P_k^{attr,opt,val,wt}(x)$, where $k \leq n$ and $\forall i, j \leq k$, $P_i^{attr} = P_j^{attr}$ iff $i = j$.
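
A minimal sketch of this data model, assuming Python values for the domain. `Predicate`, `make_expression`, and `satisfies` are hypothetical names; `satisfies` tests a point-valued input (a simplification, since events in this model are themselves Boolean expressions), and the weights are carried but not used.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Operator semantics: relational, set (in / not in), and SQL-style BETWEEN.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": lambda x, v: x < v,
    "<=": lambda x, v: x <= v,
    "=": lambda x, v: x == v,
    "!=": lambda x, v: x != v,
    ">=": lambda x, v: x >= v,
    ">": lambda x, v: x > v,
    "in": lambda x, v: x in v,
    "not in": lambda x, v: x not in v,
    "between": lambda x, v: v[0] <= x <= v[1],  # inclusive bounds, as in SQL
}


@dataclass(frozen=True)
class Predicate:
    attr: str        # attribute (one dimension of the attribute space)
    op: str          # operator, a key of _OPS
    val: Any         # scalar, collection, or (low, high) pair for "between"
    wt: float = 0.0  # weight; not used by the Boolean test below

    def __post_init__(self):
        if self.op not in _OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")

    def __call__(self, x: Any) -> bool:
        """P : x -> {True, False}: accept or reject a single input value."""
        return _OPS[self.op](x, self.val)


def make_expression(*preds: Predicate) -> tuple:
    """A conjunction of predicates in which each attribute appears at most once."""
    attrs = [p.attr for p in preds]
    if len(attrs) != len(set(attrs)):
        raise ValueError("each attribute may appear at most once")
    return preds


def satisfies(expr: tuple, point: Dict[str, Any]) -> bool:
    """True iff a point-valued input satisfies every predicate of the conjunction."""
    return all(p.attr in point and p(point[p.attr]) for p in expr)


ad = make_expression(
    Predicate("age", "<", 32, 0.2),
    Predicate("credit-score", ">", 630, 0.6),
    Predicate("num-visits", ">", 4, 0.1),
    Predicate("price", "=", 150, 0.1),
)
print(satisfies(ad, {"age": 25, "credit-score": 647, "num-visits": 13, "price": 150}))  # True
```

The duplicate-attribute check in `make_expression` mirrors the constraint $P_i^{attr} = P_j^{attr}$ iff $i = j$.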

Matching Semantics
Stabbing subscription: given an event $\varepsilon$ and a set of subscriptions $\Sigma$, find all subscriptions $\sigma_i \in \Sigma$ that are satisfied by $\varepsilon$.
Definition: $SQ(\varepsilon) = \{\sigma_i \mid \forall P_q^{attr,opt,val,wt}(x) \in \sigma_i, \exists P_o^{attr,opt,val,wt}(x) \in \varepsilon, P_q^{attr} = P_o^{attr}, \exists x \in Dom(P_q^{attr}), P_q(x) \wedge P_o(x)\}$.
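
To make the definition of $SQ(\varepsilon)$ concrete, the sketch below applies it to the advertisement and user-profile example from the computational advertising slide. It assumes a small finite integer domain shared by all attributes (`DOMAIN`, a hypothetical choice) and checks the existential condition by brute force; only relational operators are covered and weights are ignored. `pred` and `stabs` are illustrative names.

```python
import operator

# Relational operators only, to keep the sketch short.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

# Hypothetical finite domain shared by all attributes (an assumption).
DOMAIN = range(0, 1001)


def pred(attr, op, val):
    """Build (attribute, test) for the predicate `attr op val`; the weight is omitted."""
    return attr, (lambda x: OPS[op](x, val))


def stabs(subscription, event, domain=DOMAIN):
    """Every subscription predicate needs an event predicate on the same attribute
    such that some x in the domain satisfies both (brute force, illustration only)."""
    event_preds = dict(event)  # at most one predicate per attribute
    for attr, p_q in subscription:
        p_o = event_preds.get(attr)
        if p_o is None or not any(p_q(x) and p_o(x) for x in domain):
            return False
    return True


ad = [pred("age", "<", 32), pred("credit-score", ">", 630),
      pred("num-visits", ">", 4), pred("price", "=", 150)]
profile = [pred("num-visits", "=", 13), pred("age", "=", 25),
           pred("price", "<", 235), pred("credit-score", "=", 647)]

print(stabs(ad, profile))  # True: e.g. x = 150 satisfies both price predicates
```

Note that the event carries a range predicate (price < 235), so matching is an intersection test between predicates rather than a simple point lookup.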

Design Principles
Most important design feature: systematically explore the space in two iterative phases, space partitioning and space clustering. The two-phase space-cutting technique consists of
1. space partitioning: global structuring to determine the best splitting dimension(s);
2. space clustering: local structuring within each partition to determine the best grouping of expressions w.r.t. the chosen dimension(s).

Intuition Behind the Two-phase Space-cutting Technique
[Figure: subscriptions in a two-dimensional subscription space. The space is first partitioned along the X-axis and the resulting partition is clustered; the space is then partitioned along the Y-axis and clustered again.]
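
The alternation of the two phases can be sketched as a simple recursion. This is a simplified, hypothetical rendering, not BE-Tree's actual algorithm: it assumes expressions are dicts mapping an attribute to an inclusive (low, high) range, approximates the "best splitting dimension" by attribute frequency, and clusters by fixed-width value buckets. `choose_split_attribute`, `cluster_by_value`, `space_cut`, `capacity`, and `width` are illustrative names.

```python
from collections import Counter


def choose_split_attribute(exprs, used):
    """Space partitioning (assumed heuristic): the attribute occurring in the
    most expressions that has not yet been used on this path."""
    counts = Counter(a for e in exprs for a in e if a not in used)
    return counts.most_common(1)[0][0] if counts else None


def cluster_by_value(exprs, attr, width):
    """Space clustering (simplified): group expressions by the fixed-width value
    bucket containing their range on `attr`; ranges spanning buckets go to None."""
    clusters = {}
    for e in exprs:
        lo, hi = e[attr]
        key = lo // width if lo // width == hi // width else None
        clusters.setdefault(key, []).append(e)
    return clusters


def space_cut(exprs, used=frozenset(), capacity=2, width=50):
    """Alternate partitioning and clustering until each group fits in a leaf."""
    if len(exprs) <= capacity:
        return exprs  # leaf
    attr = choose_split_attribute(exprs, used)
    if attr is None:
        return exprs  # no attribute left to cut on
    with_attr = [e for e in exprs if attr in e]
    rest = [e for e in exprs if attr not in e]
    return {
        "attr": attr,
        "clusters": {key: space_cut(group, used | {attr}, capacity, width)
                     for key, group in cluster_by_value(with_attr, attr, width).items()},
        "rest": space_cut(rest, used, capacity, width),
    }


exprs = [
    {"age": (0, 31), "price": (150, 150)},
    {"age": (18, 25)},
    {"age": (40, 60), "credit-score": (631, 850)},
    {"price": (0, 99)},
]
tree = space_cut(exprs)
print(tree["attr"])  # age
```

Here "age" is chosen first because three of the four expressions constrain it; expressions without "age" are cut separately in the "rest" branch.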

BE-Tree Core Design (Two-phase Space-cutting)
[Figure: BE-Tree structure with partition nodes (p-nodes) holding p-directories, cluster nodes (c-nodes) holding c-directories, and leaf nodes (l-nodes). Partitioning through a p-directory is annotated O(1), clustering through a c-directory O(log N), and the tree as a whole O(k log N), where k = number of predicates per subscription and N = domain cardinality.]
The two-phase space-cutting technique systematically explores the space
1. to cope with the curse of dimensionality, and
2. to support dynamic changes of subscriptions.
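
The O(log N) annotation on the c-directory is consistent with clustering by recursive halving of a value domain of cardinality N. The sketch below assumes such a binary-halving directory (our assumption, with the hypothetical helper `cluster_path`): a range is placed in the smallest bucket that fully contains it, so no path is longer than ⌈log2 N⌉ halvings.

```python
import math


def cluster_path(lo, hi, domain_lo, domain_hi):
    """Descend a binary-halving directory over [domain_lo, domain_hi) and return
    the halving decisions ('L' or 'R') leading to the smallest bucket that fully
    contains the closed range [lo, hi]."""
    path = []
    while domain_hi - domain_lo > 1:
        mid = (domain_lo + domain_hi) // 2
        if hi < mid:        # range fits in the lower half
            path.append("L")
            domain_hi = mid
        elif lo >= mid:     # range fits in the upper half
            path.append("R")
            domain_lo = mid
        else:               # range straddles the midpoint: it stays in this bucket
            break
    return path


N = 1000
depths = [len(cluster_path(x, x, 0, N)) for x in range(N)]
print(max(depths), math.ceil(math.log2(N)))  # 10 10
```

Under the further assumption that each partitioning level on a root-to-leaf path fixes one of a subscription's at most k attributes and is followed by at most ⌈log2 N⌉ clustering levels, path length is bounded by O(k log N), matching the figure's annotation.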

Section 3: Parallel BE-Tree

BE-Tree Bitmap-based Encoded Matching
[Figure: events are converted from a predicate-based event encoding into a bitmap-based event encoding, which is matched against BE-Tree to produce match results (subscription IDs).]
Concise and cache-conscious encoding of events (data)

Predicate Evaluation Through Bitmap-based Encoding
[Figure: compressed events are represented as a result bit-array (bitmap-based event encoding) and matched against BE-Tree's p-nodes, c-nodes, and l-nodes; the leaves hold a two-dimensional representation of subscriptions S1 ... Sm.]
Concise and cache-conscious encoding of events (data)
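
A minimal sketch of a bitmap-based event encoding, under stated assumptions: every distinct subscription predicate is assigned an integer ID (the hypothetical `PREDICATES` table), an event is a set of point values, and bit i of the event's bitmap records whether the event satisfies predicate i. Each predicate is then evaluated once per event, and a subscription stored as a set of predicate IDs matches when all of its bits are set. `encode_event`, `subscription_mask`, and `matches` are illustrative names.

```python
# Hypothetical table of distinct subscription predicates; the list index is the predicate ID.
PREDICATES = [
    ("age", lambda v: v < 32),            # 0
    ("credit-score", lambda v: v > 630),  # 1
    ("num-visits", lambda v: v > 4),      # 2
    ("price", lambda v: v == 150),        # 3
    ("age", lambda v: v >= 40),           # 4
]


def encode_event(event, predicates=PREDICATES):
    """Evaluate each distinct predicate once; set bit i if predicate i holds."""
    bits = 0
    for i, (attr, test) in enumerate(predicates):
        if attr in event and test(event[attr]):
            bits |= 1 << i
    return bits


def subscription_mask(pred_ids):
    """A subscription is represented by the IDs of the predicates it contains."""
    mask = 0
    for i in pred_ids:
        mask |= 1 << i
    return mask


def matches(event_bits, sub_mask):
    """A subscription matches when every one of its predicate bits is set."""
    return (event_bits & sub_mask) == sub_mask


event = {"age": 25, "credit-score": 647, "num-visits": 13, "price": 150}
bits = encode_event(event)
print(bin(bits))  # 0b1111
subs = {"S1": subscription_mask([0, 1, 2, 3]),
        "S2": subscription_mask([4]),
        "S3": subscription_mask([0, 3])}
print([sid for sid, m in subs.items() if matches(bits, m)])  # ['S1', 'S3']
```

Predicates shared by many subscriptions are evaluated only once per event, and the per-subscription check reduces to a bitwise AND.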

Parallel Compressed Matching over BE-Tree
Stage 1: Event stream (Event 1 ... Event i ... Event m)
Stage 2: Bitmap-based event encoding, one thread per event (Thread 1 ... Thread m)
Stage 3: Bitwise-OR of the bitmap-based encodings (compressing the event stream)
Stage 4: Event matching by parallel tree traversal over BE-Tree (Thread 1 ... Thread m), yielding subscription IDs
Stage 5: Event matching by parallel leaf scanning; the i-th thread produces the match results for the i-th event
BE-Tree is traversed, and its leaf pages are scanned, in parallel exactly once for all m events.
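
The five stages can be sketched as follows, assuming a bitmap encoding in which bit i is set iff the event satisfies distinct predicate i and each subscription is the bitmask of its predicate IDs. A plain scan over a dict of subscription masks stands in for the BE-Tree traversal and leaf scanning, and `encode`, `covers`, and `compressed_match` are hypothetical names. Because a subscription matches when all of its bits are set, any subscription matched by one event is also matched by the bitwise OR of all bitmaps, so one pass over the OR yields a candidate superset that each event then verifies. Python threads illustrate the structure only; CPython's GIL limits speedup for this CPU-bound work.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def encode(event, predicates):
    """Bit i is set iff the event satisfies distinct predicate i."""
    bits = 0
    for i, (attr, test) in enumerate(predicates):
        if attr in event and test(event[attr]):
            bits |= 1 << i
    return bits


def covers(bits, mask):
    return (bits & mask) == mask


def compressed_match(events, predicates, subscriptions, workers=4):
    """subscriptions maps a subscription ID to the bitmask of its predicate IDs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 2: bitmap-based encoding, one task per event.
        bitmaps = list(pool.map(lambda e: encode(e, predicates), events))
        # Stage 3: compress the stream into a single bitmap with bitwise OR.
        combined = reduce(operator.or_, bitmaps, 0)
        # Stage 4: one pass over the subscriptions with the combined bitmap
        # (stand-in for a single BE-Tree traversal); a superset of the matches.
        candidates = [sid for sid, mask in subscriptions.items() if covers(combined, mask)]
        # Stage 5: each task verifies the candidates against one event's bitmap.
        return list(pool.map(
            lambda bits: [sid for sid in candidates if covers(bits, subscriptions[sid])],
            bitmaps))


predicates = [("age", lambda v: v < 32), ("age", lambda v: v >= 40),
              ("price", lambda v: v == 150)]
subscriptions = {"young": 0b001, "old": 0b010, "young_150": 0b101, "old_150": 0b110}
events = [{"age": 25, "price": 150}, {"age": 45, "price": 90}]
print(compressed_match(events, predicates, subscriptions))  # [['young', 'young_150'], ['old']]
```

In this example `old_150` survives Stage 4 only because the OR combines bits from two different events; Stage 5 filters it out.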

Reordering Events (Adaptive Parallel Matching)
[Figure: a bucket of incoming events (Event 1 ... Event b) is re-ordered by inserting the events into a BE-Tree of events. Adaptive processing then matches the re-ordered events against the BE-Tree of subscriptions: some groups of events are matched compressed, the others uncompressed.]
Efficient online re-ordering of the event stream
Controlling the bucket size, reasoning about bucket heterogeneity, and a hybrid matching approach
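
The slides re-order events by inserting them into a BE-Tree of events and then decide, per bucket, whether to match compressed or uncompressed. The sketch below substitutes a much simpler re-ordering (sorting the encoded bitmaps so identical bitmaps become adjacent) and a hypothetical heterogeneity test: a bucket is compressed only if the OR of its bitmaps has at most `max_growth` times the bits of its widest event. `plan_buckets`, `max_growth`, and `reorder_first` are illustrative names, not the paper's cost model.

```python
def popcount(x):
    return bin(x).count("1")


def plan_buckets(bitmaps, bucket_size, max_growth=1.5, reorder_first=True):
    """Return (event indices, mode) per bucket; mode is 'compressed' or 'uncompressed'."""
    if reorder_first:
        # Re-ordering stand-in: equal bitmaps (and shared high-order bits) become adjacent.
        order = sorted(range(len(bitmaps)), key=lambda i: bitmaps[i])
    else:
        order = list(range(len(bitmaps)))
    plans = []
    for start in range(0, len(order), bucket_size):
        ids = order[start:start + bucket_size]
        combined = 0
        for i in ids:
            combined |= bitmaps[i]
        widest = max(popcount(bitmaps[i]) for i in ids)
        # Heterogeneity proxy: how many bits the OR adds beyond the widest event.
        compress = popcount(combined) <= max_growth * max(widest, 1)
        plans.append((ids, "compressed" if compress else "uncompressed"))
    return plans


bitmaps = [0b110000, 0b000011, 0b110000, 0b000011]
print(plan_buckets(bitmaps, 2))
# [([1, 3], 'compressed'), ([0, 2], 'compressed')]
print(plan_buckets(bitmaps, 2, reorder_first=False))
# [([0, 1], 'uncompressed'), ([2, 3], 'uncompressed')]
```

With the arrival order kept, every bucket mixes dissimilar events and falls back to uncompressed matching; after re-ordering, each bucket is homogeneous and can be compressed.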

Section 4: Experimental Analysis

Experimental Evaluation
Algorithms
1. BE: BE-Tree (Sadoghi and Jacobsen, SIGMOD '11)
2. Bitmap: BE-Tree with bitmap (Sadoghi and Jacobsen, TODS '13)
3. A-PCM: Adaptive Parallel Compressed Matching
4. GR: IBM Gryphon (Aguilera et al., PODC '99)
5. P: Propagation algorithm (Fabret et al., SIGMOD '01)
6. k-ind: k-index (Whang et al., VLDB '09)
7. SIFT: Counting algorithm (Yan et al., TODS '94)
8. SCAN: Sequential scan
Hardware: two Intel quad-core Xeon CPUs at 3.00 GHz, 16 GB main memory

Workload Configurations
Workloads are generated with BEGen, our comprehensive Boolean expression workload generator. Six configurations are used: Workload Size, Number of Dimensions, Match Prob, Stream Similarity, Distinct Predicates, and Match Prob (DBLP Data). The workload size is 1M to 5M subscriptions in the Workload Size configuration and 5M in each of the others. Each configuration is described by: number of dimensions, cardinality, number of subscription predicates, number of event predicates, average predicate range size, % equality predicates, match probability (%), and stream similarity (%).

Effect of Workload Size on Matching (Log Scale)
[Figure: matching time per event (ms) while varying the number of subscriptions, for BE-B, BE, GR, P, k-ind, SIFT, and SCAN; (a) uniform workload, (b) Zipf workload. Caption: Varying workload size (match probability = 1%).]
Improving matching latency by orders of magnitude through our two-phase space-cutting technique

Effect of Parallel Matching (Log Scale)
[Figure: average throughput per second of BE-Tree, Bitmap, Parallel, and A-PCM while varying the overlap probability (Sub = 5M); (a) matching probability m = 1%, (b) matching probability 0%. Caption: Varying % of stream similarity.]
Significantly improving matching throughput through our parallel matching over compressed streams

Parallel Compressed Matching Time Breakdown
Table: Matching time breakdown (%)
Dataset   Stream Re-ordering   Bitmap Encoding & Compression   Tree Traversal   Leaf Scanning
Unif      2.57                 0.91                            32.08            63.62
Zipf      0.36                 0.18                            28.61            70.64
Author    2.18                 0.76                            29.53            66.97
Title     1.04                 0.22                            23.20            75.41
The overhead of stream re-ordering, parallel bitmap encoding, and compression is negligible.
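
Summing the first two columns of the breakdown table makes the "negligible" claim concrete: stream re-ordering plus bitmap encoding and compression account for at most about 3.5% of matching time on these datasets, while tree traversal and leaf scanning dominate. The numbers below are copied from the table; the variable names are illustrative.

```python
# Matching time breakdown (%) from the table above:
# (stream re-ordering, bitmap encoding & compression, tree traversal, leaf scanning)
breakdown = {
    "Unif": (2.57, 0.91, 32.08, 63.62),
    "Zipf": (0.36, 0.18, 28.61, 70.64),
    "Author": (2.18, 0.76, 29.53, 66.97),
    "Title": (1.04, 0.22, 23.20, 75.41),
}

# Combined overhead of re-ordering plus encoding/compression, per dataset.
overhead = {name: round(reorder + encode, 2)
            for name, (reorder, encode, _, _) in breakdown.items()}
print(overhead)  # {'Unif': 3.48, 'Zipf': 0.54, 'Author': 2.94, 'Title': 1.26}
```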

Effect of Parallel Matching (Log Scale)
[Figure: (a) percentage of cache misses for BE-Tree, Bitmap, Parallel A-PCM, and Seq A-PCM while varying the overlap probability (Sub = 5M, matching probability m = 1%); (b) average matching time (ms) while varying the event delay (ms) (Sub = 5M, matching probability 0%). Caption: Percentage of cache misses and latency comparison.]
Substantially reducing cache misses and improving latency for high-throughput event rates

Section 5: Conclusions & Future Work

Conclusions
Event matching is at the heart of event processing engines (e.g., in computational advertising).
Key contributions:
1. A novel parallel compressed event matching algorithm over a bitmap-based encoding
2. An efficient online stream re-ordering technique
3. An adaptive algorithm that, depending on stream similarity, selectively compresses similar events
Future work: moving towards heterogeneous computational models (e.g., FPGAs, GPUs, and co-processors)

Questions? Thank you!

BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space. University of Toronto

BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space. University of Toronto BE-Tree: An Index Structure to Efficiently Match Boolean Expressions over High-dimensional Space Mohammad Sadoghi Hans-Arno Jacobsen University of Toronto June 15, 2011 Mohammad Sadoghi (University of

More information

Multi-Query Stream Processing on FPGAs. University of Toronto

Multi-Query Stream Processing on FPGAs. University of Toronto Multi-Query Stream Processing on FPGAs Mohammad Sadoghi Rija Javed Naif Tarafdar Harsh Singh Rohan Palaniappan Hans-Arno Jacobsen April 2012 Algorithmic Trading NASDAQ NYSE TSX AMGN=58 HON=24 Market ORCL=12

More information

Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. University of Toronto

Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. University of Toronto Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading Martin Labrecque Harsh Singh Warren Shum Hans-Arno Jacobsen University of Toronto Algorithm Trading Examples of Financial

More information

HYRISE In-Memory Storage Engine

HYRISE In-Memory Storage Engine HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

HANA Performance. Efficient Speed and Scale-out for Real-time BI

HANA Performance. Efficient Speed and Scale-out for Real-time BI HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business

More information

Jignesh M. Patel. Blog:

Jignesh M. Patel. Blog: Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Fast BVH Construction on GPUs

Fast BVH Construction on GPUs Fast BVH Construction on GPUs Published in EUROGRAGHICS, (2009) C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, D. Manocha University of North Carolina at Chapel Hill NVIDIA University of California

More information

Extending In-Memory Relational Database Engines with Native Graph Support

Extending In-Memory Relational Database Engines with Native Graph Support Extending In-Memory Relational Database Engines with Native Graph Support EDBT 18 Mohamed S. Hassan 1 Tatiana Kuznetsova 1 Hyun Chai Jeong 1 Walid G. Aref 1 Mohammad Sadoghi 2 1 Purdue University West

More information

Hash Joins for Multi-core CPUs. Benjamin Wagner

Hash Joins for Multi-core CPUs. Benjamin Wagner Hash Joins for Multi-core CPUs Benjamin Wagner Joins fundamental operator in query processing variety of different algorithms many papers publishing different results main question: is tuning to modern

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

Column-Oriented Database Systems. Liliya Rudko University of Helsinki

Column-Oriented Database Systems. Liliya Rudko University of Helsinki Column-Oriented Database Systems Liliya Rudko University of Helsinki 2 Contents 1. Introduction 2. Storage engines 2.1 Evolutionary Column-Oriented Storage (ECOS) 2.2 HYRISE 3. Database management systems

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft

More information

Heckaton. SQL Server's Memory Optimized OLTP Engine

Heckaton. SQL Server's Memory Optimized OLTP Engine Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability

More information

PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees

PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research

More information

7. Query Processing and Optimization

7. Query Processing and Optimization 7. Query Processing and Optimization Processing a Query 103 Indexing for Performance Simple (individual) index B + -tree index Matching index scan vs nonmatching index scan Unique index one entry and one

More information

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses

More information

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras High-Performance Holistic XML Twig Filtering Using GPUs Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras Outline! Motivation! XML filtering in the literature! Software approaches! Hardware

More information

Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation

Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation Large-Scale Data & Systems Group Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation Georgios Theodorakis, Alexandros Koliousis, Peter Pietzuch, Holger Pirk Large-Scale Data & Systems (LSDS)

More information

Efficient Bulk Deletes for Multi Dimensional Clustered Tables in DB2

Efficient Bulk Deletes for Multi Dimensional Clustered Tables in DB2 Efficient Bulk Deletes for Multi Dimensional Clustered Tables in DB2 Bishwaranjan Bhattacharjee, Timothy Malkemus IBM T.J. Watson Research Center Sherman Lau, Sean McKeough, Jo-anne Kirton Robin Von Boeschoten,

More information

Bitmap Indices for Fast End-User Physics Analysis in ROOT

Bitmap Indices for Fast End-User Physics Analysis in ROOT Bitmap Indices for Fast End-User Physics Analysis in ROOT Kurt Stockinger a, Kesheng Wu a, Rene Brun b, Philippe Canal c a Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA b European Organization

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Pidad D Souza IBM Systems pidsouza@in.ibm.com 1 Outline

More information

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016 Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List

More information

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Seunghwa Kang David A. Bader 1 A Challenge Problem Extracting a subgraph from

More information

Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for

More information

The Power of Batching in the Click Modular Router

The Power of Batching in the Click Modular Router The Power of Batching in the Click Modular Router Joongi Kim, Seonggu Huh, Keon Jang, * KyoungSoo Park, Sue Moon Computer Science Dept., KAIST Microsoft Research Cambridge, UK * Electrical Engineering

More information

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 HICAMP Bitmap A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 Database Indexing Databases use precomputed indexes

More information

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg

More information

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013 Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest

More information

BiCEP Benchmarking Complex Event Processing Systems

BiCEP Benchmarking Complex Event Processing Systems BiCEP Benchmarking Complex Event Processing Systems Pedro Bizarro University of Coimbra, DEI-CISUC 3030-290 Coimbra, Portugal bizarro@dei.uc.pt Abstract. BiCEP is a new project being started at the University

More information

Accelerating Spark Workloads using GPUs

Accelerating Spark Workloads using GPUs Accelerating Spark Workloads using GPUs Rajesh Bordawekar, Minsik Cho, Wei Tan, Benjamin Herta, Vladimir Zolotov, Alexei Lvov, Liana Fong, and David Kung IBM T. J. Watson Research Center 1 Outline Spark

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

Using Graphics Processors for High Performance IR Query Processing

Using Graphics Processors for High Performance IR Query Processing Using Graphics Processors for High Performance IR Query Processing Shuai Ding Jinru He Hao Yan Torsten Suel Polytechnic Inst. of NYU Polytechnic Inst. of NYU Polytechnic Inst. of NYU Yahoo! Research Brooklyn,

More information

CSE 4/521 Introduction to Operating Systems. Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018

CSE 4/521 Introduction to Operating Systems. Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018 CSE 4/521 Introduction to Operating Systems Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018 Overview Objective: To discuss how the disk is managed for a

More information

AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH Snowflake Computing Inc. All Rights Reserved

AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH Snowflake Computing Inc. All Rights Reserved AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH 2019 SNOWFLAKE Our vision Allow our customers to access all their data in one place so they can make actionable decisions anytime, anywhere, with any number

More information

Splotch: High Performance Visualization using MPI, OpenMP and CUDA

Splotch: High Performance Visualization using MPI, OpenMP and CUDA Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,

More information

Parallel FFT Program Optimizations on Heterogeneous Computers

Parallel FFT Program Optimizations on Heterogeneous Computers Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid

More information

Albis: High-Performance File Format for Big Data Systems

Albis: High-Performance File Format for Big Data Systems Albis: High-Performance File Format for Big Data Systems Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler, IBM Research, Zurich 2018 USENIX Annual Technical Conference

More information

P4 Pub/Sub. Practical Publish-Subscribe in the Forwarding Plane

P4 Pub/Sub. Practical Publish-Subscribe in the Forwarding Plane P4 Pub/Sub Practical Publish-Subscribe in the Forwarding Plane Outline Address-oriented routing Publish/subscribe How to do pub/sub in the network Implementation status Outlook Subscribers Publish/Subscribe

More information

Data Transformation and Migration in Polystores

Data Transformation and Migration in Polystores Data Transformation and Migration in Polystores Adam Dziedzic, Aaron Elmore & Michael Stonebraker September 15th, 2016 Agenda Data Migration for Polystores: What & Why? How? Acceleration of physical data

More information

Visual Analysis of Lagrangian Particle Data from Combustion Simulations

Visual Analysis of Lagrangian Particle Data from Combustion Simulations Visual Analysis of Lagrangian Particle Data from Combustion Simulations Hongfeng Yu Sandia National Laboratories, CA Ultrascale Visualization Workshop, SC11 Nov 13 2011, Seattle, WA Joint work with Jishang

More information

Simulation of Scale-Free Networks

Simulation of Scale-Free Networks Simulation of Scale-Free Networks Gabriele D Angelo http://www.cs.unibo.it/gdangelo/ it/ / joint work with: Stefano Ferretti Department of Computer Science University of Bologna SIMUTOOLS

More information

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title Breaking the Curse of Cardinality on Bitmap Indexes Permalink https://escholarship.org/uc/item/5v921692 Author Wu, Kesheng

More information

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan Abstract String matching is the most

More information

Hardware Acceleration for Database Systems using Content Addressable Memories

Hardware Acceleration for Database Systems using Content Addressable Memories Hardware Acceleration for Database Systems using Content Addressable Memories Nagender Bandi, Sam Schneider, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara Overview The Memory

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

UpBit: Scalable In-Memory Updatable Bitmap Indexing

UpBit: Scalable In-Memory Updatable Bitmap Indexing : Scalable In-Memory Updatable Bitmap Indexing Manos Athanassoulis Harvard University manos@seas.harvard.edu Zheng Yan University of Maryland zhengyan@cs.umd.edu Stratos Idreos Harvard University stratos@seas.harvard.edu

More information

A Analysis and Optimization for Boolean Expression Indexing

A Analysis and Optimization for Boolean Expression Indexing A Analysis and Optimization for Boolean Expression Indexing Mohammad Sadoghi, University of Toronto Hans-Arno Jacobsen, University of Toronto -Tree is a novel dynamic tree data structure designed to efficiently

More information

University of Waterloo Midterm Examination Sample Solution

University of Waterloo Midterm Examination Sample Solution 1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,

More information

High-Throughput Publish/Subscribe in the Forwarding Plane

High-Throughput Publish/Subscribe in the Forwarding Plane 1 High-Throughput Publish/Subscribe in the Forwarding Plane Theo Jepsen, Masoud Moushref, Antonio Carzaniga, Nate Foster, Xiaozhou Li, Milad Sharif, Robert Soulé Università della Svizzera italiana (USI)

More information

Towards Declarative and Efficient Querying on Protein Structures

Towards Declarative and Efficient Querying on Protein Structures Towards Declarative and Efficient Querying on Protein Structures Jignesh M. Patel University of Michigan Biology Data Types Sequences: AGCGGTA. Structure: Interaction Maps: Micro-arrays: Gene A Gene B

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance VLDB 2001, Rome, Italy Best Paper Award Weaving Relations for Cache Performance Anastassia Ailamaki David J. DeWitt Mark D. Hill Marios Skounakis Presented by: Ippokratis Pandis Bottleneck in DBMSs Processor

More information

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

Query Evaluation Overview, cont.

Query Evaluation Overview, cont. Query Evaluation Overview, cont. Lecture 9 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record Manager

More information

Processing of Very Large Data

Processing of Very Large Data Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Tackling the Challenges of Big Data! Tackling The Challenges of Big Data. This Module. Samuel Madden. Samuel Madden. Visualizing Twitter

Tackling the Challenges of Big Data! Tackling The Challenges of Big Data. This Module. Samuel Madden. Samuel Madden. Visualizing Twitter Samuel Madden Professor and Director of Big Data at CSAIL Massachusetts Institute of Technology Introduction to Twitter Data Samuel Madden Professor and Director of Big Data at CSAIL Massachusetts Institute

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems Anastassia Ailamaki Ph.D. Examination November 30, 2000 A DBMS on a 1980 Computer DBMS Execution PROCESSOR 10 cycles/instruction DBMS Data and Instructions 6 cycles

More information

DATA WAREHOUSING II. CS121: Relational Databases Fall 2017 Lecture 23

DATA WAREHOUSING II. CS121: Relational Databases Fall 2017 Lecture 23 DATA WAREHOUSING II CS121: Relational Databases Fall 2017 Lecture 23 Last Time: Data Warehousing 2 Last time introduced the topic of decision support systems (DSS) and data warehousing Very large DBs used

More information

Part 1: Indexes for Big Data

Part 1: Indexes for Big Data JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,

More information

Efficiently Evaluating Complex Boolean Expressions

Efficiently Evaluating Complex Boolean Expressions Efficiently Evaluating Complex Boolean Expressions Yahoo! Research Marcus Fontoura, Suhas Sadanadan, Jayavel Shanmugasundaram, Sergei Vassilvitski, Erik Vee, Srihari Venkatesan and Jason Zien Agenda Motivation

More information

Efficient Computation of Radial Distribution Function on GPUs

Efficient Computation of Radial Distribution Function on GPUs Efficient Computation of Radial Distribution Function on GPUs Yi-Cheng Tu * and Anand Kumar Department of Computer Science and Engineering University of South Florida, Tampa, Florida 2 Overview Introduction

More information

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining. Vera Goebel. Department of Informatics, University of Oslo Data Mining Vera Goebel Department of Informatics, University of Oslo 2012 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD

More information

Designing Hybrid Data Processing Systems for Heterogeneous Servers

Designing Hybrid Data Processing Systems for Heterogeneous Servers Designing Hybrid Data Processing Systems for Heterogeneous Servers Peter Pietzuch Large-Scale Distributed Systems (LSDS) Group Imperial College London http://lsds.doc.ic.ac.uk University

More information

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center Naif Tarafdar, Thomas Lin, Eric Fukuda, Hadi Bannazadeh, Alberto Leon-Garcia, Paul Chow University of Toronto 1 Cloudy with

More information

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research 1 The world s most valuable resource Data is everywhere! May. 2017 Values from Data! Need infrastructures for

More information

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE Oracle Database Exadata Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE Oracle Database Exadata combines the best database with the best cloud platform. Exadata is the culmination of more

More information

Write On Aws. Aws Tools For Windows Powershell User Guide using the aws tools for windows powershell (p. 19) this section includes information about

Write On Aws. Aws Tools For Windows Powershell User Guide using the aws tools for windows powershell (p. 19) this section includes information about We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your computer, you have convenient answers with write on aws. To get

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

Shark: SQL and Rich Analytics at Scale. Yash Thakkar ( ) Deeksha Singh ( )

Shark: SQL and Rich Analytics at Scale. Yash Thakkar ( ) Deeksha Singh ( ) Shark: SQL and Rich Analytics at Scale Yash Thakkar (2642764) Deeksha Singh (2641679) RDDs as foundation for relational processing in Shark: Resilient Distributed Datasets (RDDs): RDDs can be written at

More information

NUMA-aware Graph-structured Analytics

NUMA-aware Graph-structured Analytics NUMA-aware Graph-structured Analytics Kaiyuan Zhang, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems Shanghai Jiao Tong University, China Big Data Everywhere 00 Million Tweets/day 1.11

More information

Query Evaluation Overview, cont.

Query Evaluation Overview, cont. Query Evaluation Overview, cont. Lecture 9 Feb. 29, 2016 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record

More information

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages Overview of Query Processing Query Parser Query Processor Evaluation of Relational Operations Query Rewriter Query Optimizer Query Executor Yanlei Diao UMass Amherst Lock Manager Access Methods (Buffer

More information

IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management

IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management Applications Yongzhi Wang, Jinpeng Wei Florida International University Mudhakar Srivatsa IBM T.J. Watson Research Center

More information

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value Access Methods This is a modified version of Prof. Hector Garcia Molina s slides. All copy rights belong to the original author. Basic Concepts search key pointer Value record? value Search Key - set of

More information

SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics

SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics SparkBench: A Comprehensive Spark Benchmarking Suite Characterizing In-memory Data Analytics Min LI,, Jian Tan, Yandong Wang, Li Zhang, Valentina Salapura, Alan Bivens IBM TJ Watson Research Center * A

More information

Center Extreme Scale CS Research

Center Extreme Scale CS Research Center Extreme Scale CS Research Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10 6 10 7 10 8 10 9 cores Parallelization and UQ of Rocfun and CMT-Nek

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

An Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs

An Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs An Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs Xin Huo, Vignesh T. Ravi, Wenjing Ma and Gagan Agrawal Department of Computer Science and Engineering

More information

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Jialiang Zhang, Soroosh Khoram and Jing Li 1 Outline Background Big graph analytics Hybrid

More information

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk

More information

Introduction to Distributed Data Systems

Introduction to Distributed Data Systems Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January

More information

Caribou: Intelligent Distributed Storage

Caribou: Intelligent Distributed Storage : Intelligent Distributed Storage Zsolt István, David Sidler, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich 1 Rack-scale thinking In the Cloud ToR Switch Compute Compute + Provisioning

More information

IBM s Data Warehouse Appliance Offerings

IBM s Data Warehouse Appliance Offerings IBM s Data Warehouse Appliance Offerings RChaitanya IBM India Software Labs Agenda 1 IBM Smart Analytics System (D5600) System Overview Technical Architecture Software / Hardware stack details 2 Netezza

More information

Rack-scale Data Processing System

Rack-scale Data Processing System Rack-scale Data Processing System Jana Giceva, Darko Makreshanski, Claude Barthels, Alessandro Dovis, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich Rack-scale Data Processing

More information

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation Outline IBM OpenPower Platform Accelerating

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2015 Quiz I There are 12 questions and 13 pages in this quiz booklet. To receive

More information

ECT7110. Data Preprocessing. Prof. Wai Lam. ECT7110 Data Preprocessing 1

ECT7110. Data Preprocessing. Prof. Wai Lam. ECT7110 Data Preprocessing 1 ECT7110 Data Preprocessing Prof. Wai Lam ECT7110 Data Preprocessing 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest,

More information

Adaptive Query Processing on Prefix Trees Wolfgang Lehner

Adaptive Query Processing on Prefix Trees Wolfgang Lehner Adaptive Query Processing on Prefix Trees Wolfgang Lehner Fachgruppentreffen, 22.11.2012 TU München Prof. Dr.-Ing. Wolfgang Lehner > Challenges for Database Systems Three things are important in the database

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

S-Store: Streaming Meets Transaction Processing

S-Store: Streaming Meets Transaction Processing S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2008 Quiz II There are 14 questions and 11 pages in this quiz booklet. To receive

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability Oracle Enterprise Manager Top-Down, Integrated Application Management Complete, Open,

More information