What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server


Gyles Cross
5 years ago

CSE 444: Database Internals
Lectures: Parallel DBMSs
Magda Balazinska - CSE 444, Spring

What We Have Already Learned
  Overall architecture of a DBMS
  Internals of query execution:
    Data storage and indexing
    Buffer management
    Query evaluation including operator algorithms
    Query optimization
  Internals of transaction processing:
    Concurrency control: pessimistic and optimistic
    Transaction recovery: undo, redo, and undo/redo

Where We Are Headed Next
  Scaling the execution of a query (this week)
    Parallel DBMS
    MapReduce
    Distributed query processing and optimization
  Scaling transactions (next week)
    Distributed transactions
    Replication
  Scaling with NoSQL and NewSQL (in two weeks)

DBMS Deployment: Local
  Great for one application (could be more) and one user.
  [Figure: Application and DBMS on a desktop, with data files on disk]

DBMS Deployment: Client/Server
  Great for many apps and many users
  [Figure: Applications open a connection (ODBC, JDBC) to a server that holds the data files]

DBMS Deployment: 3 Tiers
  Great for web-based applications
  [Figure: Browser, HTTP/SSL, Web Server & App Server, connection (e.g., JDBC), DB Server, data files]

DBMS Deployment: Cloud
  Great for web-based applications
  [Figure labels: Users, Developers, HTTP/SSL, Web & App Server, DB Server, Data files]

How to Scale a DBMS?
  Scale up: a more powerful server
  Scale out: more servers

Why Do I Care About Scaling Transactions Per Second?
  Amazon, Facebook, Twitter, your favorite Internet application
  Goal is to scale OLTP workloads
  We will get back to this next week

Why Do I Care About Scaling A Single Query?
  Goal is to scale OLAP workloads
  That means the analysis of massive datasets

This Week: Focus on Scaling a Single Query

Science is Facing a Data Deluge!
  Astronomy: high-resolution, high-frequency sky surveys (SDSS, LSST)
  Medicine: ubiquitous digital records, MRI, ultrasound
  Biology: lab automation, high-throughput sequencing
  Oceanography: high-resolution models, cheap sensors, satellites
  Etc.
  Data holds the promise to accelerate discovery
  But analyzing all this data is a challenge

Industry is Facing a Data Deluge!
  Clickstreams, search logs, network logs, social networking data, RFID data, etc.
  Examples: Facebook, Twitter, Google, Microsoft, Amazon, Walmart, etc.
  Data holds the promise to deliver new and better services
  But analyzing all this data is a challenge

Big Data
  Companies, organizations, and scientists have data that is too big, too fast, and too complex to be managed without changing tools and processes
  Relational algebra and SQL are easy to parallelize, and parallel DBMSs have been studied since the 80's!

Data Analytics Companies
  As a result, we are seeing an explosion and a huge success of db analytics companies
  Greenplum: founded in 2003, acquired by EMC in 2010. A parallel shared-nothing DBMS (this lecture)
  Vertica: founded in 2005, acquired by HP in 2011. A parallel, column-store shared-nothing DBMS (see 444 for discussion of column-stores)
  DATAllegro: founded in 2003, acquired by Microsoft in 2008. A parallel, shared-nothing DBMS
  Aster Data Systems: founded in 2005, acquired by Teradata in 2011. A parallel, shared-nothing, MapReduce-based data processing system (next lecture). SQL on top of MapReduce
  Netezza: founded in 2000, acquired by IBM in 2010. A parallel, shared-nothing DBMS
  Great time to be in data management, data mining/statistics, or machine learning!

Two Approaches to Parallel Data Processing
  Parallel databases, developed starting in the 80s (this lecture)
    For both OLTP (transaction processing)
    And for OLAP (decision support queries)
  MapReduce, first developed by Google, published in 2004 (next lecture)
    Only for decision support queries
  Today we see convergence of the two approaches (Greenplum, Tenzing SQL)

References
  Book Chapter 20.1
  Database management systems. Ramakrishnan and Gehrke. Third Ed. Chapter (more info than our main book)

Parallel vs. Distributed Databases
  Distributed database system (early next week): data is stored across several sites, each site managed by a DBMS capable of running independently
  Parallel database system (today): improve performance through parallel implementation

Parallel DBMSs
  Goal: improve performance by executing multiple operations in parallel
  Key benefit: cheaper to scale than relying on a single increasingly more powerful processor
  Key challenge: ensure overhead and contention do not kill performance

Performance Metrics for Parallel DBMSs
  Speedup
    More processors → higher speed
    Individual queries should run faster
    Should do more transactions per second (TPS)
    Fixed problem size overall, vary # of processors ("strong scaling")
  Scaleup
    More processors → can process more data
    Fixed problem size per processor, vary # of processors ("weak scaling")
    Batch scaleup: same query on larger input data should take the same time
    Transaction scaleup: N-times as many TPS on N-times larger database, but each transaction typically remains small

Linear vs. Non-linear Speedup
  [Plot: speedup against # processors (=P)]

Linear vs. Non-linear Scaleup
  [Plot: batch scaleup against # processors (=P) AND data size]

Warning
  Be careful. Commonly used terms today:
    scale up = use an increasingly more powerful server
    scale out = use a larger number of servers
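
The two metrics above reduce to simple ratios of measured run times. The following is a minimal sketch, not part of the original slides: the function names, the 5% tolerance, and the timings are hypothetical and chosen only for illustration.

```python
def speedup(time_one_processor: float, time_p_processors: float) -> float:
    """Strong scaling: same total problem size, more processors."""
    return time_one_processor / time_p_processors


def scaleup(time_small_config: float, time_large_config: float) -> float:
    """Weak scaling: data and processors grow together; 1.0 means linear."""
    return time_small_config / time_large_config


def is_linear_speedup(measured: float, processors: int,
                      tolerance: float = 0.05) -> bool:
    """Linear speedup means P processors run about P times faster.

    The tolerance is an assumption of this sketch, not a standard value.
    """
    return measured >= processors * (1.0 - tolerance)


# Hypothetical timings (seconds) for one query on 1 and on 8 processors.
query_speedup = speedup(120.0, 20.0)               # 6.0, below the ideal 8.0
linear = is_linear_speedup(query_speedup, 8)       # False: sub-linear speedup

# Hypothetical batch scaleup: 8x the data on 8x the processors took 150 s
# instead of the 120 s measured on the small configuration.
batch_scaleup = scaleup(120.0, 150.0)              # 0.8, below the ideal 1.0
```

A speedup below P or a scaleup below 1.0 is usually where one starts looking for startup cost, interference, or skew.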

Challenges to Linear Speedup and Scaleup
  Startup cost: cost of starting an operation on many processors
  Interference: contention for resources between processors
  Skew: slowest processor becomes the bottleneck

Architectures for Parallel Databases
  Shared Memory, Shared Disk, Shared Nothing
  [Figure from: Greenplum Database Whitepaper. SAN = Storage Area Network]

Shared Memory
  Nodes share both RAM and disk
  Dozens to hundreds of processors
  Example: SQL Server runs on a single machine and can leverage many threads to get a query to run faster (see query plans)
  Easy to use and program
  But very expensive to scale

Shared Disk
  All nodes access the same disks
  Found in the largest "single-box" (non-cluster) multiprocessors
  Oracle dominates this class of systems
  Characteristics: also hard to scale past a certain point; existing deployments typically have fewer than 10 machines

Shared Nothing
  Cluster of machines on a high-speed network
  Called "clusters" or "blade servers"
  Each machine has its own memory and disk: lowest contention
  NOTE: Because all machines today have many cores and many disks, shared-nothing systems typically run many "nodes" on a single physical machine.
  Characteristics: today, this is the most scalable architecture, and the most difficult to administer and tune
  We discuss only Shared Nothing in class

In Class
  You have a parallel machine. Now what?
  How do you speed up your DBMS?

Approaches to Parallel Query Evaluation
  Inter-query parallelism
    Each query runs on one processor
    Only for OLTP queries
  Inter-operator parallelism
    A query runs on multiple processors
    An operator runs on one processor
    For both OLTP and Decision Support
  Intra-operator parallelism
    An operator runs on multiple processors
    For both OLTP and Decision Support
  We study only intra-operator parallelism: most scalable
  [Figure: query plans over Product and Customer illustrating the three approaches]

Horizontal Data Partitioning
  Relation R split into P chunks R_0, ..., R_{P-1}, stored at the P nodes
  Block partitioned: each group of k tuples goes to a different node
  Hash based partitioning on attribute A: tuple t goes to chunk h(t.A) mod P
  Range based partitioning on attribute A: tuple t goes to chunk i if v_{i-1} < t.A < v_i

Uniform Data vs. Skewed Data
  Let R(K,A,B,C); which of the following partition methods may result in skewed partitions?
  Block partition: uniform
  Hash-partition
    On the key K: uniform, assuming a uniform hash function
    On the attribute A: may be skewed. E.g. when all records have the same value of the attribute A, all records end up in the same partition
  Range-partition
    On the key K: may be skewed
    On the attribute A: may be skewed
    Difficult to partition the range of A uniformly

Example from Teradata
  [Figure: AMP = unit of parallelism]

Horizontal Data Partitioning
  All three choices are just special cases:
    For each tuple, compute bin = f(t)
    Different properties of the function f determine hash vs. range vs. round robin vs. anything

Parallel Selection
  Compute σ_{A=v}(R), or σ_{v1<A<v2}(R)
  On a conventional database: cost = B(R)
  Q: What is the cost on a parallel database with P processors?
    Block partitioned
    Hash partitioned
    Range partitioned
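
The three horizontal partitioning schemes above can be sketched as three versions of the same "compute bin = f(t)" idea. This is an illustrative sketch only: the function names are hypothetical, tuples are modeled as Python dicts, and the range version treats the upper bound of each range as inclusive (an assumption; the slide leaves boundary values unspecified).

```python
from bisect import bisect_left


def block_partition(tuples, num_nodes, k=1):
    """Each consecutive group of k tuples goes to the next node in turn."""
    chunks = [[] for _ in range(num_nodes)]
    for position, t in enumerate(tuples):
        chunks[(position // k) % num_nodes].append(t)
    return chunks


def hash_partition(tuples, num_nodes, attr):
    """Tuple t goes to chunk h(t.attr) mod P.

    Python's built-in hash() is deterministic for small ints, which is all
    this sketch uses; string hashes vary between processes.
    """
    chunks = [[] for _ in range(num_nodes)]
    for t in tuples:
        chunks[hash(t[attr]) % num_nodes].append(t)
    return chunks


def range_partition(tuples, boundaries, attr):
    """Tuple t goes to chunk i when boundaries[i-1] < t.attr <= boundaries[i].

    boundaries must be sorted; P-1 boundaries produce P chunks.
    """
    chunks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        chunks[bisect_left(boundaries, t[attr])].append(t)
    return chunks


def sizes(chunks):
    return [len(chunk) for chunk in chunks]


# R(K, A): K is a key, while every tuple carries the same value of A.
R = [{"K": key, "A": 7} for key in range(8)]

by_block = sizes(block_partition(R, 4))               # [2, 2, 2, 2]
by_hash_on_key = sizes(hash_partition(R, 4, "K"))     # [2, 2, 2, 2]
by_hash_on_a = sizes(hash_partition(R, 4, "A"))       # [0, 0, 0, 8]: skewed
by_range_on_key = sizes(range_partition(R, [1, 3, 5], "K"))  # [2, 2, 2, 2]
```

Hashing on A sends all eight tuples to one node, which is the skew case described in the slide; the range boundaries `[1, 3, 5]` were picked by hand to balance this particular input.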

Parallel Selection
  Q: What is the cost on a parallel database with P nodes?
  A: B(R) / P in all cases if cost is response time
  However, different processors do the work:
    Block: all servers do the work
    Hash: one server for σ_{A=v}(R), all for σ_{v1<A<v2}(R)
    Range: some servers only

Data Partitioning Revisited
  What are the pros and cons?
  Block based partitioning
    Good load balance but always needs to read all the data
  Hash based partitioning
    Good load balance
    Can avoid reading all the data for equality selections
  Range based partitioning
    Can suffer from skew (i.e., load imbalances)
    Can help reduce skew by creating uneven partitions

Parallel Group By: γ_{A, sum(B)}(R)
  Step 1: server i partitions chunk R_i using a hash function h(t.A) mod P: R_{i0}, R_{i1}, ..., R_{i,P-1}
  Step 2: server i sends partition R_{ij} to server j
  Step 3: server j computes γ_{A, sum(B)} on R_{0j}, R_{1j}, ..., R_{P-1,j}

Parallel GroupBy: γ_{A, sum(C)}(R)
  If R is partitioned on A, then each node computes the group-by locally
  Otherwise, hash-partition R(K,A,B,C) on A, then compute group-by locally
  [Figure: reshuffle R on attribute A, from chunks R_1, R_2, ..., R_P to new chunks R_1, R_2, ..., R_P]

Parallel Group By: γ_{A, sum(B)}(R)
  Can we do better?
  Sum? Count? Avg? Max? Median?
  Distributive:
    Sum(B) = Sum(B_0) + Sum(B_1) + ... + Sum(B_n)
    Count(B) = Count(B_0) + Count(B_1) + ... + Count(B_n)
    Max(B) = Max(Max(B_0), Max(B_1), ..., Max(B_n))
  Algebraic:
    Avg(B) = Sum(B) / Count(B)
  Holistic:
    Median(B) has no such decomposition into per-chunk results
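
The "can we do better?" idea above is to aggregate locally before reshuffling, so that each node ships one partial state per group instead of every tuple. Below is a minimal single-process sketch of that idea; it simulates the nodes with Python lists, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def local_partial_aggregate(chunk):
    """Runs on each node before the shuffle: group value -> [sum, count]."""
    partial = defaultdict(lambda: [0, 0])
    for a, b in chunk:
        partial[a][0] += b
        partial[a][1] += 1
    return partial


def shuffle_partials(partials, num_nodes):
    """Route each group's partial state to node h(a) mod P and merge there.

    Sum and count are distributive, so partial states merge by addition.
    """
    inboxes = [defaultdict(lambda: [0, 0]) for _ in range(num_nodes)]
    for partial in partials:
        for a, (partial_sum, partial_count) in partial.items():
            state = inboxes[hash(a) % num_nodes][a]
            state[0] += partial_sum
            state[1] += partial_count
    return inboxes


def parallel_group_by(chunks):
    """chunks[i] holds the (A, B) pairs stored at node i."""
    partials = [local_partial_aggregate(chunk) for chunk in chunks]
    inboxes = shuffle_partials(partials, len(chunks))
    result = {}
    for inbox in inboxes:
        for a, (total, count) in inbox.items():
            # Avg is algebraic: derived from the two distributive aggregates.
            result[a] = {"sum": total, "count": count, "avg": total / count}
    return result


# Three nodes, each holding a chunk of R projected onto (A, B).
chunks = [
    [(1, 10), (2, 5)],
    [(1, 20), (3, 7)],
    [(2, 15), (1, 30)],
]
answer = parallel_group_by(chunks)
# answer[1] == {"sum": 60, "count": 3, "avg": 20.0}
```

A holistic aggregate such as the median would not fit this pattern, because no fixed-size partial state per chunk is enough to reconstruct it.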

Parallel Join: R ⋈_{A=B} S
  Step 1:
    For all servers in [0,k], server i partitions chunk R_i using a hash function h(t.A) mod P: R_{i0}, R_{i1}, ..., R_{i,P-1}
    For all servers in [k+1,P], server j partitions chunk S_j using a hash function h(t.B) mod P: S_{j0}, S_{j1}, ..., S_{j,P-1}
  Step 2:
    Server i sends partition R_{iu} to server u
    Server j sends partition S_{ju} to server u
  Step 3: Server u computes the join of R_{iu} with S_{ju}

Overall Architecture
  [Figure: path of a SQL query through the system. From: Greenplum Database Whitepaper]

Example of Parallel Query Plan
  Find all orders from today, along with the items ordered

    SELECT *
    FROM Orders o, Lines i
    WHERE o.item = i.item
      AND o.date = today()

  [Plan: join on o.item = i.item, over a selection date = today() on Order o and a scan of Item i]

Example Parallel Plan
  Each node selects its local Order tuples with date = today() and hashes them on h(o.item)
  Each node scans its local Item i tuples and hashes them on h(i.item)
  Each node then computes the join o.item = i.item on the tuples it receives
  After the shuffle, each node contains all orders and all lines that share one hash value: hash(item) = 1 on one node, hash(item) = 2 on another, and so on
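
The three join steps above can be simulated in one process to see why matching tuples always meet on the same server. This is a simplified sketch under stated assumptions: the function names are hypothetical, tuples are plain Python tuples with the join attribute at a given index, and every server is allowed to hold chunks of both R and S (the slide places R on servers [0,k] and S on [k+1,P]).

```python
def repartition(chunks, key_index, num_nodes):
    """Steps 1 and 2: hash each tuple on its join attribute and route it."""
    inboxes = [[] for _ in range(num_nodes)]
    for chunk in chunks:
        for t in chunk:
            inboxes[hash(t[key_index]) % num_nodes].append(t)
    return inboxes


def local_hash_join(r_tuples, s_tuples, r_key, s_key):
    """Step 3 on one server: build a hash table on R, probe it with S."""
    table = {}
    for r in r_tuples:
        table.setdefault(r[r_key], []).append(r)
    return [r + s for s in s_tuples for r in table.get(s[s_key], [])]


def parallel_join(r_chunks, s_chunks, r_key, s_key, num_nodes):
    r_inboxes = repartition(r_chunks, r_key, num_nodes)
    s_inboxes = repartition(s_chunks, s_key, num_nodes)
    output = []
    for u in range(num_nodes):
        # Equal join values hash to the same u, so no cross-server work remains.
        output.extend(local_hash_join(r_inboxes[u], s_inboxes[u], r_key, s_key))
    return output


# R(A, payload) and S(B, payload), each initially spread over two chunks.
r_chunks = [[(1, "r1"), (2, "r2")], [(3, "r3"), (1, "r4")]]
s_chunks = [[(1, "s1")], [(3, "s2"), (4, "s3")]]

joined = sorted(parallel_join(r_chunks, s_chunks, r_key=0, s_key=0, num_nodes=3))
# joined == [(1, "r1", 1, "s1"), (1, "r4", 1, "s1"), (3, "r3", 3, "s2")]
```

Both relations must be hashed with the same function and the same modulus, otherwise tuples with equal join values could land on different servers.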

Optimization for Small Relations
  When joining R and S
  If |R| >> |S|
    Leave R where it is
    Replicate entire S relation across nodes
  Sometimes called a "small join"

Other Interesting Parallel Join Implementation
  Problem of skew during join computation
  Some join partitions get more input tuples than others
    Reason 1: Base data unevenly distributed across machines
      Because used a range-partition function
      Or used hashing but some values are very popular
    Reason 2: Selection before join with different selectivities
    Reason 3: Input data got unevenly rehashed (or otherwise repartitioned before the join)
  Some partitions output more tuples than others

Some Skew Handling Techniques
  1. Use range- instead of hash-partitions
     Ensure that each range gets same number of tuples
     Example: {1, 1, 1, 2, 3, 4, 5, 6} → [1,2] and [3,6]
  2. Create more partitions than nodes
     And be smart about scheduling the partitions
  3. Use subset-replicate (i.e., "skewedJoin")
     Given an extremely common value v
     Distribute R tuples with value v randomly across k nodes (R is the build relation)
     Replicate S tuples with value v to same k machines (S is the probe relation)

Parallel Dataflow Implementation
  Use relational operators unchanged
  Add a special shuffle operator
    Handle data routing, buffering, and flow control
    Inserted between consecutive operators in the query plan
    Two components: ShuffleProducer and ShuffleConsumer
    Producer pulls data from operator and sends to n consumers
      Producer acts as driver for operators below it in query plan
    Consumer buffers input data from n producers and makes it available to operator through getNext interface

Modern Shared Nothing Parallel DBMSs
  Greenplum: founded in 2003, acquired by EMC in 2010
  Vertica: founded in 2005, acquired by HP in 2011
  DATAllegro: founded in 2003, acquired by Microsoft in 2008
  Netezza: founded in 2000, acquired by IBM in 2010
  Aster Data Systems: founded in 2005, acquired by Teradata in 2011
    MapReduce-based data processing system (next week)
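
The first skew handling technique above asks for ranges that each receive the same number of tuples. One way to pick such ranges is to sort the values and cut the sorted list into equal-sized slices. The sketch below does this for the slide's own example; the function name is hypothetical, and a real system would typically work from a sample or a histogram rather than sorting the full input.

```python
def equi_depth_ranges(values, num_partitions):
    """Return one (low, high) range per partition, each covering about
    len(values) / num_partitions tuples.

    Caveat of this sketch: a very frequent value can straddle a cut point
    and then appears as the bound of two neighboring ranges.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranges = []
    for p in range(num_partitions):
        start = p * n // num_partitions
        end = (p + 1) * n // num_partitions
        if start < end:  # skip empty slices when there are few values
            ranges.append((ordered[start], ordered[end - 1]))
    return ranges


# The example from the slide: four tuples fall in each range.
ranges = equi_depth_ranges([1, 1, 1, 2, 3, 4, 5, 6], 2)
# ranges == [(1, 2), (3, 6)]
```

Splitting the value domain evenly instead (for example [1,3] and [4,6]) would put five tuples in the first partition and three in the second, which is the imbalance this technique is meant to avoid.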
