CSE 303: Database RDBMS Architecture Lecture 11 How does a SQL engine work? Chapter 2: The Relational Algebra SQL Query Declarative query (from user) Relational Algebra (RA) Plan Translate to relational algebra expression Optimized RA Plan Find logically equivalent- but more efficient- RA expression Execution Execute each operator of the optimized plan! RDBMS Architecture How does a SQL engine work? SQL Query Relational Algebra (RA) Plan Optimized RA Plan Execution Relational Algebra (RA) Five basic operators: 1. Selection: s 2. Projection: P 3. Cartesian Product: 4. Union: 5. Difference: - We ll look at these first! Then we ll look see these set operations Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions! Derived or auxiliary operators: Intersection Joins (natural,equi-join, theta join) Renaming: And also at one example of a derived operator (natural join) and a special operator (renaming) 1
Keep in mind: RA operates on sets! RDBMSs use Bags (multisets), however in relational algebra formalism we will consider sets! Also: we will consider the named perspective, where every attribute must have a unique name attribute order does not matter Now on to the basic RA operators Returns all tuples which satisfy a condition Notation: s c (R) Examples s Salary > 40000 (Employee) s name = Smith (Employee) The condition c can use =, <,, >,, <> SELECT * FROM Students WHERE gpa > 3.5; Another example: SSN Name Salary 1234545 John 200000 5423341 Smith 600000 4352342 Fred 500000 s Salary > 40000 (Employee) SSN Name Salary 5423341 Smith 600000 4352342 Fred 500000 Eliminates columns, then removes duplicates Notation: P A1,,An (R) Example: project social-security number and names: P SSN, Name (Employee) Output schema: Answer(SSN, Name) SELECT DISTINCT sname, gpa FROM Students; 2
Another example: SSN Name Salary 1234545 John 200000 5423341 John 600000 4352342 John 200000 P Name,Salary (Employee) Name Salary John 200000 John 600000 Note that RA Operators are Compositional! SELECT DISTINCT sname, gpa FROM Students WHERE gpa > 3.5; How do we represent this query in RA? Are these logically equivalent? Each tuple in R1 with each tuple in R2 Notation: R1 R2 Example: Employee Dependents Rare in practice; mainly used to express joins People(ssn,pname,address) SELECT * FROM Students, People; Another example: People Students ssn pname address 1234545 John 216 Rosse 5423341 Bob 217 Rosse sid sname gpa 001 John 3.4 002 Bob 1.3 ssn pname address sid sname gpa 1234545 John 216 Rosse 001 John 3.4 5423341 Bob 217 Rosse 001 John 3.4 1234545 John 216 Rosse 002 Bob 1.3 5423341 Bob 216 Rosse 002 Bob 1.3 3
Changes the schema, not the instance A special operator- neither basic nor derived Notation: S(B1,,Bn) (R) Note: this is shorthand for the proper form (since names, not order matters!): S(A1 B1,,An Bn) (R) SELECT sid AS studid, sname AS name, gpa AS gradeptavg FROM Students; ( studid, name, gradeptavg )( Students) Students Another example: Students Students ( studid, name, gradeptavg ) ( Students) sid sname gpa 001 John 3.4 002 Bob 1.3 Students studid name gradeptavg 001 John 3.4 002 Bob 1.3 Students(sid,name,gpa) People(ssn,name,address) SELECT DISTINCT ssid, S.name, gpa, ssn, address FROM Students S, People P WHERE S.name = P.name; Another example: Students S People P sid S.name gpa ssn P.name address 001 John 3.4 1234545 John 216 Rosse 002 Bob 1.3 5423341 Bob 217 Rosse sid S.name gpa ssn address 001 John 3.4 1234545 216 Rosse 002 Bob 1.3 5423341 216 Rosse 4
Natural Join Example: Converting SFW Query -> RA People(ssn,sname,address) SELECT DISTINCT gpa, address FROM Students S, People P WHERE gpa > 3.5 AND S.sname = P.sname; How do we represent this query in RA? CSE Lecture 30316 Database > Section 2 Advanced Relational Algebra 1. Set Operations in RA 2. Fancier RA 3. Extensions & Limitations Relational Algebra (RA) Five basic operators: 1. Selection: s 2. Projection: P 3. Cartesian Product: 4. Union: 5. Difference: - We ll look at these Derived or auxiliary operators: Intersection Joins (natural,equi-join, theta join, semi-join) Renaming: And also at some of these derived operators 19 5
1. Union ( ) and 2. Difference ( ) What about Intersection ( )? R1 R2 Example: ActiveEmployees RetiredEmployees R1 R2 Example: AllEmployees - RetiredEmployees R 1 R 2 R 1 R 2 It is a derived operator R1 R2 =? R1 R2 = R1 (R1 R2) Example UnionizedEmployees RetiredEmployees R 1 R 2 R 1 R 2 R1 R2 People(ssn,pname,address) Fancier RA Note that natural join is a theta join + a projection. SELECT * FROM Students,People WHERE q; 6
People(ssn,pname,address) Most common join in practice! SELECT * FROM Students S, People P WHERE sname = pname; Optimization Logical Equivalece of RA Plans RDBMS Architecture How does a SQL engine work? SQL Query Relational Algebra (RA) Plan Optimized RA Plan Execution We ll get a flavor of how to optimize on these plans now 7
Note: We can visualize the plan as a tree A simple plan What SQL query does this correspond to? Are there any logically equivalent RA expressions? Bottom-up tree traversal = order of operation execution! Pushing down projection Takeaways This process is called logical optimization Many equivalent plans used to search for good plans Relational algebra is an important abstraction. Why might we prefer this plan? 8
RA commutators The basic commutators: Push projection through (1) selection, (2) join Push selection through (3) selection, (4) projection, (5) join Also: Joins can be re-ordered! Optimizing the SFW RA Plan This simple set of tools allows us to greatly improve the execution time of queries by optimizing RA plans! Translating to RA T(C,D) SELECT R.A, T.D FROM R,S,T WHERE R.B = S.B AND S.C = T.C AND R.A < 10; s A<10 T(C,D) Logical Optimization Heuristically, we want selections and projections to occur as early as possible in the plan Terminology: push down selections and pushing down projections. Intuition: We will have fewer tuples in a plan. 9
Optimizing RA Plan Push down T(C,D) SELECT R.A,T.D FROM R,S,T WHERE R.B = S.B AND S.C = T.C AND R.A < 10; selection on A so it occurs earlier s A<10 Optimizing RA Plan Push down T(C,D) SELECT R.A,T.D FROM R,S,T WHERE R.B = S.B AND S.C = T.C AND R.A < 10; selection on A so it occurs earlier T(C,D) T(C,D) s A<10 Optimizing RA Plan Push down T(C,D) SELECT R.A,T.D FROM R,S,T WHERE R.B = S.B AND S.C = T.C AND R.A < 10; projection so it occurs earlier T(C,D) Optimizing RA Plan T(C,D) SELECT R.A,T.D FROM R,S,T WHERE R.B = S.B AND S.C = T.C AND R.A < 10; We eliminate B earlier! In general, when is an attribute not needed? T(C,D) s A<10 s A<10 10
Example (PPLP) Example (PPLP) What PC models have a speed of at least 3.00? Find the model numbers of all color laser printers? Answer(model) := P model (s speed >= 3.00 (PC)) P model (s color= T AND type = laser (Printer)) Which manufacturers make laptops with a hard disk of at least 100GB. Find those manufacturer that sells Laptops but not PC s. Answer(maker) := P maker (s hd >= 100 (Product Laptop)) P maker (s type= laptop (Product)) - P maker (s type= pc (Product)) Example (PPLP) Find those hard disk sizes that occur two or more PC s? Example (PPLP) Find the manufacturer(s) of the PC or Laptop with highest available speed? P hd (s pc1.model < > pc2.model AND pc1.hd = pc2.hd (ρ pc1 (PC) ρ pc2 (PC))) R1:= P model,speed (s speed >= 2.0 (PC) s speed >= 2.0 (Laptop)) Find those pairs of PC models that have both the same speed and RAM. A pair should be listed only once. P maker, model (Product) - P pdpl1.maker, pdpl1.model (s pdpl1.speed > pdpl2.speed (ρ pdpl1 (Product R1 ) ρ pdpl2 (Product R1))) P pc1.model,pc2.model (s pc1.model < pc2.model AND pc1.speed = pc2.speed AND pc1.ram = pc2.ram (ρ pc1 (PC) ρ pc2 (PC))) 11