Introduction to Data Management CSE 344. Lecture 13: Datalog


Introduction to Data Management
CSE 344
Lecture 13: Datalog
Guest lecturer: Laurel Orr
CSE 344 - Winter 2016

Midterm
- Monday, February 8th, in class
- Content:
  - Lectures 1 through 13
  - Homework 1 through 4 (due Feb 10)
  - Webquiz 1 through 4 (due Feb 6)
- Closed book. No computers, phones, watches, etc.!
- Can bring one letter-sized piece of paper with notes; can write on both sides

What is Datalog?
- Another query language for the relational model
  - Simple and elegant
  - Initially designed for recursive queries
- Today: some companies use datalog for data analytics, e.g. LogicBlox
- Increased interest due to recursive analytics
- We discuss only recursion-free (non-recursive) datalog, and add negation

Datalog
- Book: 5.3, 5.4
- Query Language primer on the CSE 344 website

Why Do We Learn Datalog?
- A query language that is closest to mathematical logic
  - Good language to reason about query properties
- Datalog can be translated to SQL (practice at home!)
  - Helps to express complex SQL, as we will see next lecture
- Can also translate back and forth between datalog and RA
- Increase in datalog interest due to recursive analytics
- Interesting: relational algebra, non-recursive datalog with negation, and relational calculus all have the same expressive power!

Why Do We Learn Datalog?
Datalog, Relational Algebra, and Relational Calculus are of fundamental importance in DBMSs because they are
1. Sufficiently expressive to be useful in practice, yet
2. Sufficiently simple to be efficiently implementable

SQL Query vs Datalog (which would you rather write?)

    DirectReports(eid, 0) :- Employee(eid), not Manages(_, eid)
    DirectReports(eid, level+1) :- DirectReports(mid, level), Manages(mid, eid)

Datalog
- We do not run datalog in 344; to try it out on your own:
  - Download DLV (http://www.dbai.tuwien.ac.at/proj/dlv/)
  - Run DLV on this file
  - Can also try IRIS (http://www.iris-reasoner.org/demo)

    parent(william, john).
    parent(john, james).
    parent(james, bill).
    parent(sue, bill).
    parent(james, carol).
    parent(sue, carol).

    male(john).
    male(james).
    female(sue).
    male(bill).
    female(carol).

    grandparent(X, Y) :- parent(X, Z), parent(Z, Y).
    father(X, Y) :- parent(X, Y), male(X).
    mother(X, Y) :- parent(X, Y), female(X).
    brother(X, Y) :- parent(P, X), parent(P, Y), male(X), X != Y.
    sister(X, Y) :- parent(P, X), parent(P, Y), female(X), X != Y.

Find Movies made in 1940:

    Q1(y) :- Movie(x,y,z), z = '1940'.

No need for ∃x ∃z.

Find Actors who acted in Movies made in 1940:

    Q2(f,l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,'1940').

Find Actors who acted in a Movie in 1940 and in one in 1910:

    Q3(f,l) :- Actor(z,f,l), Casts(z,x1), Movie(x1,y1,1910), Casts(z,x2), Movie(x2,y2,1940)

Extensional Database Predicates = EDB = Actor, Casts, Movie
Intensional Database Predicates = IDB = Q1, Q2, Q3
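To make the family rules above concrete without installing DLV, here is a minimal Python sketch that stores the facts as sets of tuples and evaluates three of the rules as set comprehensions. The encoding (one Python set per predicate) is an assumption made for illustration; it is not how DLV or IRIS evaluate programs.

```python
# EDB: the facts from the slide, one set of tuples per predicate.
parent = {
    ("william", "john"), ("john", "james"), ("james", "bill"),
    ("sue", "bill"), ("james", "carol"), ("sue", "carol"),
}
male = {"john", "james", "bill"}
female = {"sue", "carol"}

# grandparent(X, Y) :- parent(X, Z), parent(Z, Y).
# The shared variable Z becomes an equality test between the two atoms.
grandparent = {
    (x, y)
    for (x, z) in parent
    for (z2, y) in parent
    if z == z2
}

# father(X, Y) :- parent(X, Y), male(X).
father = {(x, y) for (x, y) in parent if x in male}

# brother(X, Y) :- parent(P, X), parent(P, Y), male(X), X != Y.
brother = {
    (x, y)
    for (p, x) in parent
    for (p2, y) in parent
    if p == p2 and x in male and x != y
}

print(sorted(grandparent))
print(sorted(father))
print(sorted(brother))
```

Each rule body turns into nested loops over the atoms, and each repeated variable turns into an equality condition, which is the join reading of a Datalog rule.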

Datalog: Terminology

    Q2(f,l) :- Actor(z,f,l), Casts(z,x), Movie(x,y,'1940').

- Head: Q2(f,l)
- Body: Actor(z,f,l), Casts(z,x), Movie(x,y,'1940')
- Each of Actor(z,f,l), Casts(z,x), Movie(x,y,'1940') is an atom (aka subgoal)
- f, l = head variables
- x, y, z = existential variables

More Datalog Terminology

    Q(args) :- R1(args), R2(args), ...

- Book writes: Q(args) :- R1(args) AND R2(args) AND ...
- R_i(args_i) is called an atom, or a relational predicate
- R_i(args_i) evaluates to true when relation R_i contains the tuple described by args_i
  - Example: Actor(344759, 'Douglas', 'Fowley') is true
- In addition to relational predicates, we can also have arithmetic predicates
  - Example: z = '1940'

Semantics
- Meaning of a datalog rule = a logical statement!

    Q1(y) :- Movie(x,y,z), z = '1940'.

- Means: ∀x. ∀y. ∀z. [(Movie(x,y,z) and z = '1940') ⇒ Q1(y)], and Q1 is the smallest relation that has this property
- Note: logically equivalent to: ∀y. [(∃x. ∃z. Movie(x,y,z) and z = '1940') ⇒ Q1(y)]
- That's why variables not in the head are called "existential variables"

Datalog program
- A datalog program is a collection of one or more rules
- Each rule expresses the idea that, from certain combinations of tuples in certain relations, we may infer that some other tuple must be in some other relation or in the query answer
- Example: Find all actors with Bacon number ≤ 2

    B0(x) :- Actor(x,'Kevin','Bacon')
    B1(x) :- Actor(x,f,l), Casts(x,z), Casts(y,z), B0(y)
    B2(x) :- Actor(x,f,l), Casts(x,z), Casts(y,z), B1(y)
    Q4(x) :- B0(x)
    Q4(x) :- B2(x)

- Note: Q4 means the union of B0 and B2
- We actually don't need Q4(x) :- B0(x)

Recursive Datalog
- In datalog, rules can be recursive

    Path(x, y) :- Edge(x, y).
    Path(x, y) :- Path(x, z), Edge(z, y).

- We study only non-recursive datalog
- Edge encodes a graph; Path finds all paths
[Figure: example graph with nodes 1 to 5]

Datalog with negation
- Find all actors who do not have a Bacon number < 2

    B0(x) :- Actor(x,'Kevin','Bacon')
    B1(x) :- Actor(x,f,l), Casts(x,z), Casts(y,z), B0(y)
    Q6(x) :- Actor(x,f,l), not B1(x), not B0(x)
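The "smallest relation that has this property" reading of a rule also explains how the recursive Path program can be evaluated: start from nothing and keep applying the rules until no new tuple appears. The sketch below is a naive bottom-up evaluation under that reading. The function name `transitive_closure` and the sample edges are hypothetical; the edges of the graph in the slide's figure are not recoverable from the transcript.

```python
def transitive_closure(edge):
    """Naive bottom-up evaluation of
         Path(x, y) :- Edge(x, y).
         Path(x, y) :- Path(x, z), Edge(z, y).
    Returns the smallest set of pairs closed under both rules."""
    path = set(edge)  # first rule: every edge is a path
    while True:
        # second rule: extend each known path by one edge
        derived = {
            (x, y)
            for (x, z) in path
            for (z2, y) in edge
            if z == z2
        }
        if derived <= path:  # nothing new: least fixpoint reached
            return path
        path |= derived


# Hypothetical graph, chosen only for illustration.
edge = {(1, 2), (2, 3), (3, 4)}
path = transitive_closure(edge)
print(sorted(path))
```

The loop terminates because the result can contain at most one pair per pair of nodes, so only finitely many tuples can ever be added.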

Safe Datalog Rules
- Here are unsafe datalog rules. What's unsafe about them?

    U1(x,y) :- Movie(x,z,1994), y>1910
    U2(x) :- Movie(x,z,1994), not Casts(u,x)

- A datalog rule is safe if every variable appears in some positive relational atom
- Simpler than in relational calculus

Datalog vs. Relational Algebra
- Every expression in the basic relational algebra can be expressed as a Datalog query
- But operations in the extended relational algebra (grouping, aggregation, and sorting) have no corresponding features in the version of datalog that we discussed today
- Similarly, datalog can express recursion, which relational algebra cannot

Schema for our examples
R(A,B,C)
S(D,E,F)
T(G,H)

Union: R(A,B,C) ∪ S(D,E,F)

    U(x,y,z) :- R(x,y,z)
    U(x,y,z) :- S(x,y,z)

Intersection: R(A,B,C) ∩ S(D,E,F)

    I(x,y,z) :- R(x,y,z), S(x,y,z)

Selection: σ_{x>100 and y='some string'}(R)

    L(x,y,z) :- R(x,y,z), x > 100, y = 'some string'

Selection: x>100 or y='some string'

    L(x,y,z) :- R(x,y,z), x > 100
    L(x,y,z) :- R(x,y,z), y = 'some string'
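The safety condition above is mechanical enough to check in a few lines. The sketch below assumes a simplified, hypothetical rule representation (`Rule`, with each atom reduced to the set of variable names it mentions); it is meant only to illustrate the definition, not to parse real Datalog.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule encoding: every atom is the set of its variables."""
    head_vars: set
    positive_atoms: list                        # positive relational atoms
    negated_atoms: list = field(default_factory=list)
    comparison_vars: set = field(default_factory=set)  # vars in arithmetic predicates


def is_safe(rule):
    """A rule is safe if every variable appears in some positive relational atom."""
    covered = set().union(*rule.positive_atoms)
    used = set(rule.head_vars) | set(rule.comparison_vars)
    for atom in rule.negated_atoms:
        used |= atom
    return used <= covered


# U1(x,y) :- Movie(x,z,1994), y>1910
u1 = Rule(head_vars={"x", "y"}, positive_atoms=[{"x", "z"}], comparison_vars={"y"})

# U2(x) :- Movie(x,z,1994), not Casts(u,x)
u2 = Rule(head_vars={"x"}, positive_atoms=[{"x", "z"}], negated_atoms=[{"u", "x"}])

# Q6(x) :- Actor(x,f,l), not B1(x), not B0(x)
q6 = Rule(head_vars={"x"}, positive_atoms=[{"x", "f", "l"}], negated_atoms=[{"x"}, {"x"}])

print(is_safe(u1), is_safe(u2), is_safe(q6))
```

Under this check, U1 fails because y occurs only in the comparison, and U2 fails because u occurs only under negation; in both cases the variable could range over infinitely many values.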

Equi-join: R ⋈_{R.A=S.D and R.B=S.E} S

    J(x,y,z,q) :- R(x,y,z), S(x,y,q)

Projection

    P(x) :- R(x,y,z)

To express difference, we add negation

    D(x,y,z) :- R(x,y,z), NOT S(x,y,z)

Examples
R(A,B,C)
S(D,E,F)
T(G,H)

Translate: Π_A(σ_{B=3}(R))

    A(a) :- R(a,3,_)

Underscore is used to denote an "anonymous variable", a variable that appears only once.

Examples
Translate: Π_A(σ_{B=3}(R) ⋈_{R.A=S.D} σ_{E=5}(S))

    A(a) :- R(a,3,_), S(a,5,_)

Examples
Friend(name1, name2)
Find Joe's friends, and Joe's friends of friends.

    A(x) :- Friend('Joe', x)
    A(x) :- Friend('Joe', z), Friend(z, x)
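The translations between relational algebra and Datalog above can be checked on small data. The following sketch evaluates each rule over Python sets; the contents of `R` and `S` are hypothetical sample tuples, not data from the lecture.

```python
# Hypothetical instances of R(A,B,C) and S(D,E,F).
R = {(1, 3, "a"), (2, 3, "b"), (4, 7, "c")}
S = {(1, 5, "x"), (2, 6, "y"), (4, 7, "c")}

# Union: U(x,y,z) :- R(x,y,z)  and  U(x,y,z) :- S(x,y,z)
U = R | S

# Intersection: I(x,y,z) :- R(x,y,z), S(x,y,z)
I = R & S

# Difference: D(x,y,z) :- R(x,y,z), NOT S(x,y,z)
D = R - S

# Projection: P(x) :- R(x,y,z)
P = {x for (x, y, z) in R}

# Equi-join: J(x,y,z,q) :- R(x,y,z), S(x,y,q)
J = {
    (x, y, z, q)
    for (x, y, z) in R
    for (x2, y2, q) in S
    if x == x2 and y == y2
}

# A(a) :- R(a,3,_), S(a,5,_)
A = {
    a
    for (a, b, _) in R if b == 3
    for (d, e, _) in S if d == a and e == 5
}

print(sorted(A), sorted(P), sorted(J))
```

Two rules with the same head become a set union, repeated variables become equality tests, and constants in an atom become selections, which mirrors the summary at the end of the lecture.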

Friend(name1, name2)
Find all of Joe's friends who do not have any friends except for Joe:

    JoeFriends(x) :- Friend('Joe',x)
    NonAns(x) :- JoeFriends(x), Friend(x,y), y != 'Joe'
    A(x) :- JoeFriends(x), NOT NonAns(x)

Friend(name1, name2)
Enemy(name1, name2)
Find all people such that all their enemies' enemies are their friends
- Q: if someone doesn't have any enemies nor friends, do we want them in the answer?
- A: Yes!

    Everyone(x) :- Friend(x,y)
    Everyone(x) :- Friend(y,x)
    Everyone(x) :- Enemy(x,y)
    Everyone(x) :- Enemy(y,x)
    NonAns(x) :- Enemy(x,y), Enemy(y,z), NOT Friend(x,z)
    A(x) :- Everyone(x), NOT NonAns(x)

Friend(name1, name2)
Enemy(name1, name2)
Find all persons x that have a friend all of whose enemies are x's enemies.

    Everyone(x) :- Friend(x,y)
    NonAns(x) :- Friend(x,y), Enemy(y,z), NOT Enemy(x,z)
    A(x) :- Everyone(x), NOT NonAns(x)

Datalog Summary
- EDB (base relations) and IDB (derived relations)
- Datalog program = set of rules
- Datalog is recursive
  - But we learn non-recursive datalog
- Pure datalog does not have negation; if we want negation we say "datalog + negation"
- Multiple atoms in a rule mean join (or intersection)
- Variables with the same name are join variables
- Multiple rules with the same head mean union
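The NonAns pattern in the examples above (first derive the tuples that violate the condition, then subtract them) can be traced on a small instance. The sketch below follows the first example, Joe's friends who have no friends except Joe; the `friend` tuples are hypothetical sample data.

```python
# Hypothetical instance of Friend(name1, name2).
friend = {
    ("Joe", "Ann"), ("Joe", "Bob"),
    ("Ann", "Joe"), ("Bob", "Joe"), ("Bob", "Carl"),
}

# JoeFriends(x) :- Friend('Joe', x)
joe_friends = {x for (j, x) in friend if j == "Joe"}

# NonAns(x) :- JoeFriends(x), Friend(x, y), y != 'Joe'
# Friends of Joe who have at least one friend other than Joe.
non_ans = {x for (x, y) in friend if x in joe_friends and y != "Joe"}

# A(x) :- JoeFriends(x), NOT NonAns(x)
# NonAns is fully computed before it is negated, so the negation is a set difference.
answer = joe_friends - non_ans

print(sorted(answer))
```

Here Bob is excluded because he is also friends with Carl, while Ann's only friend is Joe. Computing `non_ans` completely before using it under NOT is what keeps the negation well defined in a non-recursive program.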