View-Based Tree-Language Rewritings. Laks Lakshmanan, University of British Columbia, Canada; Alex Thomo, University of Victoria, Canada
Importance of trees: XML. Semi-structured textual formats are very popular. <movie> <title>house of cards</title> <year>2013</year> <character> <name>francis</name> <actor>kevin Spacey</actor> </character> <character> <name>claire</name> <actor>robin Wright</actor> </character> </movie> XML (multi-TB) success stories: 1. Elsevier: papers and books 2. JPMorgan Chase & Co: stock research data 3. JetBlue Airways: document management. Source: MarkLogic, XML Impacting the Enterprise. Tapping into the Power of XML: Five Success Stories
Importance of trees: JSON. Semi-structured textual formats are very popular. "movie": { "title": "House of Cards", "year": "2013", "character": [ { "name": "Francis", "actor": "Kevin Spacey" }, { "name": "Claire", "actor": "Robin Wright" } ] } JSON (multi-TB) success stories: 1. CouchDB 2. MongoDB 3. Jaql and Hive JSON SerDe for Hadoop. Mantra: Log first, ask questions later
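A minimal sketch of reading the JSON fragment above as a labeled tree. The fragment is wrapped in braces to make it a complete document, and the `(label, children)` tuple encoding and the helper name `to_tree` are assumptions made for illustration, not part of the slides.

```python
import json

# The slide's fragment, wrapped in braces so that it parses as a JSON document.
DOCUMENT = """
{"movie": {"title": "House of Cards", "year": "2013",
           "character": [{"name": "Francis", "actor": "Kevin Spacey"},
                         {"name": "Claire", "actor": "Robin Wright"}]}}
"""

def to_tree(label, value):
    """Turn a JSON value into a (label, children) tree (hypothetical encoding).

    A list under key k becomes several sibling nodes labeled k; a scalar
    becomes a leaf child carrying its text. Lists nested directly inside
    lists are not handled by this sketch.
    """
    if isinstance(value, dict):
        children = []
        for key, item in value.items():
            elements = item if isinstance(item, list) else [item]
            children.extend(to_tree(key, element) for element in elements)
        return (label, children)
    return (label, [(str(value), [])])

movie = to_tree("movie", json.loads(DOCUMENT)["movie"])
child_labels = [child[0] for child in movie[1]]
actors = [leaf[0]
          for child in movie[1] if child[0] == "character"
          for field in child[1] if field[0] == "actor"
          for leaf in field[1]]
print(child_labels)
print(actors)
```

The two `character` entries of the JSON array end up as two sibling sub-trees of `movie`, which is the shape the later slides work with.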
Trees visually: movie( title( House of Cards ), year( 2013 ), character( name( Francis ), actor( Kevin Spacey ) ), character( name( Claire ), actor( Robin Wright ) ) )
Another example
Importance of views (example). Big database of movies in a super-tree, each movie being a sub-tree. A query asks for all the movie sub-trees with a MAC: a small minority, numbering about 50. The result is materialized into a view. Tremendous help in answering new queries, e.g. find actors playing a MAC. Rewrite into: find actors playing a MAC in a movie having a MAC, and answer it on the materialized view.
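The idea can be sketched on a toy database. This is only an illustration under assumptions: `MAC` is treated as a plain node label with an actor node `a` below it, trees are `(label, children)` tuples, and all names and data are hypothetical.

```python
def nodes(tree):
    """Yield every (label, children) node of a tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def has_mac(movie):
    """View condition: the movie sub-tree contains some node labeled MAC."""
    return any(label == "MAC" for label, _ in nodes(movie))

def mac_actors(movie):
    """Query: names of actor nodes 'a' that sit directly below a MAC node."""
    found = []
    for label, children in nodes(movie):
        if label == "MAC":
            for child_label, child_children in children:
                if child_label == "a":
                    found.append(child_children[0][0])
    return found

# Toy super-tree: three movie sub-trees 'm' under one root (made-up data).
database = ("db", [
    ("m", [("MAC", [("a", [("Actor One", [])])])]),
    ("m", [("x", [("a", [("Actor Two", [])])])]),
    ("m", [("x", []), ("MAC", [("a", [("Actor Three", [])])])]),
])

view = [movie for movie in database[1] if has_mac(movie)]      # materialized once
direct = [name for movie in database[1] for name in mac_actors(movie)]
via_view = [name for movie in view for name in mac_actors(movie)]
print(direct == via_view, len(view), len(database[1]))
```

The answer computed on the view coincides with the answer computed on the whole database, while only the movies kept in the view are scanned.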
Regular Expressions and Automata. Return all movie actors: _* m _* aˆ. [Figure: a pattern and its automaton]
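One way to read the pattern is as a regular expression over the labels on the path from the root to the selected node. The sketch below assumes one-character labels (`m` for movie, `a` for actor, others arbitrary) and a hypothetical `(label, children)` tree encoding; it is an illustration, not the construction from the slides.

```python
import re

# _* m _* a : any labels, then m, then any labels, then the selected a.
PATTERN = re.compile(r".*m.*a")

def select(tree, prefix=""):
    """Return the root-to-node label paths of all nodes matched by PATTERN."""
    label, children = tree
    path = prefix + label
    hits = [path] if PATTERN.fullmatch(path) else []
    for child in children:
        hits.extend(select(child, path))
    return hits

# r = root, m = movie, t = title, c = character, n = name, a = actor.
tree = ("r", [
    ("m", [("t", []),
           ("c", [("n", []), ("a", [])]),
           ("c", [("n", []), ("a", [])])]),
    ("a", []),  # an actor node that is not inside any movie
])
selected = select(tree)
print(selected)
```

The last `a` node is not selected because no `m` occurs on its path from the root.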
Reverse. Return all movie actors: _* m _* aˆ. [Figure: a pattern and its automaton]
Bottom-up Tree Automata. Return all movie actors: _* m _* aˆ. [Figure: a pattern and its automaton]
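A minimal sketch of a bottom-up run for this query, under the assumption that the query is the set of trees in which exactly one node carries the hatted label (written `a^` here) and that node has an `m` ancestor. The state names `q0`, `q1`, `q2`, `dead` and the helper names are hypothetical.

```python
def step(label, child_states):
    """Transition: the state of a node from its label and its children's states."""
    marked = [s for s in child_states if s != "q0"]
    if "dead" in child_states or len(marked) > 1:
        return "dead"                      # more than one marked node below
    if label == "a^":
        return "dead" if marked else "q1"  # q1: marked a seen, no m above it yet
    if not marked:
        return "q0"                        # q0: no marked node in this sub-tree
    if marked[0] == "q2" or label == "m":
        return "q2"                        # q2: marked a with an m ancestor
    return "q1"

def run(tree):
    """State reached at the root after evaluating the tree bottom-up."""
    label, children = tree
    return step(label, [run(child) for child in children])

def positions(tree, path=()):
    """Yield (path, label) for every node; a path is a tuple of child indexes."""
    yield path, tree[0]
    for i, child in enumerate(tree[1]):
        yield from positions(child, path + (i,))

def mark(tree, path):
    """Copy of tree in which the node at path is relabeled a^."""
    label, children = tree
    if not path:
        return ("a^", children)
    i = path[0]
    return (label, children[:i] + [mark(children[i], path[1:])] + children[i + 1:])

def answer(tree):
    """Nodes labeled a whose marking makes the automaton accept (state q2)."""
    return [p for p, label in positions(tree)
            if label == "a" and run(mark(tree, p)) == "q2"]

tree = ("r", [("m", [("c", [("n", []), ("a", [])])]), ("a", [])])
print(answer(tree))
```

Only the actor inside the movie sub-tree is returned; marking the top-level `a` leaves the run in `q1`, which is not accepting.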
Run
Bottom-up Tree Automata (II). Return all movie actors of MAC: _* m _* MAC aˆ. [Figure: a pattern and its automaton]
Bottom-up Tree Automata (IV). Return all movies having some MAC: _* mˆ _* MAC a. [Figure: a pattern and its automaton]
Run
Bottom-up Tree Automata (V). Regular tree languages (RTA): the sets of trees recognized by TAs; closed under intersection and complement. Deterministic TA: for any tree t, there can be at most one accepting run of A on t. Power-wise, TA = DTA. Complement is obtained from a deterministic TA. Intersection via a special construction preserves determinism.
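The closure properties can be sketched for complete deterministic automata. The intersection below is the textbook product construction, used here as an assumption about what the "special construction" refers to; the `DTA` class and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DTA:
    step: Callable       # (label, list of child states) -> state, total
    accepting: Callable  # state -> bool

def run(dta, tree):
    """The unique state reached at the root of a (label, children) tree."""
    label, children = tree
    return dta.step(label, [run(dta, child) for child in children])

def accepts(dta, tree):
    return dta.accepting(run(dta, tree))

def complement(a):
    """Same transitions, flipped acceptance; sound because a is total and deterministic."""
    return DTA(a.step, lambda state: not a.accepting(state))

def intersection(a, b):
    """Product automaton: run a and b side by side on pairs of states."""
    def step(label, kids):
        return (a.step(label, [k[0] for k in kids]),
                b.step(label, [k[1] for k in kids]))
    return DTA(step, lambda state: a.accepting(state[0]) and b.accepting(state[1]))

def contains(symbol):
    """Toy DTA: accepts the trees that contain a node labeled symbol."""
    return DTA(lambda label, kids: label == symbol or any(kids),
               lambda state: state)

has_m, has_a = contains("m"), contains("a")
both = intersection(has_m, has_a)
print(accepts(both, ("m", [("a", [])])), accepts(both, ("m", [])))
```

Each node still gets exactly one state in the product, so the result is again deterministic and can be complemented in the same way.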
Queries. Queries are regular sets of trees over $\Sigma \cup \hat{\Sigma}$. Containment Lemma: $Q_1 \subseteq Q_2$ implies $\mathit{ans}_{Q_1} \subseteq \mathit{ans}_{Q_2}$.
Star Operation
Filled Star Operation
Transformation for avoiding marker overlap
Rewriting, and two sets. Maximally contained rewriting: The bad set: The promising set: Proposition.
Example with chains
Example with chains (II)
Inverse of the star operation. Proposition. Compute where J and J are RTQs
Colored Alphabet. Markers will be colors: blue for J, red for J
Colored Languages. Set of all trees having one node blue; set of all trees having one node red; set of all trees having one node blue and another red, a descendant of the blue node; set of all trees having all nodes black, except the root, which is red; set of all trees having all nodes black, except for the root, which is blue, and another node, which is red.
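The five languages can be illustrated by membership tests on colored trees. The sketch assumes that "one node blue" means exactly one blue node with every other node black (and likewise for red), and it uses a hypothetical `(label, color, children)` encoding; the predicate names are made up, since the slide's own symbols are not available.

```python
def colored(tree, path=()):
    """Yield (path, color) for each non-black node, in pre-order."""
    label, color, children = tree
    if color != "black":
        yield path, color
    for i, child in enumerate(children):
        yield from colored(child, path + (i,))

def one_blue(tree):
    return [c for _, c in colored(tree)] == ["blue"]

def one_red(tree):
    return [c for _, c in colored(tree)] == ["red"]

def red_below_blue(tree):
    """One blue node, one red node, and the red one is a proper descendant."""
    marks = {c: p for p, c in colored(tree)}
    if len(list(colored(tree))) != 2 or set(marks) != {"blue", "red"}:
        return False
    blue, red = marks["blue"], marks["red"]
    return len(red) > len(blue) and red[:len(blue)] == blue

def root_red(tree):
    return list(colored(tree)) == [((), "red")]

def root_blue_other_red(tree):
    marks = list(colored(tree))
    return len(marks) == 2 and marks[0] == ((), "blue") and marks[1][1] == "red"

t = ("m", "blue", [("c", "black", [("a", "red", [])]), ("c", "black", [])])
print(red_below_blue(t), root_blue_other_red(t), one_blue(t))
```

Under these assumptions the last language is a subset of the third one, since every other node is a descendant of the root.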
Colored Languages (II)
Colored Languages (III). over; same as p, but with blue nodes turned black; same as p, but with red nodes turned black; over
Colored Languages (IV). automaton for Similarly:
Rewriting Algorithm Compute: Theorem.
Rewriting Algorithm (II)
Complexity. Proposition. can be computed in polynomial time. Theorem. The MCR of Q using V can be computed in exponential time. Theorem. Computing the MCR of Q using V is EXPTIME-hard.
Final Notes. The query automata formalism used is equivalent in power to MSO (the gold standard) for specifying node-selecting queries. Colors correspond to Boolean markings: J. Niehren, L. Planque, J.-M. Talbot, and S. Tison. N-ary queries by tree automata. DBPL, 2005. XPath rewriting is NP-hard. XPath is a subclass of our formalism. Our automata-based algorithm can be used as well for rewriting XPath queries.
K-ary queries. Example: Find the 2-forests of actor tree pairs for actors who have played the same character together in some movie. [Figure: a pattern and its automaton]
Run
Why is rewriting K-ary queries challenging? It has been shown that k-ary queries can be encoded by unary queries: T. Schwentick. On diving in trees. In MFCS, 2000. Done by going through MSO formulas. Going from a k-ary query to an MSO encoding and then back to automata incurs non-elementary complexity. Therefore we need another algorithm for rewriting k-ary queries that doesn't go via MSO formulas.
Conclusions. Characterized view-based rewriting as solving a language equation. Defined appropriate tree operators. Defined colored languages. Gave automata constructions. Computed rewritings as a series of operations on automata. Characterized the complexity of computing rewritings; tight lower bound provided. Extended the results to k-ary queries (common in XQuery).
Thank You