arXiv: v1 [cs.DB] 30 Nov 2011

Size-l Object Summaries for Relational Keyword Search

Georgios J. Fakas, Zhi Cai, Nikos Mamoulis
Department of Computing and Mathematics, Manchester Metropolitan University, UK
Department of Computer Science, The University of Hong Kong, Hong Kong

ABSTRACT

A previously proposed keyword search paradigm produces, as a query result, a ranked list of Object Summaries (OSs). An OS is a tree structure of related tuples that summarizes all data held in a relational database about a particular Data Subject (DS). However, some of these OSs are very large and therefore unfriendly to users who initially prefer synoptic information before proceeding to more comprehensive information about a particular DS. In this paper, we investigate the effective and efficient retrieval of concise and informative OSs. We argue that a good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular DS. More precisely, we define a size-l OS as a partial OS composed of l important tuples. We propose three algorithms for the efficient generation of size-l OSs (in addition to the optimal approach, which requires exponential time). Experimental evaluation on the DBLP and TPC-H databases verifies the effectiveness and efficiency of our approach.

INTRODUCTION

Web Keyword Search (W-KwS) has been very successful because it allows users to extract useful information from the web effectively and efficiently using only a set of keywords. For instance, the W-KwS example below illustrates the partial result of a W-KwS (e.g. Google) for the query "Faloutsos": a ranked set (only the first three results are shown) of links to web pages containing the keyword(s). We observe that each result is accompanied by a snippet [], i.e. a short summary that sometimes even includes the complete answer to the query (if, for example, the user is only interested in whether Christos Faloutsos is a Professor or whether his brothers are academics).

The success of the W-KwS paradigm has encouraged the emergence of the keyword search paradigm in relational databases (R-KwS) []. The R-KwS paradigm is used to find tuples that contain the keywords and their relationships through foreign-key links; e.g. the query "Faloutsos + Agrawal" returns the Authors Faloutsos and Agrawal and their associations through co-authored papers. The first R-KwS example below illustrates the result of a traditional R-KwS for this query on the DBLP database. On the other hand, the R-KwS paradigm may not be very effective when trying to extract information about a particular data subject (DS), e.g. Faloutsos in the query "Faloutsos". The second R-KwS example below illustrates the R-KwS result for that query, namely a ranked set of Author tuples containing the Faloutsos keyword, which are the Author tuples corresponding to the three brothers. Evidently, this result fails to provide comprehensive information to users about the Faloutsos brothers, e.g. a complete list of their publications and other corresponding details. (Certainly, the R-KwS paradigm remains very useful when trying to combine keywords.)

In [], the concept of object summary (OS) is introduced; an OS summarizes all data held in a database about a particular DS. More precisely, an OS is a tree with the tuple t_DS containing the keyword (e.g. the Author tuple Christos Faloutsos) as the root node and its neighboring tuples, containing additional information (e.g. his papers, co-authors, etc.), as child nodes. The result for the query "Faloutsos" is in fact a set of OSs: one per DS, each including all data held in the database for one Faloutsos brother. The OS example for Christos Faloutsos illustrates this (the complete set of papers and the OSs of the other two brothers are omitted due to lack of space). This result evidently provides a more complete set of information per brother.

EXAMPLE. The query "Faloutsos" using a W-KwS (Google)
  Christos Faloutsos
    SCS CSD Professor's affiliations, research, projects, publications and teaching.
    christos/
  Michalis Faloutsos
    The Homepage of Michalis Faloutsos ... Interesting and Miscellaneous Links; Fun pictures; Other Faloutsos on the web; The Teach-To-Learn Initiative:
    michalis/
  Petros Faloutsos
    Courses; Press Coverage; Publications; Research Highlights; Awards; MAGIX Lab; Curriculum Vitae; Family; Other Faloutsos on Web.
    pfal/
  ...

EXAMPLE. The query "Faloutsos + Agrawal" using an R-KwS (searching the DBLP database)
  Author: Christos Faloutsos, Paper: Efficient similarity search in sequence databases, Author: Rakesh Agrawal.
  Author: Christos Faloutsos, Paper: Method for high-dimensionality indexing in a multi-media database, Author: Rakesh Agrawal.
  Author: Christos Faloutsos, Paper: Quest: A project on database mining, Author: Rakesh Agrawal.

EXAMPLE. The query "Faloutsos" using an R-KwS (searching the DBLP database)
  Author: Christos Faloutsos
  Author: Michalis Faloutsos
  Author: Petros Faloutsos

Partially supported by the Hosting of Experienced Researchers from Abroad programme (ΠPOΣEΛKYΣH/ΠPOEM/0) funded by the Research Promotion Foundation, Cyprus.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at the International Conference on Very Large Data Bases, Istanbul, Turkey. Proceedings of the VLDB Endowment. Copyright VLDB Endowment.

EXAMPLE. The OS for Christos Faloutsos
  Author: Christos Faloutsos
    Paper: On Power-law Relationships of the Internet Topology
      Conference: SIGCOMM. Year:
      Co-Author(s): Michalis Faloutsos, Petros Faloutsos
    Paper: An Efficient Pictorial Database System for PSQL
      Conference: IEEE Trans. Software Eng. Year:
      Co-Author(s): N. Roussopoulos, T. Sellis
    Paper: Declustering Using Fractals
      Conference: PDIS. Year:
      Co-Author(s): Pravin Bhagwat
    Paper: Declustering Using Error Correcting Codes
      Conference: PODS. Year:
      Co-Author(s): Dimitris N. Metaxas

EXAMPLE. The size-l OSs for the query "Faloutsos" and a given l
  Author: Christos Faloutsos
    Paper: On Power-law Relationships of the Internet Topology
      Conference: SIGCOMM. Year:
      Co-Author(s): Michalis Faloutsos, Petros Faloutsos
    Paper: The QBIC Project: Querying Images by Content, Using Color, Texture and Shape
      Conference: SPIE. Year:
      Co-Author(s): Carlton W. Niblack, Dragutin Petkovic, Peter Yanker
    Paper: Efficient and Effective Querying by Image Content
      Conference: J. Intell. Inf. Syst. Year:
      Co-Author(s): N. Roussopoulos, T. Sellis
    ...
  Author: Michalis Faloutsos
    Paper: On Power-law Relationships of the Internet Topology
      Conference: SIGCOMM. Year:
      Co-Author(s): Christos Faloutsos, Petros Faloutsos
    Paper: QoSMIC: Quality of Service Sensitive Multicast Internet Protocol
      Conference: SIGCOMM. Year:
      Co-Author(s): Anindo Banerjea, Rajesh Pankaj
    Paper: Aggregated Multicast with Inter-Group Tree Sharing
      Conference: Networked Group Communication. Year:
      Co-Author(s): Aiguo Fei
    ...
  Author: Petros Faloutsos
    Paper: On Power-law Relationships of the Internet Topology
      Conference: SIGCOMM. Year:
      Co-Author(s): Christos Faloutsos, Michalis Faloutsos
    Paper: Composable controllers for physics-based character animation
      Conference: SIGGRAPH. Year:
      Co-Author(s): Michiel van de Panne, Demetri Terzopoulos
    Paper: The virtual stuntman: dynamic characters with a repertoire of autonomous motor skills
      Conference: Computers & Graphics. Year:
      Co-Author(s): Michiel van de Panne, Demetri Terzopoulos

From the OS example for Christos Faloutsos, we can observe that some OSs may be very large; e.g. Christos Faloutsos has co-authored many papers and his OS is correspondingly large. This is not only unfriendly to users who prefer a quick glance first before deciding which Faloutsos they are really interested in, but also expensive to produce. Therefore, a partial OS of size l, composed of only l representative and important tuples, may be more appropriate. In this paper, we investigate in detail the effective and efficient generation of size-l OSs. The size-l example illustrates the query "Faloutsos" with a given l on the DBLP database; namely a set of size-l OSs composed of only l important tuples for each DS.

From the user's perspective, the semantics of this paradigm resemble a W-KwS more than an R-KwS. For instance, the complete OS of Christos Faloutsos resembles a web page (as they both include comprehensive information about a DS), whereas the size-l OSs resemble the snippets of the W-KwS example. Therefore, users with W-KwS experience will potentially find it friendlier and also closer to their expectations.

OSs and size-l OSs can have many applications. For example, OSs can automate responses to data protection act (DPA) subject access requests (e.g. the US Privacy Act, the UK DPA and [], etc.). According to DPA access requests, DSs have the right to request access from any organization to personal information about them. Thus, data controllers of organizations must extract data for a given DS from their databases and present it in an intelligible form []. Another application is for intelligence services searching for information about suspects in various databases. Size-l OSs can also be very useful here, as they enhance the usability of OSs. In general, a size-l OS is a concise summary of the context around any pivot database tuple, finding application in (interactive) data exploration, schema extraction, etc.

We should effectively generate a stand-alone size-l OS, composed of l important tuples only, so that the user can comprehend it without any additional information. A stand-alone size-l OS should preserve meaningful and self-descriptive semantics about the DS. As we explain when defining size-l OSs, for this reason the l tuples should form a connected graph that includes the root of the OS (i.e. t_DS). To distinguish the importance of individual tuples t_i to be included in the size-l OS, a local importance score (denoted as Im(OS, t_i)) is defined by combining the tuple's global importance score in the database (denoted as Im(t_i)) and its affinity [] in the OS (denoted as Af(t_i)). Based on the local importance scores of the tuples of an OS, we can find the partial OS of size l with the maximum importance score, which includes l tuples that are connected with t_DS.

The efficient generation of size-l OSs is a challenging problem. A brute-force approach, which considers all candidate size-l OSs before finding the one with the maximum importance, requires exponential time. We propose an optimal algorithm based on dynamic programming, which is efficient for small problems; however, it does not scale well with the OS size and l. In view of this, we design three practical greedy algorithms.

We provide an extensive experimental study on the DBLP and TPC-H databases, which includes comparisons of our algorithms and verifies their efficiency. To verify the effectiveness of our framework, we collected user feedback, e.g. by asking several DBLP authors (i.e. the DSs themselves) to assess the size-l OSs computed for them on the DBLP database. The users suggested that the results produced by our method are very close to their expectations.

The rest of the paper is structured as follows. We first describe background and related work. We then describe the semantics of size-l OS keyword queries and formulate the problem of their generation. The two sections after that introduce the optimal and greedy algorithms respectively.
The last two sections present experimental results and provide concluding remarks.

BACKGROUND AND RELATED WORK

In this section, we first describe the concept of object summaries (OSs), which we build upon in this paper. We then present and compare other related work in R-KwS, ranking and summarization. To the best of our knowledge, there is no previous work that focuses on the computation of size-l OSs.

Object Summaries

In the context of OS search in relational databases [], a query is a set of keywords (e.g. "Christos Faloutsos") and the result is a set of OSs. An OS is generated for each tuple (t_DS) found in the database that contains the keyword(s) as part of an attribute's value (e.g. the tuple "Christos Faloutsos" of relation Author in the DBLP database). An OS is a tree structure composed of tuples, having t_DS as root and t_DS's neighboring tuples (i.e. those associated through foreign keys) as its children/descendants.

In order to construct OSs, this approach combines the use of graphs and SQL. The rationale is that there are relations, denoted as R_DS (e.g. the Author relation), which hold information about the queried Data Subjects (DSs), and the relations linked around each R_DS contain additional information about the particular DS. For each R_DS, a Data Subject Schema Graph (G_DS) can be generated; this is a directed labeled tree that captures a subset of the database schema with R_DS as a root. (Separate figures illustrate the schemata of the DBLP and TPC-H databases and exemplary G_DSs.) Each relation in G_DS is also annotated with useful information that we describe later, such as affinity and importance. G_DS is a "treealization" of the schema, i.e. R_DS becomes the root, its neighboring relations become child nodes, and any looped or many-to-many relationships are replicated. Examples of such replications are relations PaperCitedBy, PaperCites and Co-Author on the Author G_DS and relations Partsupp, Lineitem, Parts, etc. on the Customer G_DS. (User evaluation in [] verified that the tree format, achieved via such replications, significantly increases the friendliness and ease of use of OSs.)

[Figure: The DBLP Database Schema]

[Figure: The DBLP Author G_DS, with nodes Author, Paper, PaperCitedBy, PaperCites, Year, Conference and Co-Author (annotated with affinity, max(r_i) and mmax(r_i))]

The challenge now is the selection of the relations from G_DS that have the highest affinity with R_DS; these need to be accessed and joined in order to create a good OS. To facilitate this task, affinity measures of relations (denoted as Af(R_i)) in G_DS are investigated, quantified and annotated on the G_DS. The affinity of a relation R_i to R_DS can be calculated using the following formula:

$$Af(R_i) = \sum_j m_j w_j \cdot Af(R_{Parent})$$

where j ranges over a set of metrics (m_1, m_2, ..., m_n) and their corresponding weights (w_1, w_2, ..., w_n), and Af(R_Parent) is the affinity of R_i's parent to R_DS. Affinity metrics between R_i and R_DS include (1) their distance and (2) their connectivity properties on both the database schema and the data-graph (see [] for more details). Given an affinity threshold θ, a subset of G_DS can be produced, denoted as G_DS(θ). Finally, by traversing G_DS(θ) (e.g.
by joining the corresponding relations) we can generate the OSs (either by using the precomputed data-graph or directly from the database using the OS Generation algorithm). More precisely, a breadth-first traversal of the corresponding G_DS(θ), with the t_DS tuple as the initial root entry of the OS tree, is applied. For instance, for the keyword query "Faloutsos", the Author G_DS and a suitable θ, the report presented in the OS example for Christos Faloutsos is generated automatically. Note that, for that threshold, Author G_DS(θ) includes all relations whilst Customer G_DS(θ) includes only the Customer, Nation, Region, Order, Lineitem and Partsupp relations (since all these relations have affinity greater than the threshold). Similarly, the set of attributes A_j from each relation R_i that are included in a G_DS is selected by employing an attribute affinity and a corresponding threshold. For example, in a Customer OS, Comment is excluded from the Partsupp relation as it is not relevant to Customer DSs.

R-KwS and Ranking

R-KwS techniques facilitate the discovery of joining tuples (i.e. Minimal Total Join Networks of Tuples (MTJNTs) []) that collectively contain all query keywords and are associated through their keys; for this purpose the concept of candidate networks is introduced; see, for example, DISCOVER [] and BANKS []. The OS paradigm differs from other R-KwS techniques semantically, since it does not focus on finding and ranking candidate networks that connect the given keywords, but searches for OSs, which are trees centered around the data subject described by the keywords.

Précis queries [] resemble size-l OSs in that they append additional information to the nodes containing the keywords, by considering neighboring relations that are implicitly related to the keywords. More precisely, a précis query result is a logical subset of the original database (i.e. a subset of relations and a subset of tuples). For instance, the précis of the query "Faloutsos" is a subset of the database that includes the tuples of the three Faloutsos Authors and a subset of their (common) Papers, Co-Authors, Conferences, etc.
In contrast, our result is a set of three separate size-l OSs (one per brother, as in the size-l example). A thorough evaluation between OSs and précis appears in our earlier work [].

R-KwS techniques also investigate the ranking of their results. Such ranking paradigms consider:

(1) IR-style techniques, which weight the number of times keywords (terms) appear in MTJNTs []. However, such techniques miss tuples that are related to the keywords but do not contain them []; e.g. for the query "Faloutsos", tuples in relation Papers also have importance although they do not include the Faloutsos keyword.

(2) Tuples' importance, which weights the authority flow through relationships, e.g. ObjectRank [], ValueRank [], PageRank [], BANKS (PageRank inspired) [], XRANK [], etc.

In this paper we use tuples' importance to model global importance scores, and more precisely global ObjectRank (for DBLP) and ValueRank (for TPC-H). (Note that our algorithms are orthogonal to how tuple importance is defined, and other methods could also be investigated.) ObjectRank [] is an extension of PageRank on databases and introduces the concept of Authority Transfer Rates between the tuples of each relation of the database (Authority Transfer Rates are annotated on the so-called Authority Transfer Schema Graph, denoted as G_A). They are based on the observation that solely mapping a relational database to a graph (as in the case of the web) is not accurate, and a G_A is required to control the flow of authority between neighboring tuples. For instance, well-cited papers should have higher importance than papers citing many other papers, and a well-cited paper should be ranked higher than one with fewer citations. ValueRank is an extension of ObjectRank which also considers the tuples' values and thus can be applied on any database (e.g. TPC-H), in contrast to ObjectRank, which is mainly effective on authoritative-flow data such as bibliographic data (e.g. DBLP).
For instance, in trading databases, a customer with five orders of low value may get lower importance than another customer with three orders of higher value.

Other Related Work

Document summarization techniques have attracted significant research interest []. In general, these techniques are IR-style inspired. Web snippets [] are examples of document summaries that accompany the search results of W-KwSs in order to facilitate their quick preview (e.g. see the W-KwS example in the Introduction). They can be either static (e.g. composed of the first words of the document or description metadata) or query-biased (e.g. composed of sentences containing the keywords many times) []. Still, the direct application of such techniques on databases in general, and on OSs in particular, is ineffective; e.g. they disregard the relational associations and semantics of the displayed tuples. For example, for the query "Faloutsos" and the OS of Christos Faloutsos, papers authored by Faloutsos (although they do not include the Faloutsos keyword) have importance analogous to their citations and authors; this is ignored by document summarization techniques.

XML keyword search techniques, similarly to R-KwSs, facilitate the discovery of XML sub-trees that contain all query keywords (e.g. "Faloutsos + Agrawal"). Analogously, XML snippets [] are sub-trees of the complete XML result, with a given size, that contain all keywords. An apparent difference between size-l OSs and XML snippets is their semantics, which is analogous to the semantic difference between complete OSs and XML results. Therefore, their generation is a completely different problem. An interesting similarity is that both size-l OSs and XML snippets are sub-trees of the corresponding complete results, hence composed of connected nodes. This common property exists for the same reason, i.e. to preserve self-descriptiveness.

SIZE-l OS

A size-l OS keyword query consists of (1) a set of keywords and (2) a value for l, and the result comprises a set of size-l OSs. A good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular DS.

DEFINITION. Given an OS and an integer size l, a candidate size-l OS is any subset of the OS composed of l tuples, such that all l tuples are connected with t_DS (i.e. the root of the OS tree).

The definition guarantees that the size-l OS remains stand-alone (so users can understand it as it is, without any additional tuples); i.e. by including connecting tuples we also include the semantics of their connection to the DS.
(Recall that this criterion was also used in [] for the same reasons.) Consider the example of the figure below, which is a fraction of the Faloutsos OS (in the DBLP database). Even if the Paper "Efficient and Effective Querying by Image Content" has less local importance than the Co-Authors Sellis and Roussopoulos, we cannot exclude the Paper and include only the Co-Authors. The rationale is that by excluding the Paper tuple we also exclude the semantic association between the Author and the Co-Author(s), which in this case is their common paper. Also note that a size-l OS will not necessarily include the l tuples with the largest importance scores. For example, the Co-Author Roussopoulos, although of larger importance than the particular Paper, may have to be excluded from a size-l OS (e.g. from one which consists of (1) Author Faloutsos, (2) Paper "Efficient..." and (3) Co-Author Sellis).

[Figure: A Fraction of the Faloutsos OS (annotated with local importance): Author Christos Faloutsos; Paper "Efficient and Effective Querying by Image Content"; Co-Authors T. Sellis and N. Roussopoulos; Year; Conference IEEE Trans. Software Eng.]

Given an OS, we can extract exponentially many size-l OSs that satisfy the definition. In the next section we define a measure for the importance (i.e., quality) of a candidate size-l OS. Our goal then is to retrieve a size-l OS of the maximum possible quality.

Importance of a Size-l OS

The (global) importance of any candidate size-l OS S, denoted as Im(S), is defined as:

$$Im(S) = \sum_{t_i \in S} Im(OS, t_i)$$

where Im(OS, t_i) is the local importance of tuple t_i (defined in the next subsection). We say that a candidate size-l OS is an optimal size-l OS if it has the maximum Im(S) (denoted as max(Im(S))) over all candidate size-l OSs for the given OS. Wherever an optimal size-l OS is hard to find, we target the retrieval of a sub-optimal size-l OS of the highest possible importance.

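The candidate definition and the importance sum can be illustrated with a minimal Python sketch. The `parent` map, the tuple identifiers and the scores below are hypothetical (the scores of the original figure are not available here); the sketch only assumes that an OS is a tree and that each tuple already carries a local importance score.

```python
def is_candidate(parent, root, nodes, l):
    """True if `nodes` is a candidate size-l OS: exactly l tuples,
    including the root, with every non-root tuple attached to a parent
    that is also selected (which makes the set connected in a tree)."""
    nodes = set(nodes)
    if len(nodes) != l or root not in nodes:
        return False
    return all(n == root or parent.get(n) in nodes for n in nodes)


def importance(local_importance, nodes):
    """Im(S): the sum of the local importance scores of the tuples in S."""
    return sum(local_importance[n] for n in nodes)


# Hypothetical fraction of an OS: author -> paper -> two co-authors.
parent = {"paper": "author", "sellis": "paper", "roussopoulos": "paper"}
score = {"author": 10.0, "paper": 2.0, "sellis": 3.0, "roussopoulos": 4.0}

without_paper = {"author", "sellis", "roussopoulos"}  # higher scores, but disconnected
with_paper = {"author", "paper", "sellis"}            # connected through the paper
```

With these illustrative scores, `without_paper` is rejected even though its tuples score higher, because the co-authors lose their link to the author once the paper is dropped.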
Local Importance of a Tuple (Im(OS, t_i))

The local importance Im(OS, t_i) of each tuple t_i in an OS can be calculated by:

$$Im(OS, t_i) = Im(t_i) \cdot Af(t_i)$$

where Im(t_i) is the global importance of t_i in the database. We use global ObjectRank and ValueRank to calculate global importance, as discussed in the R-KwS and Ranking subsection. Af(t_i) is the affinity of t_i to t_DS; namely the affinity Af(R_i) to R_DS of the relation R_i to which t_i belongs. This can be calculated from G_DS using the affinity formula of the Object Summaries subsection (alternatively, a domain expert can set the Af(R_i) values manually). For example, if tuple t_i is the paper "Efficient...", then Af(t_i) = Af(R_Paper) (see the affinity annotated on the Author G_DS) and Im(OS, t_i) is the product of Im(t_i) and Af(R_Paper). Multiplying global importance Im(t_i) by affinity Af(t_i) reduces the importance of tuples that are not closely related to the DS. For instance, although the paper "Efficient..." and a Year tuple have equal global importance scores, their local importance scores are obtained by multiplying each by the affinity of its own relation (Paper and Year respectively).

The use of importance and affinity metrics is inspired by earlier work; e.g. XRANK and précis employ variations of importance and affinity []. For defining affinity in [], only distance is considered; however, as shown in [], distance is only one among the possible affinity metrics (e.g. cardinality, reverse cardinality, etc.).

Problem Definition

The generation of a complete OS is straightforward: we only have to traverse the corresponding G_DS (see the OS Generation algorithm in the Appendix). The generation of a size-l OS is a more challenging task because we need to select l tuples that are connected to the t_DS of the tree and at the same time result in the largest Im(S). Hence, the problem we study in this paper can be defined as follows:

PROBLEM (FIND AN OPTIMAL SIZE-l OS). Given a t_DS, the corresponding G_DS and l, find a size-l OS S of maximum Im(S).

A direct approach for solving this problem is to first generate the complete OS (i.e. using the OS Generation algorithm) and then determine the optimal size-l OS from it. (In fact, any tuples or subtrees that have distance at least l from the root t_DS are excluded from the OS, as these cannot be part of a connected size-l OS rooted at t_DS.) In the next section, we propose a dynamic programming (DP) algorithm for this purpose. If the complete OS is too large, solving the problem exactly using DP can be too expensive. In view of this, we then propose greedy algorithms that find a sub-optimal synopsis. In order to further reduce the cost of finding a sub-optimal solution, we also propose an economical approach which, instead of the complete OS, initially generates a preliminary partial OS, denoted as prelim-l OS. The rationale of a prelim-l OS is to avoid the extraction, and consequently the further processing, of fruitless tuples that are unlikely to make it into the size-l OS. DP and the greedy algorithms can be applied on the prelim-l OS to find a good sub-optimal size-l OS.

THE DP ALGORITHM

This section describes a dynamic programming (DP) algorithm which, given an OS, determines the optimal size-l OS in it. The OS is a tree, as discussed earlier. Every node v of the OS tree is a tuple t_i and carries a weight w(v), which is the local importance Im(OS, t_i) of the corresponding tuple t_i. Given the tree OS, our objective is to find a subtree S_opt such that (i) S_opt includes the root node t_DS of OS, (ii) the tree has l nodes, and (iii) its nodes have the maximum sum of weights among all trees that satisfy (i) and (ii). In the third condition, the sum of node weights corresponds to Im(S_opt), according to the definition of Im(S). Since this is the maximum among all qualifying subtrees, S_opt is the optimal size-l OS.

Assume that the root t_DS in S_opt has a child v and the subtree S^v_opt rooted at v has i nodes. Then S^v_opt should be the optimal size-i OS rooted at v. DP operates based on exactly this assertion; for each candidate node v to be included in the optimal synopsis and for each number of nodes i in the subtree of v that can be included, we compute the corresponding optimal size-i synopsis and the corresponding sum of weights.
The optimal size-i synopsis rooted at v is computed recursively from precomputed size-j synopses (j < i) rooted at v's children; to find it, we should consider all synopses formed by v and all size-(i − 1) combinations of its children and the subtrees rooted at them.

Specifically, let d(v) be the depth of a node v in OS (the root t_DS has depth 0). The subtree rooted at v can contribute at most l − d(v) nodes to the optimal solution, because in every solution that includes v, the complete path from the root to v must be included (due to the fact that t_DS should be included and the solution must be connected). The DP algorithm therefore computes, for each node v of OS, the sets S_{v,i}: the optimal size-i OS in the subtree rooted at v, for all i ∈ [1, l − d(v)]. In addition to S_{v,i}, the algorithm should track W(S_{v,i}), the sum of weights of all nodes in S_{v,i}.

DP proceeds in a bottom-up fashion; it starts from nodes in OS at depth l − 1; these nodes can only contribute themselves to the optimal solution (nodes at depth at least l cannot participate in a size-l OS). For each such node v, trivially S_{v,1} = {v} and W(S_{v,1}) = w(v). Now consider a node v at depth k < l − 1. Upon reaching v, for all children u of v, the quantities S_{u,i} and W(S_{u,i}) have been computed for all i ∈ [1, l − d(v) − 1]. Let us now see how we can compute S_{v,i} for each i ∈ [1, l − d(v)]. First, each S_{v,i} should include v itself. Then, we examine all possible combinations of v's children and numbers of nodes to be selected from their subtrees, such that the total number of selected nodes is i − 1. We do not have to re-examine the subtrees of v's children, since for each number of nodes j to be selected from a subtree rooted at child u, we already have the optimal set S_{u,j} and the corresponding sum of weights W(S_{u,j}). Note that when we reach the OS root r = t_DS, we only have to compute S_{r,l}, the optimal size-l OS (i.e., there is no need to compute S_{r,i} for i ∈ [1, l − 1]).
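The recurrence described above can be sketched in Python as follows. This is an illustrative sketch rather than the paper's implementation: the `children` and `weight` dictionaries and the node ids are hypothetical, the tables are built by a top-down recursion (equivalent to the bottom-up order in the text), and the children of a node are merged one at a time in knapsack style instead of enumerating all combinations explicitly.

```python
def optimal_size_l(children, weight, root, l):
    """Return (total weight, node set) of the best connected subtree with
    exactly l nodes that contains `root`, or None if no such subtree exists.

    children: dict mapping a node to the list of its child nodes
    weight:   dict mapping a node to its local importance
    """
    def solve(v, cap):
        # table[i] = (W(S_{v,i}), S_{v,i}) for the sizes i <= cap found so far
        table = {1: (weight[v], frozenset([v]))}
        if cap > 1:
            for u in children.get(v, []):
                sub = solve(u, cap - 1)      # a child may use one node fewer
                merged = dict(table)         # option: take nothing from u
                for i, (wi, si) in table.items():
                    for j, (wj, sj) in sub.items():
                        size = i + j
                        if size > cap:
                            continue
                        if size not in merged or wi + wj > merged[size][0]:
                            merged[size] = (wi + wj, si | sj)
                table = merged
        return table

    return solve(root, l).get(l)


# Hypothetical OS: node 1 is t_DS; weights are illustrative local importances.
children = {1: [2, 3], 2: [4], 3: [5]}
weight = {1: 10, 2: 5, 3: 1, 4: 1, 5: 20}
best = optimal_size_l(children, weight, root=1, l=3)
```

In this toy tree the best size-3 summary keeps the low-weight node 3 only because it connects the root to the high-weight node 5, which is the kind of trade-off the connectivity requirement forces. The recursion passes a capacity of l − d(v) to a node at depth d(v), matching the bound in the text.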

Algorithm: The Optimal Size-l OS (DP) Algorithm

  DP(l, t_DS, G_DS)
    OS Generation(t_DS, G_DS)    // generates the complete OS, annotates each node with its local importance
    for each node v at depth l − 1 do set S_{v,1} = {v}
    for each depth k = l − 2 down to 0 do
      for each node v at depth k do
        for i = 1 to l − d(v) do
          S_{v,i} = {v} ∪ the best combination of v's children and nodes from them such that the total number of nodes is i
    return S_{r,l}

[Figure: Example: Steps of DP (an example OS tree and, per depth, the computed sets S_{v,i})]

As an example, consider the OS shown at the top of the figure and assume that we want to compute the optimal size-l OS from it. The table in the figure shows the steps of DP in computing the optimal sets S_{v,i} in a bottom-up fashion, starting from the nodes at depth l − 1. To compute a set S_{v,i} for a node v with two children, DP compares the possible ways of drawing the remaining i − 1 nodes from the children, e.g. one node from each child's subtree versus two nodes from a single child's subtree, and keeps the combination with the larger total weight, to which it adds v itself. Note that for nodes that do not have enough descendants, the number of sets that are computed can be smaller than indicated in the pseudocode. For example, a node without children only has S_{v,1}, even if its depth would allow larger sets. In addition, for the root node, DP only has to compute S_{r,l}, since we only care about the optimal size-l summary (there are no nodes above the root that could use smaller summaries).

In terms of complexity, we need to compute up to l − d(v) sets for each node v in the OS up to depth l − 1. For each set we need to find the optimal combination of children and nodes from them to choose. The cost of choosing the best combination increases exponentially with i, which is O(l). Thus, the overall cost of DP for an input OS of size n is exponential in l, as can be verified in our experiments.
This is essentially the complexity of the problem, as DP explores all possible summaries systematically and, in the general case, there is no way to prune the search space. For large values of l, DP becomes impractical and we resort to the greedy heuristics described in the next section. Finally, the following lemma proves the optimality of DP.

LEMMA. The DP algorithm computes the optimal size-l OS.

PROOF. The optimal size-l OS S_opt includes the root t_DS of the OS and a set of subtrees rooted at some of t_DS's children. DP tests all possible combinations of children and numbers of nodes from the corresponding subtrees; therefore the combination that corresponds to S_opt will be considered. For this specific combination, for each child v and number of nodes i, the optimal subtree rooted at v with i nodes (i.e., S_{v,i}) has already been found during the bottom-up computation process of DP. Therefore, DP will select and output the optimal combination (which has the largest importance among all tested ones).

Algorithm: The Bottom-Up Pruning Size-l Algorithm

  Bottom-Up Pruning Size-l (l, t_DS, G_DS)
    OS Generation(t_DS, G_DS)    // generates the initial size-l (i.e. complete or prelim-l) OS and the initial PQ
    while (|size-l OS| > l) do
      t_tem = dequeue(PQ)    // the smallest value from PQ
      if !(hasSiblings(size-l OS, t_tem)) then
        enqueue(PQ, parent(size-l OS, t_tem))    // after pruning t_tem, its parent becomes a leaf node
      prune t_tem from size-l OS
    return size-l OS

GREEDY ALGORITHMS

Since the DP algorithm does not scale well, in this section we investigate greedy heuristics that aim at producing a high-quality size-l OS, which is not necessarily the optimal one. A property that the algorithms exploit is that the local importance of tuples in the OS (i.e. Im(OS, t_i)) usually decreases with the node depth from the root t_DS of the OS. Recall that Im(OS, t_i) is the product Im(t_i) · Af(t_i), where Im(t_i) is the global importance of tuple t_i and Af(t_i) is the affinity of the relation that t_i belongs to. Af(t_i) monotonically decreases with the depth of the tuple, since Af(R_i) is obtained by multiplying the affinity of R_i's parent by the weighted metric score of R_i (cf. the affinity formula). On the other hand, the global importance of a particular tuple is to some extent unpredictable. Therefore, even though the local importance is not monotonically decreasing with the depth of the tuple on the OS tree, it has a higher probability to decrease than to increase with depth.
Hence, it is more probable that tuples higher on the OS have greater local importance than lower tuples. Moreover, note that due to the non-monotonicity of OSs, existing top-k techniques such as [,, ] cannot be applied.

Bottom-Up Pruning Size-l Algorithm

This algorithm, given an initial OS (either a complete or a prelim-l OS), iteratively prunes from the bottom of the tree the n − l leaf nodes with the smallest Im(OS, t_i), where n is the number of nodes in the complete OS. The rationale is that since tuples need to be connected with the root and lower tuples on the tree are expected to have lower importance, we can start pruning from the bottom. A priority queue (PQ) organizes the current leaf nodes according to their local importance. Algorithm shows a pseudocode of the algorithm and Figure illustrates the steps.

More precisely, this algorithm first generates the initial OS (line ; e.g. the complete OS using Algorithm ). The OS Generation algorithm generates the initial size-l OS and also the initial PQ (initially holding all leaves of the given OS). Then, the algorithm iteratively prunes the leaves with the smallest Im(OS, t_i). Whenever a new leaf is created (e.g., after pruning node in Figure, node becomes a leaf), it is added to PQ. The algorithm terminates when only l nodes remain in the tree. The tree is then returned as the size-l OS.

In terms of time complexity, the algorithm performs O(n) delete operations in constant time, each potentially followed by an update to the PQ. Since there are O(n) elements in PQ, the cost of each update operation is O(log n). Thus, the overall cost of the algorithm is O(n log n). This is much lower than the complexity of the DP algorithm, which gives the optimal solution. On the other hand, this method will not always return the optimal solution; e.g. the optimal size- OS should include nodes,,, and instead of,,, and (Fig (d)).
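The pruning loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the tree is assumed to be given as a hypothetical `parent` dictionary (node to parent, root mapped to `None`) together with an `importance` dictionary holding Im(OS, t_i), and Python's `heapq` min-heap plays the role of PQ.

```python
import heapq

def bottom_up_prune(parent, importance, l):
    """Greedy sketch: repeatedly prune the leaf with the smallest local
    importance until only l nodes remain (assumes l >= 1).

    parent:     dict node -> parent node (the root maps to None)   [assumed input format]
    importance: dict node -> local importance Im(OS, t_i)          [assumed input format]
    """
    alive = set(parent)
    child_count = {v: 0 for v in parent}
    for v, p in parent.items():
        if p is not None:
            child_count[p] += 1

    # PQ initially holds all leaves, ordered by local importance.
    pq = [(importance[v], v) for v in parent if child_count[v] == 0]
    heapq.heapify(pq)

    while len(alive) > l and pq:
        _, v = heapq.heappop(pq)          # leaf with the smallest importance
        alive.remove(v)                   # prune it
        p = parent[v]
        if p is not None:
            child_count[p] -= 1
            if child_count[p] == 0:       # the parent has just become a leaf
                heapq.heappush(pq, (importance[p], p))
    return alive

# Hypothetical 5-node OS: node 1 is the root.
example_parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 3}
example_importance = {1: 1.0, 2: 0.5, 3: 0.9, 4: 0.8, 5: 0.1}
print(sorted(bottom_up_prune(example_parent, example_importance, 3)))  # [1, 2, 3]
```

Each node enters and leaves the heap at most once, which matches the O(n log n) bound stated above. With a non-monotonic tree such as root 1 (1.0), children 2 (0.2) and 3 (0.3), and node 4 (0.9) under node 2, the same function returns {1, 2} for l = 2 although {1, 3} has higher total importance, illustrating why the heuristic is not always optimal.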
In practice, it is very accurate (see our experimental results in Section.), due to the aforementioned property of Im(OS, t_i), which gives higher probability to nodes closer to the root to have a high local importance. The following lemma proves an optimality condition for this algorithm (Paper OSs in the DBLP database are an example of this condition; to be discussed in Section.).

[Figure : The Bottom-Up Pruning Size-l Algorithm: Size-l OSs and their Corresponding PQs (annotated with tuple ID and local importance). Panels: (a) The initial OS; (b) First leaf pruned out; (c) The size- OS; (d) The size- OS.]

LEMMA. When the nodes of an OS have monotonically decreasing local importance scores with respect to their distance from the root (i.e. the score of each parent is not smaller than that of its children), then the Bottom-Up Pruning Size-l Algorithm returns the optimal size-l OS.

PROOF. PQ.top always holds the node with the current smallest score in the OS. This is because PQ.top is by definition the smallest among leaf nodes, where leaf nodes always have smaller scores than their ancestors. Therefore, by removing the n − l current smallest values (iteratively stored in PQ.top) from an OS, we can get the optimal size-l OS.

Update Top-Path-l Algorithm

We now explore a second greedy heuristic. This algorithm iteratively selects the path p_i of tuples with the largest average importance per tuple (denoted as AI(p_i)), adds p_i to the size-l OS, removes the nodes of p_i from the OS and updates AI(p_i) for the remaining paths accordingly. The rationale of selecting the path of tuples (instead of the tuple) with the current largest importance is that, since all nodes need to be connected and monotonicity may not hold, we facilitate the selection of nodes of large importance even though their ancestors may have lower importance. Algorithm is a pseudocode of the heuristic and Figure illustrates an example.

Algorithm: The Update Top-Path-l Algorithm
Update Top-Path-l (l, t_DS, G_DS)
  OS Generation(t_DS, G_DS)    {generates initial size-l (i.e. complete or prelim-l) OS, annotates tuples with AI(p_i)}
  while (|size-l OS| < l) do
    p_i = path with max AI(p_i)
    add first l − |size-l OS| nodes of p_i to size-l OS
    if (|size-l OS| < l) then
      remove selected path p_i from the tree
      for each child v of nodes in p_i do
        update AI(p_i) for each node t_j in the subtree rooted at v
  return size-l OS

[Figure : The Update Top-Path-l Algorithm: The size- OS (annotated with tuple ID, local importance and AI(p_i); selected nodes are shaded). Panels: (a) The initial OS; (b) First update; (c) Second update; (d) Final update (size- OS).]

More precisely, this algorithm (like the Bottom-Up Pruning Algorithm) first generates the complete (or alternatively the prelim-l) OS. During the OS generation, for each tuple t_i, we also calculate the average importance per tuple AI(p_i) for the corresponding path p_i from the root to t_i. We then select the node with the largest AI(p_i) and add the corresponding path to the size-l OS. By removing the nodes of p_i from the OS, the tree now becomes a forest; each child of a node in p_i is the root of a tree. Accordingly, the AI(p_i) for each node t_i is updated again to disregard the removed nodes in the path selected at the previous step. The process of selecting the path with the highest AI(p_i) and adding it to the size-l OS is repeated as long as fewer than l nodes have been selected so far. If fewer than |p_i| nodes are needed to complete the size-l OS, then only the top nodes of the path are added to the size-l OS (because only these nodes are connected to the current size-l OS). Consider the example shown in Figure.
Node has AI(p_i)=, because its path includes nodes and with average Im(OS, t_i) being (+)/=. Assuming l=, at the first loop, the algorithm selects nodes and with the largest AI(p_i), i.e.. Then, the nodes along the path (nodes and ) are added to the size-l OS. For the remaining nodes, AI(p_i) is updated to disregard the removed nodes (see top-right tree in Figure ). For example, the new AI(p_i) for node is, because its path now includes only nodes and with average Im(OS, t_i) being. The next path to be selected is that ending at node, which adds two more nodes to the snippet. Finally, node is added to complete the size-l OS.

The complexity of the algorithm can be as high as O(n^2), where n is the size of the complete OS, as at each step the algorithm may choose only one node, which causes the update of O(n) paths. The algorithm can be optimized if we precompute for each node v of the tree the node s(v) with the highest AI(p_i) in the subtree rooted at v. Regardless of any change at any ancestor of v, s(v) should remain the node with the highest AI(p_i) in the subtree (because the change will affect all nodes in the subtree in the same way). Thus, only a small number of comparisons would be needed after each path selection to find the next path to be selected. Specifically, for each child v of nodes in the currently selected path p_i, we need to update AI(p_i) for s(v) and then compare all s(v)'s to pick the one with the largest AI(p_i).

In terms of approximation quality, this algorithm does not always return the optimal solution; e.g. the size- OS will have nodes, and instead of, and. However, empirically, this method gives better results than Bottom-Up Pruning.

Top-l Prelim-l OS Preprocessing

Instead of operating on the complete OS, which may be expensive to generate and search, we propose to work on a smaller OS, which hopefully includes a good size-l OS. We denote such a preliminary partial OS as prelim-l OS (with size j where l ≤ j ≤ |OS|).
On the prelim-l OS, we can apply any of the algorithms proposed so far (of course, DP is not expected to return the optimal result, unless the prelim-l OS is guaranteed to include it). The rationale of the prelim-l OS is to avoid extraction and processing of tuples that are not promising to make it into the optimal size-l OS. Algorithm is a pseudocode for computing the prelim-l OS, Table summarizes symbols and definitions and Figure illustrates an example.

Determining a prelim-l OS that includes the optimal size-l OS can be very expensive, therefore we propose a heuristic, which produces a prelim-l OS that includes at least the l nodes of the complete OS with the largest local importance (denoted as top-l set). Figure (a) illustrates such a prelim-l OS. Using avoidance conditions and simple statistics that summarize the range of local importance of every tuple in each relation (e.g. max(R_i)), we can infer upper bounds for the local importance of tuples and thus safely predict whether a candidate path can potentially produce useful tuples.

DEFINITION. Given an OS and an integer l, a top-l prelim OS (or simply prelim-l OS) is a subset of the complete OS that includes the l tuples of the OS with the largest local importance.

We annotate each relation R_i on the G_DS graph with the statistics max(R_i) and mmax(R_i) (see Figure ). (Recall from Section. that we generate G_DS graphs for every relation that may contain information about DSs.) max(R_i) is the maximum local importance of all tuples in R_i, which can be derived from the maximum global importance in R_i (a global statistic that is computed/updated independently of the queries) and the affinity Af(R_i). mmax(R_i) is the maximum local importance of all tuples that belong to R_i's descendant relation nodes in G_DS (i.e. the max_j{max(R_j)}; j ranges over all such relations) or 0 if R_i has no descendants (leaf node). The algorithm for generating the prelim-l OS is an extension of the complete OS generation algorithm (e.g. Algorithm ).
Table : Symbols and Definitions (Top-l Prelim-l OSs)
Symbol | Definition
top-l | The l nodes with the largest local importance in the OS
top-l PQ | An l-sized priority queue with the current l largest local importance of extracted tuples
largest-l | The tuple with the l-th largest local importance retrieved so far (i.e. the smallest value of top-l PQ) or 0 if |top-l PQ| < l
li(t_i) | The local importance of tuple t_i (i.e. Im(OS, t_i))
R(t_i) | The relation on G_DS that tuple t_i belongs to
R_i(t_j) | The subset of R_i that joins with tuple t_j
max(R_i) | The maximum value of local importance of R_i
mmax(R_i) | The maximum value of max(R_j) over all of R_i's descendant nodes on G_DS, or 0 if R_i has no descendants (leaf node)
fruitless tuple | A tuple not in top-l
fruitless G_DS sub-tree | A G_DS sub-tree starting from relation R_i is considered fruitless for a given largest-l if no tuples from R_i and its descendants can be fruitful for the top-l (i.e. when largest-l ≥ max(R_i) AND largest-l ≥ mmax(R_i))
fruitful-l relation | A relation R_i is considered fruitful-l for a given largest-l if only up to l nodes from the corresponding R_i(t_j) can be fruitful for the top-l (i.e. when largest-l ≥ mmax(R_i))

The extension incorporates pruning conditions in order to avoid adding to the prelim-l OS fruitless tuples and their subtrees. More precisely, we traverse the G_DS graph in a breadth-first order. Every extracted tuple is appended to the prelim-l OS (lines and ) and to queue Q (to facilitate the breadth-first traversal of the G_DS; see lines and ). Let largest-l be the tuple with the l-th largest local importance retrieved so far. If the local importance of the current tuple t_i is greater than largest-l, t_i is added to the l-sized priority queue top-l PQ as well (in order to update the top-l set; lines and ). Largest-l is set to the current smallest value of top-l PQ or to 0 if the top-l PQ does not contain l values yet (lines -). We traverse the G_DS as follows.
For each tuple de-queued from the queue Q (line ), we extract all its child nodes from each corresponding child relation (lines -) and we employ the following avoidance conditions:

Avoidance Condition (Avoiding fruitless G_DS sub-trees): If the top-l PQ already contains l tuples and largest-l is greater than or equal to the local importance of all tuples of the current relation R_i and all its descendants (i.e. largest-l ≥ max(R_i) AND largest-l ≥ mmax(R_i)), then there is no need to traverse the sub-tree starting at R_i (line ). In such cases, we say that the sub-tree starting from R_i is fruitless. For instance, consider the example of Figure ; while retrieving tuple y, largest-l=0. and the current child relation R_i is Conference with max(R_i)=0. and mmax(R_i)=0. Thus, we can safely infer that Conference has no fruitful tuples for the particular prelim-l OS. This avoidance condition does not require any I/O operations, as all information required can cheaply be obtained from the annotated G_DS.

Avoidance Condition (Limiting up to l tuple extractions from fruitful-l relations): Assume that we are about to traverse R_i in order to extract R_i(t_j): the tuples in R_i which join with the parent tuple t_j. We can limit the number of tuples returned by this join to at most l, if we can safely predict that none of their descendants (if any) can be fruitful for the top-l. We say that a relation R_i on the G_DS is fruitful-l for a given largest-l if we can safely predict that only up to l tuples from R_i, and none of their descendants (if any), can be fruitful for the top-l; this is the case when largest-l ≥ mmax(R_i) but largest-l < max(R_i). In other words, we can safely extract only up to l tuples greater than the largest-l from a fruitful-l relation; i.e. there is no need to compute the complete join. For instance, consider the example of Figure, where we are about to traverse the fruitful-l relation PaperCitedBy (a leaf node on the G_DS, thus a fruitful-l relation) in order to extract the joins with Paper tuple p.
Then, we can extract from the database only up to l tuples with local importance greater than the largest-l (which is 0, since |top-l PQ| < l). Similarly, when traversing the fruitful-l relation PaperCites with largest-l=0., we extract up to l tuples larger than largest-l. Note that the Paper relation is not fruitful-l, since largest-l=0 and mmax(R_Paper)=. thus largest-l < mmax(R_Paper). As a consequence, we cannot apply this avoidance condition and hence we need to extract all tuples for Paper. Note that this condition has no impact on M:1 relationships, since the maximum cardinality of R_i(t_j) is 1 anyway.

Algorithm: The Prelim-l OS Generation Algorithm
Prelim-l OS Generation (l, t_DS, G_DS)
  largest-l = 0
  add t_DS as the root of the prelim-l OS
  enqueue(Q, t_DS)
  enqueue(top-l PQ, t_DS)
  while !(isEmptyQueue(Q)) do
    t_j = dequeue(Q)
    for each child relation R_i of R(t_j) in G_DS do
      if !(largest-l ≥ max(R_i) AND largest-l ≥ mmax(R_i)) then    {Av. Cond. }
        if (largest-l ≥ mmax(R_i)) then
          R_i(t_j) = "SELECT * TOP l FROM R_i WHERE (t_j.ID = R_i.ID AND R_i.li > largest-l)"    {Av. Cond. ; t_j.ID and R_i.ID represent the keys on which t_j and R_i join, and R_i.li the local importance attribute of R_i}
        else
          R_i(t_j) = "SELECT * FROM R_i WHERE (t_j.ID = R_i.ID)"
        for each tuple t_i of R_i(t_j) do
          add t_i on prelim-l OS as child of t_j
          enqueue(Q, t_i)
          if (li(t_i) > largest-l) then
            enqueue(top-l PQ, t_i)
            if (|top-l PQ| > l) then
              dequeue(top-l PQ)
            if (|top-l PQ| < l) then
              largest-l = 0
            else
              largest-l = smallest(top-l PQ)

In terms of cost, in the worst case we need up to n I/O accesses (if operating directly on the database), where n is the number of nodes in the complete OS, even if we extract only j tuples (recall that Avoidance Condition still requires an I/O access even when it returns no results). In practice, however, there can be significant savings if the top-l tuples are found early and large subtrees of the complete OS are pruned. The prelim-l OS created according to Definition does not necessarily contain the optimal size-l OS, e.g.
the prelim-l OS of our example does not contain the ca node which belongs to the optimal size-l OS. In practice, we found that in most cases the prelim-l OS did contain the optimal solution. This means that all size-l OS computation algorithms may give the same results when applied either on the prelim-l or the complete OS. The following lemma proves that if monotonicity holds then the prelim-l OS will certainly include the optimal size-l OS.

LEMMA. When the nodes of an OS have monotonically decreasing local importance scores with respect to their distance from the root, then the prelim-l OS contains the optimal size-l OS.

PROOF. When monotonicity holds, the optimal size-l OS is the top-l set (as shown by Lemma ). Therefore, the prelim-l OS produced by this algorithm, which contains the top-l set, is optimal.

Finally, we note that we have also investigated a variant of the prelim-l OS, which includes the l largest top-path-l nodes (rather than the top-l), namely the l tuples with the largest AI(p_i). However, this approach did not result in better time or approximation quality, so we do not discuss it further.
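The two avoidance conditions and the maintenance of largest-l during prelim-l OS generation can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' code: `classify_relation` and `TopL` are hypothetical names, the statistics max(R_i) and mmax(R_i) are passed in as plain numbers, the top-l PQ stores only local importance values, and no database access is modelled.

```python
import heapq

def classify_relation(largest_l, max_ri, mmax_ri):
    """Decide how a child relation R_i should be traversed (hypothetical helper).

    largest_l: current l-th largest local importance (0 while fewer than l seen)
    max_ri:    max(R_i), upper bound on local importance inside R_i
    mmax_ri:   mmax(R_i), upper bound over R_i's descendants (0 for a leaf)
    """
    if largest_l >= max_ri and largest_l >= mmax_ri:
        return "fruitless"    # first avoidance condition: skip the whole sub-tree
    if largest_l >= mmax_ri:
        return "fruitful-l"   # second condition: fetch at most l tuples with li > largest-l
    return "full"             # neither condition applies: compute the complete join


class TopL:
    """l-sized min-heap tracking the l largest local importances seen so far
    (assumes l >= 1)."""

    def __init__(self, l):
        self.l = l
        self.heap = []

    def largest_l(self):
        # Smallest value of the top-l PQ, or 0 while it holds fewer than l values.
        return self.heap[0] if len(self.heap) >= self.l else 0.0

    def offer(self, li):
        # Mirrors the pseudocode: enqueue only if li exceeds largest-l,
        # then evict the smallest value if the queue grew beyond l.
        if li > self.largest_l():
            heapq.heappush(self.heap, li)
            if len(self.heap) > self.l:
                heapq.heappop(self.heap)


top = TopL(2)
for li in (0.5, 0.3, 0.2, 0.9):      # illustrative importance values only
    top.offer(li)
print(top.largest_l())                                   # 0.5
print(classify_relation(top.largest_l(), 0.4, 0.0))      # fruitless
print(classify_relation(top.largest_l(), 0.7, 0.3))      # fruitful-l
print(classify_relation(top.largest_l(), 0.7, 0.9))      # full
```

Note that a leaf relation (mmax(R_i) = 0) is classified as fruitful-l even while largest-l is still 0, which agrees with the PaperCitedBy example in the text.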

[Figure : The Prelim-l OS Generation Algorithm (l=, t_DS=a, and G_DS=G_Author). Panels: (a) The complete OS, the prelim-l OS and the top-l set. Nodes with low transparency are pruned tuples (e.g. pc, ca etc.), shaded nodes are the top-l set (e.g. a, pc etc.) and the rest are the remaining tuples of the prelim-l OS (e.g. p, p etc.); (b) Values of t_j, R_i, R_i(t_j), Q, top-l PQ and largest-l during the prelim-l OS generation.]

EXPERIMENTAL EVALUATION

In this section, we experimentally evaluate the proposed size-l OS concept and algorithms. We evaluate our algorithms using both complete and prelim-l OSs. First, the effectiveness of the proposed size-l OSs is thoroughly investigated with the help of human evaluators. Then, the quality of the size-l OSs produced by the greedy heuristics is compared to that of the corresponding optimal OSs. Finally, the efficiency of the algorithms is comparatively investigated.

We used two databases: DBLP and TPC-H (we used scale factor in generating the TPC-H dataset). The two databases have,, and,, tuples, occupying.MB and.GB on the disk, respectively. We use ObjectRank (global) [] and ValueRank [] to calculate the global importance for the tuples of the DBLP and TPC-H databases respectively. For a more thorough evaluation, we investigate scores by various settings that have been studied in [], namely, two G_A s: (1) the G_A s (default) are presented in Figure whereas (2) the G_A for the DBLP has common transfer rates (0.) for all edges and for the TPC-H neglects values (i.e.
becomes an ObjectRank G_A) and three values of d: d =0. (default), d =0. and d =0.. We use Equation to calculate affinity (alternatively an expert can define G_DSs and affinity manually, i.e. to select which relations to include in each G_DS and their affinity). For the experiments, we used Java, MySQL, cold cache and a PC with an AMD Phenom 0.GHz (Quad-Core) processor and GB of memory.

Effectiveness

We used human evaluators to measure effectiveness. First, we familiarized them with the concepts of OSs in general and size-l OSs in particular. Specifically, we explained that a good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular DS. Then, we provided them with OSs and asked them to size-l them for l =,,,,,. None of our evaluators were involved in this paper. Figure measures the effectiveness of our approach as the average percentage of the tuples that exist both in the evaluators' size-l OSs and the size-l OS computed by our methods. This measure corresponds to recall and precision at the same time, as both the OSs compared have a common size.

DBLP. Since the DBLP database includes data about real people and their papers, we asked the DSs themselves (i.e. eleven authors listed in DBLP) to suggest their own Author and Paper size-l OSs. The rationale of this evaluation is that the DSs themselves have the best knowledge of their work and can therefore provide accurate summaries. Figures (a) and (b) plot the recall of the optimal size-l OS for various ObjectRank settings. In general, ObjectRank scores produced with G_A -d and G_A -d are good options for Author and Paper size-l OS generation (as these settings produce similar ObjectRank scores) and always dominate on larger values of l. More precisely, for G_A -d, effectiveness ranges from % to 0% for l = to, and from % to % for l =. These results are very encouraging. User evaluation also revealed that the inter-relational ranking properties (e.g.
whether paper p is more important than author a) crucially affect the quality of the size-l OSs. For instance, on author OSs, evaluators first selected important Paper tuples to include in the size-l OS and then additional tuples such as co-authors, year, conferences (these were usually included in summaries of larger sizes, i.e. ). The bias to select Papers (i.e., 1st-level neighbors) is favored by setting G_A -d, although overall this setting was not very effective; e.g., in Figure (a), this setting achieves.% (in comparison to % of G_A -d ) for l =.

The impact of approximated size-l OSs produced by our greedy algorithms on effectiveness is very minor. For instance, using scores produced by the default setting (i.e. G_A and d =0.) on the Author G_DS, the Update Top-Path-l algorithm generates summaries of the same effectiveness as the optimal, whereas Bottom-Up has very minor additional loss ranging from % to %. On the Paper G_DS, all approaches give the same effectiveness as they all return the optimal size-l OSs. The use of prelim-l OSs had no impact on effectiveness. As we show later, prelim-l OSs have very minor impact on approximation quality which did not affect effectiveness.

TPC-H. We presented random OSs to eight evaluators and asked them to size-l them. The evaluators were professors and researchers from Manchester and Hong Kong Universities. In addition, for each OS and tuple, a set of descriptive details and statistics was also provided. For instance, for a customer, the total number, size and value of orders and the corresponding minimum, median and maximum values of all customers were provided (e.g. similarly to the evaluation in []). The provision of such details gave the evaluators a better knowledge of the database. In summary, the G_A (for any d) is a safe option as it produces good size-l OSs on both Customer and Supplier OSs (Figures (c) and (d)); e.g. effectiveness results for G_A -d range from % to %.
On the other hand, G_A, which is the ObjectRank version of the G_A, did not satisfy the evaluators as much on Supplier OSs.
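The effectiveness measure used throughout this subsection (the share of tuples that appear both in an evaluator's size-l OS and in the computed size-l OS, averaged over evaluators) can be sketched as follows. The function names and the tuple identifiers in the example are hypothetical and purely illustrative; the sketch assumes each summary is given as a collection of tuple IDs.

```python
def effectiveness(evaluator_os, computed_os):
    """Fraction of tuples common to two size-l OSs.

    Because both summaries contain exactly l tuples, this single number is
    both the precision and the recall of the computed summary.
    """
    evaluator_os, computed_os = set(evaluator_os), set(computed_os)
    if not evaluator_os or len(evaluator_os) != len(computed_os):
        raise ValueError("both summaries must have the same non-zero size l")
    return len(evaluator_os & computed_os) / len(evaluator_os)


def average_effectiveness(evaluator_summaries, computed_os):
    """Average the overlap over several evaluators' summaries."""
    scores = [effectiveness(s, computed_os) for s in evaluator_summaries]
    return sum(scores) / len(scores)


# Hypothetical tuple IDs, l = 4: three of the four tuples agree.
print(effectiveness({"a", "p1", "p2", "y"}, {"a", "p1", "c", "y"}))  # 0.75
```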
