Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach


UIUC Technical Report UIUCDCS-R , UILU-ENG March 03 (Revised March 04)

Seung-won Hwang and Kevin Chen-Chuan Chang
Computer Science Department, University of Illinois at Urbana-Champaign

ABSTRACT

This paper studies how to minimize access costs, through cost-based optimization, for top-k queries in middleware, and in particular over Web sources. By dynamically searching over a space of algorithms, cost-based optimization is general across a wide range of access capabilities, yet adaptive to the specific access costs at runtime. Such optimization is crucial, especially for querying Web sources, to handle their heterogeneous capabilities and dynamic costs. However, techniques for systematic optimization are clearly missing for top-k queries. To begin with, what is the algorithm space to optimize over? Our approach hinges on developing an abstract framework to induce this space: analyzing the logical structure of top-k queries, we build Framework NC by focusing on necessary scoring tasks, which thus achieves both generality and specificity as an algorithm space. Further, how do we identify an effective algorithm in the space? We develop dynamic search schemes, adopting two scheduling heuristics for reducing the search space. Our experiments indicate that this cost-based approach indeed outperforms existing algorithms specifically designed for their scenarios.

1. INTRODUCTION

As the Web has rapidly evolved into an ultimate repository of extensive and up-to-date information, querying over Web sources is essential for searching and integrating online information. Such querying, given the overwhelming scale of data, naturally demands ranked answers, or "best first," to enable users to focus on a few top results. In particular, such ranking is prevalent across many Web search engines and searchable databases. We study the problem of supporting ranked queries over Web sources.
To motivate, consider a Web travel-agent scenario for finding restaurants and hotels, as Examples 1 and 2 illustrate. (We use this real scenario as benchmark queries for experiments; see Section 9.) In particular, how should a middleware access sources with different capabilities and costs to answer queries efficiently? As our Web middleware coordinates various sources, each source access incurs network communication and server computation. This paper aims at optimizing such access costs, which dominate the overall query processing (like I/O in a relational DBMS).

Example 1: To find the top-5 restaurants (say, in the Chicago area) that are highly rated and close to her place myaddr, a user may ask a ranked query (in SQL-like syntax):

select name from restaurant r order by min(rating(r), close(r, myaddr)) stop after 5 (Query Q1)

For query answering, our middleware will access some Web sources to evaluate the predicates, e.g., rating and close, into scores in [0, 1], which are then aggregated by some scoring function F, e.g., F = min, to determine the 5 highest-scored restaurants.

Our middleware can use various sources in query answering; Figure 1(a) shows one possible scenario. For evaluating close, superpages.com is capable of 1) returning the close score for a specific restaurant ("random access") and 2) returning restaurants in descending order of their scores ("sorted access"). For rating, dineme.com similarly provides both sorted and random accesses. The middleware will coordinate these accesses to find the top results.

Such accesses are typically expensive (as compared to local computations), with varying costs. To characterize them, Figure 1(a) shows the average access latency (thus including both network and server times) of both sorted and random access (denoted cs_i and cr_i respectively) for each predicate p_i. In this scenario, random accesses are more expensive in both sources (i.e., cr_i > cs_i), but with different actual scales (i.e., cr_1 ≠ cr_2) and ratios (i.e., cr_1/cs_1 ≠ cr_2/cs_2).
Example 2: Consider query Q2 for the top-5 hotels that are close, with high star-rating, yet within the budget:

select name from hotel h order by min(close(h, myaddr), stars(h), cheap(h)) stop after 5 (Query Q2)

Figure 1(b) describes another scenario, with hotels.com providing sorted access to all the predicates. In this setting, since a sorted access (e.g., for close) also retrieves all the attributes of a hotel (e.g., stars and price), the subsequent random accesses^1 to the same hotel are essentially of zero access cost (cr_i = 0 ms); e.g., using stars and price, the middleware can locally compute stars and cheap. This scenario thus contrasts significantly with the expensive random accesses of Example 1.

Footnote 1: In a middleware, random accesses to an object u can only occur after u is first seen from sorted accesses, i.e., no "wild guesses" [9].

[Figure 1: Web query scenarios for (a) Q1 and (b) Q2.]

Our goal is to develop middleware algorithms, or query plans for coordinating sources, to minimize access costs. This task is challenging. First, sources are heterogeneous, with widely varying access capabilities and costs (e.g., as the real sources in Figure 1 show). Our algorithms must be general for various capability configurations. Second, the Web is dynamic, with cost scenarios changing over time (e.g., depending on source load and availability). Our algorithms must be adaptive to runtime factors.

While many middleware algorithms exist, they do not satisfy these Web querying requirements, as Section 2 will review. For generality, the existing algorithms have mostly been designed with specific cost scenarios in mind. (In fact, even together, they do not cover some scenarios, e.g., Example 2.) For adaptivity, they largely lack systematic runtime optimization, offering at most limited heuristics.

We take a cost-based optimization approach: by dynamic search over some space of algorithms, cost-based optimization is general across virtually all cost scenarios, yet adaptive to the specific one at runtime. While such optimization has been taken for granted for relational queries from early on [15], it is clearly lacking for ranked queries.

However, such optimization is challenging. To begin with, is there a complete yet focused algorithm space to search over? Our approach hinges on developing an abstract framework to induce this space. Inspired by the relational algebraic framework with its logical operators, we analyze the logical structure of top-k queries and construct Framework NC. By focusing on necessary scoring tasks, it achieves both generality and specificity as an algorithm space. Second, with NC defined, we need to develop systematic optimization schemes to effectively identify, in principle, the optimal algorithm in this space. Such search must balance the overhead of optimization against its quality.

While we study Web querying, our approach is applicable in any middleware environment (e.g., multimedia systems [16]) where access costs are significant. Our experiments thus evaluate both real-life Web querying (using our travel-agent benchmark scenarios) and a wider range of synthesized middleware settings. The results are indeed encouraging: our framework outperforms the existing algorithms specifically designed for their scenarios.

Overall, this paper develops cost-based optimization for top-k querying (over Web sources). To our knowledge, our framework is the first such optimization. In realizing this goal, our contributions are as follows:

- We define Framework NC as a complete yet focused algorithm space for top-k queries. Identifying such a space is essential for systematic optimization.
- We develop dynamic optimization schemes for searching over the space to find an effective algorithm.
- We report experimental evaluation using both real-life and synthetic scenarios. Our study indicates the generality and adaptivity of a cost-based approach.

We discuss related work in Section 2, and start with preliminaries in Section 3. Sections 4-6 define Framework NC as a space for top-k algorithms. Section 7 then develops optimization schemes over this space. Section 8 discusses how our framework unifies and contrasts with existing algorithms. Section 9 reports our experiments.

2. RELATED WORK

Supporting top-k queries over Web sources has also been studied by [2, 5], in more limited scenarios where sources support only random accesses (or "probes"). In contrast, our work schedules arbitrary accesses (random, sorted, and potentially beyond), which complicate optimization with the progressiveness and side-effects of sorted accesses (Section 3.2), and thus enables general applicability to any top-k scenario. In fact, our main results (e.g., Theorems 1 and 2) make no assumptions on the access types.
In the broader context of middleware, many algorithms have been proposed for different cost scenarios. Figure 2 summarizes a matrix of the access scenarios that have been studied, each characterized by how sources relatively support either type of access, e.g., cheap, expensive, or impossible. Fagin pioneered Algorithm FA [8, 16] for scenarios where random and sorted accesses are supported with uniform cost (the diagonal cells in Figure 2). [14, 9] then proposed TA (or equivalents) with a stronger sense of optimality. Meanwhile, some works [9, 11, 1] explored non-uniform scenarios, e.g., CA (when random access is expensive), NRA (when random access is impossible), and TAz, MPro, and Upper (when sorted access is impossible). Further, SR-Combine [1], Quick-Combine [10], and Stream-Combine [11] enhance the above base algorithms with some runtime optimization. However, their heuristics have limited applicability; e.g., using the partial derivative of the scoring function as an indicator may not be applicable to all functions (e.g., min).

In contrast to existing algorithms, our goal is to develop systematic cost-based optimization. 1) Our approach is rather general: it not only unifies the existing algorithms in Figure 2 but also extends to a larger space, for any scenario that our cost function (Section 3.2) can model. In particular, the scenario where random access is cheaper, as in Example 2, has not been studied (marked with "?" in the matrix). 2) By dynamic optimization, our approach naturally adapts to a given query at runtime; such adaptation is largely lacking in existing algorithms.

Meanwhile, ranked queries have also been proposed for relational databases. Carey et al. [3, 4] presented optimization techniques for exploiting the limited cardinalities of ranked queries. References [7, 6] then proposed to exploit probabilistic distributions and histograms, respectively, to process ranked queries as equivalent Boolean selections.

3. SEMANTICS AND MODELS

To establish the context of our discussion, this section describes the semantics and a cost model for top-k queries.

3.1 Query Semantics

A top-k query Q = (F(p_1, ..., p_m), k), with scoring function F and retrieval size k, selects the top k objects ranked by F from database D = {u_1, ..., u_n}. Each object u has a predicate score p_i[u] for every p_i and an overall query score F[u] = F(p_1, ..., p_m)[u]^2. Without loss of generality, we assume that all scores are in [0, 1].

Footnote 2: To be more rigorous, F[u] is in fact F(p_1[u], ..., p_m[u]), i.e., a composition of F and the predicates.

                        Random access
Sorted access           cheap (cr_i = 1)         expensive (cr_i = h)     impossible
cheap (cs_i = 1)        FA, TA, Quick-Combine    CA, SR-Combine           NRA, Stream-Combine
expensive (cs_i = h)    ?                        FA, TA, Quick-Combine    NRA, Stream-Combine
impossible              TAz, MPro, Upper         TAz, MPro, Upper         X

Figure 2: Access scenarios and their proposed algorithms.

As a standard assumption, F is monotonic, i.e., F(x_1, ..., x_m) ≤ F(y_1, ..., y_m) when x_i ≤ y_i for every i. As output, a top-k query returns a sorted list K = (v_1, ..., v_k) of the top k objects, along with and ranked by their overall F scores, such that F[v_1] ≥ ... ≥ F[v_k], and F[v_k] ≥ F[u] for every u not in K. Note that, to give deterministic semantics, we assume that there are no ties; otherwise, a deterministic tie-breaker function can be used to determine an order, e.g., by unique object IDs (e.g., hotel names)^3.

As our running example, we will consider Q1 (Example 1) for finding the top-1 restaurant, i.e., k = 1. (For notational brevity, we will write predicates rating and close as p_1 and p_2 respectively.) For our illustration, let us assume Dataset 1 (Figure 3) as our example restaurant objects, i.e., D = {u_1, u_2, u_3} (which can only be known by accessing the Web sources). For instance, object u_3 scores p_1[u_3] = .7, p_2[u_3] = .7, and F[u_3] = min(.7, .7) = .7. Overall, as a top-1 query, Q1 will return the answer K = (u_3: .7), i.e., u_3 is the top-ranked object with score .7.

3.2 Cost Model for Middleware Accesses

For ranked querying over Web sources, a middleware algorithm gathers predicate scores through the accesses that the sources support. As Section 1 introduces, a source may support 1) sorted access on predicate p_i, denoted sa_i; or 2) random access on predicate p_i for object u, denoted ra_i(u).

To illustrate, consider our example Q1 over Dataset 1. Figure 3(b) illustrates the sorted accesses. For instance, dineme.com supports sa_1, i.e., sorted access on p_1 (note p_1 = rating). Each sa_1 returns the next-ranked object in the order of p_1, i.e., u_3 with .7, u_1 with .65, and u_2 with .6. Alternatively, random access directly returns an object's score on some predicate. For instance, superpages.com supports ra_2(u) by returning the score p_2[u] (note p_2 = close), e.g., ra_2(u_3) = .7.
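As a reference point for the query semantics of Section 3.1, the following minimal Python sketch computes a top-k answer by fully evaluating every object, which is exactly what a middleware tries to avoid doing. The function name `top_k`, the object IDs, and the score values in `DATASET` are illustrative assumptions, not taken from the figures; ties are broken by object ID, as the text suggests.

```python
from typing import Callable, Dict, List, Tuple

def top_k(scores: Dict[str, Tuple[float, ...]],
          F: Callable[..., float],
          k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scored objects with their overall F scores.

    Objects are ranked by descending F score; ties are broken
    deterministically by object ID (ascending).
    """
    overall = [(oid, F(*preds)) for oid, preds in scores.items()]
    overall.sort(key=lambda pair: (-pair[1], pair[0]))
    return overall[:k]

# Hypothetical predicate scores (p_1, p_2) for three objects.
DATASET = {
    "u1": (0.65, 0.8),
    "u2": (0.6, 0.9),
    "u3": (0.7, 0.7),
}

if __name__ == "__main__":
    # With F = min, a monotonic scoring function, u3 ranks first.
    print(top_k(DATASET, min, 1))
```

Any monotonic function of the predicate scores can be passed as `F`; `min` is used here because it is the example scoring function in the text.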
A middleware algorithm is thus a query plan that uses (and schedules) such accesses for query answering. Different algorithms will perform different sets of accesses to gather the scores needed, as we illustrate below.

Example 3 (Performed Accesses): To illustrate, consider an algorithm M1 performing the following accesses: P(M1) = {sa_1, ra_2(u_3), sa_1, ra_2(u_1), sa_1, ra_2(u_2)}. Note that we use P(M) to denote the accesses performed by M. With these accesses, M1 has gathered enough information to answer Q1. In particular, it simply gathers the exact scores of every object for every predicate. The top-k can then be identified by sorting objects by their F scores. Note that the same query can be answered by different algorithms with different sets of accesses, e.g., P(M2) = {sa_1, sa_2, sa_1, sa_2, sa_1, sa_2}.

As a remark, we note that the two types of accesses differ fundamentally in two aspects:

Side-effects: Sorted access sa_i has side-effects. To illustrate, in Figure 3(b), the first sa_1 not only evaluates p_1[u_3] = .7 but also bounds the maximal-possible p_1 score of every unseen object by this last-seen score, e.g., p_1[u_1] ≤ .7. In contrast, random access ra_i(u) has no effect on objects other than u itself.

Progressiveness: Sorted access sa_i is progressive in that repeated accesses give more information. For instance, repeated sa_1 evaluates u_3, u_1, and u_2 in turn, by accessing deeper into p_1's sorted list. In contrast, ra_i(u) will return the same p_i[u] every time, and thus it should not be repeated.

[Figure 3: Dataset 1. (a) dataset; (b) sorted accesses on p_1 and p_2.]

Over Web sources, each access incurs some cost, e.g., network communication or server computation. As such costs often dominate, our goal is to minimize the total access cost, which represents the total resource usage.

Footnote 3: Enforcing a specific tie-breaker enables optimization to compare only truly comparable algorithms, i.e., those returning the same results.
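The two access types, and the side-effects and progressiveness that distinguish them, can be simulated in a few lines. The sketch below is only an in-memory stand-in for a Web source: the class name `Source`, its methods, and the score values are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

class Source:
    """In-memory stand-in for a Web source serving one predicate p_i."""

    def __init__(self, scores: Dict[str, float]) -> None:
        self._scores = dict(scores)
        # Sorted list in descending score order (ties broken by object ID).
        self._order = sorted(scores, key=lambda o: (-scores[o], o))
        self._depth = 0
        # Upper bound on p_i for every object not yet seen by sorted access.
        self.last_seen = 1.0

    def sorted_access(self) -> Optional[Tuple[str, float]]:
        """sa_i: progressive, and it tightens the bound on unseen objects."""
        if self._depth >= len(self._order):
            return None  # list exhausted
        oid = self._order[self._depth]
        self._depth += 1
        self.last_seen = self._scores[oid]  # side-effect on unseen objects
        return oid, self.last_seen

    def random_access(self, oid: str) -> float:
        """ra_i(u): returns the same score every time, with no side-effects."""
        return self._scores[oid]

if __name__ == "__main__":
    p1 = Source({"u1": 0.65, "u2": 0.6, "u3": 0.7})  # hypothetical scores
    print(p1.sorted_access(), p1.last_seen)  # first hit, new bound
    print(p1.sorted_access(), p1.last_seen)  # deeper hit, tighter bound
    print(p1.random_access("u2"), p1.last_seen)  # bound is unchanged
```

Repeating `sorted_access` walks deeper into the list and lowers `last_seen`, whereas repeating `random_access` on the same object yields nothing new, which is why a plan should never repeat it.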
To capture various scenarios, our cost model uses cs_i and cr_i to specify the unit cost of a sorted and a random access, respectively, for predicate p_i. The total access cost then aggregates the costs of all accesses: letting ns_i and nr_i be the numbers of sorted and random accesses, respectively, performed on p_i by some algorithm M, the total cost is

\[ C(M) = \sum_{i} \left( ns_i \cdot cs_i + nr_i \cdot cr_i \right) \tag{1} \]

Example 4 (Cost Model): To illustrate how our cost model works, continue Example 3. In the access scenario illustrated in Figure 1(a), where random accesses are more expensive than sorted accesses, Algorithm M1, performing three sa_1 and three ra_2 (i.e., ns_1 = 3 and nr_2 = 3), incurs a total cost of 3 cs_1 + 3 cr_2. Meanwhile, M2, performing three sa_1 and three sa_2, incurs a smaller cost of 3 cs_1 + 3 cs_2. However, observe that optimization is specific to the given cost scenario at run time: in another scenario like Figure 1(b), where random accesses are free (cr_1 = cr_2 = 0), Algorithm M1 is more efficient than M2.

Note that the total access cost, as a standard cost model used in top-k works [9], reflects not only the total resource usage but also the elapsed time, when accesses are performed sequentially. Thus, in general, our access minimization framework will naturally optimize for both. However, the two optimization goals can conflict when sources can handle concurrent accesses (as Web sources typically do): while elapsed time benefits from high concurrency, unrestrained concurrent accesses will certainly abuse resources (e.g., congesting the server). To address the conflicting goals, we model concurrency as bounded and optimize within this concurrency limit. We will show that such parallelization can simply build upon our access minimization framework (Section 9.1.1).

4. MOTIVATION: ALGORITHM FRAMEWORK

To enable optimization, or the search for an effective algorithm, we must first define a space of algorithms to search over. Put simply, the goal of cost-based optimization is, in principle, to find the optimal algorithm M_opt in that space Ω, with respect to the cost model C, i.e.,

\[ M_{opt} = \arg\min_{M \in \Omega} C(M) \tag{2} \]

While crucial, such a space has not been developed for top-k queries. Defining this space is challenging: the space must be large, or general, enough to encompass all comparable algorithms, while still sufficiently small, or specific, to allow efficient search.

For relational queries, this space is induced by an algebraic framework. As a query is composed of relational operators (e.g., joins and selections), the space of algorithms consists of those query plans that are algebraically equivalent. Each query plan is thus simply a schedule of the operators (by their commutativity and associativity). The algebraic framework induces a space of query plans, each a different schedule. Optimization is to find a good schedule of operations, conforming to the framework.

Our approach builds on this insight of an algorithm framework, or an abstract algorithmic structure, to induce the space of query plans, or algorithms. As the basis, we focus on sequential frameworks that iteratively schedule accesses. For our objective of minimizing total access costs (Eq. 1), sequential query plans are sufficient, since parallel accesses do not reduce total costs. However, we stress that parallelism can be built upon an effective sequential plan: since parallel accesses are possible over Web sources, Section 9.1.1 will discuss parallelization.

To concretely motivate this notion of framework, we start in this section with a simple one, Framework TG, which captures all sequential algorithms. With TG, we will contrast the requirements of generality and specificity.
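To see the cost model of Section 3.2 and the cost-based selection above at work on a tiny algorithm space, consider the following sketch. It evaluates the total access cost of two fixed plans and picks the cheaper one under two cost scenarios. The plan contents follow Example 3, but all unit costs are hypothetical values (in ms) chosen only for illustration; the names `total_cost` and `cheapest` are likewise assumptions.

```python
from typing import Dict, List, Tuple

Access = Tuple[str, int]  # ("sa" or "ra", predicate index i)

def total_cost(plan: List[Access],
               cs: Dict[int, int],
               cr: Dict[int, int]) -> int:
    """Sum the unit cost of every access in the plan (sorted: cs_i, random: cr_i)."""
    return sum(cs[i] if kind == "sa" else cr[i] for kind, i in plan)

def cheapest(plans: Dict[str, List[Access]],
             cs: Dict[int, int],
             cr: Dict[int, int]) -> str:
    """Return the name of the minimum-cost plan within the given space."""
    return min(plans, key=lambda name: total_cost(plans[name], cs, cr))

# Two plans in the spirit of Example 3: M1 mixes sorted and random
# accesses, M2 uses sorted accesses only.
PLANS = {
    "M1": [("sa", 1), ("ra", 2)] * 3,
    "M2": [("sa", 1), ("sa", 2)] * 3,
}

# Hypothetical unit costs in ms, not the measured values of Figure 1.
EXPENSIVE_RANDOM = ({1: 100, 2: 100}, {1: 500, 2: 900})
FREE_RANDOM = ({1: 300, 2: 300}, {1: 0, 2: 0})

if __name__ == "__main__":
    for label, (cs, cr) in [("expensive random access", EXPENSIVE_RANDOM),
                            ("free random access", FREE_RANDOM)]:
        costs = {name: total_cost(p, cs, cr) for name, p in PLANS.items()}
        print(label, costs, "->", cheapest(PLANS, cs, cr))
```

The same pair of plans swaps ranking between the two scenarios, which is the point of optimizing against the runtime costs rather than fixing a plan in advance.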
To begin with, in the abstract, all sequential algorithms (or query plans) simply iterate accesses one by one. As Figure 4 shows, in Framework TG, any sequential algorithm M will continue (in the while-loop) to select and perform an access until the top-k can be determined. In each iteration, let P (the "accesses-so-far") be the accesses that M has performed so far (initially empty). M will stop when P has gathered sufficient information for the query; otherwise, M will keep selecting some access from S (i.e., the pool of all supported accesses) to proceed.

Framework TG(Q, D): Trivially General
Input: query Q = (F(p_1, ..., p_m), k), database D = {u_1, ..., u_n}
Output: K, top-k objects from D w.r.t. F
1) S ← {sa_i, ra_i(u_j) | for all p_i, u_j};    // all supported accesses
2) P ← φ;                                      // accesses-so-far
3) while (P has not gathered
4)        sufficient scoring information for determining K)
5)     alternatives ← S;
6)     Select access A from alternatives;      // access selection
7)     perform A; update K; P ← P ∪ {A};
8) return K;

Figure 4: Framework TG for top-k query processing.

Note that, as an abstract framework, TG generates a space of concrete algorithms. This space, denoted Ω(TG), consists of all sequential algorithms: any concrete algorithms, while sharing this framework, will differ in their access schedules, by different Select strategies (line 6). TG is rather unrestrictive: since it allows any accesses (in S) as alternatives to select from, any algorithm, as a sequence of supported accesses, can fit into TG.

Example 5 (TG): To see how TG generates query plans, consider M1 and M2 in Example 3. Suppose M1 executes the accesses in P(M1) in the order listed: sa_1, ra_2(u_3), sa_1, ra_2(u_1), and so on. TG can generate M1 by, at each Select, choosing sa_1 and ra_2 alternately. Similarly, it generates M2 by alternating sa_1 and sa_2.

Is Framework TG general enough for optimization? That is, if we focus on only those algorithms in Ω(TG), will we miss the best algorithm overall (i.e., without the restriction of the framework)? More formally, a framework is general, with respect to a cost function C (e.g., Eq. 1), if it can generate the optimal algorithm under C. As just explained, we consider sequential algorithms for our optimization. Thus, TG is trivially general: by simply encompassing all sequential algorithms, it will not miss the optimal one.

Such generality allows us to focus on the framework in optimization, by simply searching over concrete query plans within Ω(TG). For TG, this search amounts to finding a good access scheduling strategy for Select. Different algorithms will have different schedules and thus different costs; e.g., while both are in Ω(TG) (Example 5), M1 and M2 cost differently (Example 4).

Further, to enable more focused search, a framework must also be specific. Unfortunately, though general, TG is extremely non-specific: it simply allows any supported access to be selected at each iteration, i.e., alternatives = S, which is often a very large set of choices. For instance, for D with n objects and Q with m predicates, |alternatives| = m + m·n (m sorted accesses plus m·n random accesses). As different choices generate different algorithms, such non-specificity renders an extremely large algorithm space. It is thus difficult to find an effective algorithm within TG.

In summary, as a motivating framework, TG is trivially general but extremely non-specific; it is thus not useful for optimization. Our goal is to develop, by refining TG, a framework that is both general and specific. To achieve specificity, we must make the alternatives at each iteration as small as possible. While specializing these choices, can we still maintain the generality of the framework?

To construct an effective framework, it is critical to first analyze the logical structure of top-k queries, so as to understand the building blocks. Analogously, relational queries are composed of relational operators as the task units for query plans to schedule. However, it is not obvious how a top-k query, with an arbitrary scoring function, e.g., min(p_1, p_2), can be decomposed into logical tasks.
Section 5 will thus develop task decomposition, as the basis for building Framework NC in Section 6.

5. THE BASIS: TASK DECOMPOSITION

While accesses are the physical means for gathering object scores, what are the logical tasks that a top-k query must fulfill? This section studies the task decomposition of a top-k query into a set of necessary tasks, to be the building blocks for constructing an effective framework (Section 6).

5.1 Defining Scoring Tasks

We take an information-theoretic view and ask: what is the required information for answering a top-k query? Given a database D = {u_1, ..., u_n}, any algorithm M must gather certain score information for each object u_j, to determine the top-k. We can thus compose the work of M as a set of required scoring tasks, {w_1, ..., w_n}. To define such tasks, let K = {v_1, ..., v_k} be the top-k answers (where each v_i represents some u_j from D). A task w_j is to gather the (exact or partial) scores of object u_j, by using relevant accesses, in order to either (if u_j ∈ K) compute u_j's overall score or (else) prove that it cannot score higher than v_k (the k-th answer).

Definition 1 (Scoring Tasks): Consider a top-k query Q = (F(p_1, ..., p_m), k) over D = {u_1, ..., u_n}, with top-k answers K = {v_1, ..., v_k}. The scoring task w_j for object u_j is:
1. for u_j ∈ K: w_j must compute the exact F[u_j] score; or
2. otherwise: w_j must indicate (by some partial scores) the maximal-possible F score of u_j, tight enough to support that F[u_j] < F[v_k]. (Note that we remove potential equality by deterministic tie breaking.)

As a remark, note that these tasks are specified with K (the top-k answers) and F[v_k] (the k-th score) given. These values, unfortunately, remain undetermined until query processing is fully completed. For this task view to be useful, our challenge (as we will discuss) is thus to develop mechanisms for identifying unsatisfied tasks during query processing, before K and F[v_k] are known.

Example 6 (Scoring Tasks): Consider our running example Q1 over D = {u_1, u_2, u_3} (Figure 3). For k = 1, the answer is K = {u_3} with F[u_3] = .7 (these values are not known until Q1 is processed). We can specify the scoring tasks w_1, w_2, w_3 for the three objects as follows.

Consider task w_3. Since u_3 ∈ K, w_3 must gather all predicate scores p_1[u_3] and p_2[u_3] for computing F[u_3]. Note that w_3 can do so in various ways, e.g., by one sorted access sa_1 into p_1 (which hits u_3 and returns .7) and a random access ra_2(u_3) (returning .7).

To contrast, task w_2 for u_2 (and similarly w_1 for u_1) needs only to prove, by gathering some partial scores, that F[u_2] < F[u_3] = .7. To do so, w_2 can use, say, two sorted accesses sa_1 into p_1, which return first .7 and then .65. Now, since u_2 is still unseen from the sorted list of p_1, it is bounded by the last-seen score, i.e., p_1[u_2] ≤ .65. As F = min(p_1, p_2), F[u_2] cannot be higher than .65, i.e., F[u_2] ≤ .65 < .7.

We stress that these scoring tasks are both necessary and atomic. First, each w_j is necessary: if any w_j is not satisfied, M cannot properly handle object u_j. 1) If u_j is a top-k answer, M cannot return its final score; 2) otherwise, without proving F[u_j] < F[v_k], M cannot safely exclude u_j from the top-k. Second, each w_j, as a per-object task, is atomic: for arbitrary F, w_j cannot generally be decomposed into smaller required subtasks. For case (1) of Definition 1, when u_j ∈ K, obviously all predicate scores are required.
For case (2), no subset of u_j's predicate scores is absolutely required, as long as the upper-bound inequality can be proved.

In summary, we now view query processing as equivalent to fulfilling a set of (necessary and atomic) tasks {w_1, ..., w_n}. Each task w_j, for object u_j, gathers the required per-object information. Only when (and clearly when) all the tasks are fulfilled can the query be answered.

5.2 Identifying Unsatisfied Tasks

To focus query processing, it is critical to identify unsatisfied tasks to concentrate on. However, during query processing, it is challenging to judge whether a task is satisfied, since K = {v_1, ..., v_k}, which our task specification (Definition 1) requires, is not determined until the very end.

In fact, for our purpose, we can address a slightly different problem: given a set of accesses-so-far P that have been performed, can we find any unsatisfied task? Instead of identifying all of them, for query processing to move on, it is sufficient to find just one. (Note that any unsatisfied task must eventually be fulfilled.) Our insight is that, by comparing the score state of objects, we can always reason that some tasks are clearly unsatisfied, regardless of the eventual result.

Example 7 (Unsatisfied Tasks): Consider Q1 over D = {u_1, u_2, u_3}. Suppose, at some point, we have performed P = {sa_1, sa_1, sa_2, ra_1(u_2)}. Referring to Figure 3, these accesses will gather the following score information. The two sorted accesses sa_1 on p_1 will hit p_1[u_3] = .7 and p_1[u_1] = .65. As a side-effect (Section 3.2), the unseen objects (i.e., u_2) will be bounded by the last-seen score, i.e., p_1[u_2] ≤ .65. The one sorted access sa_2 on p_2 will return p_2[u_2] = .9, and set the upper bounds p_2[u_1] ≤ .9 and p_2[u_3] ≤ .9. The random access ra_1(u_2) returns p_1[u_2] = .6. Putting these together, Figure 5 summarizes the current score state.

OID    p_1    p_2     F
u_1    .65    ≤ .9    ≤ .65
u_2    .6     .9      .6
u_3    .7     ≤ .9    ≤ .7

Figure 5: The score state of Example 7.

For u_1: the above accesses gathered p_1[u_1] = .65 and p_2[u_1] ≤ .9, and thus F[u_1] ≤ min(.65, .9) = .65.
Similarly, F[u_2] = .6 and F[u_3] ≤ .7.

At this point, while we do not know what K will be (as Definition 1 requires), we can identify at least the scoring task w_3 for u_3 as unsatisfied, no matter what K is:
- If u_3 ∈ K (i.e., u_3 will eventually be the top-1): w_3 needs to gather the exact p_2[u_3] to compute the F score.
- If u_3 ∉ K: in this case, the top-1 is u_1 or u_2, with F scores of at most .65 and .6 respectively (Figure 5). Thus, the top-1 score (i.e., F[v_k] in Definition 1) is at most .65. Clearly, w_3 has not proved that F[u_3] < F[v_k], since u_3 can score as high as .7.

As Example 7 hints, task w_j is unsatisfied if u_j has the potential to be in the top-k results. For such u_j (e.g., u_3), regardless of what K will be, we must know more about its scores to declare it as either top-k or not. We thus identify whether w_j is unsatisfied as follows: we quantify the current potential of u_j (with respect to P), and determine if this potential is high enough to make the top-k results.

To begin with, we measure the current potential of an object by its maximal-possible score. Define F_P[u] as the maximal score that u may possibly achieve, given the partial scores that the accesses-so-far P have gathered. As F is monotonic, we compute F_P[u] by substituting unevaluated predicates with their maximal-possible scores. Note that an unevaluated p_i[u] is bounded by the last-seen score from the sorted accesses on p_i, denoted l_i. (Section 3.2 discussed such side-effects of sorted accesses.) For instance, as Figure 5 shows, F_P[u_1] = min(p_1[u_1] = .65, l_2 = .9) = .65. Thus, formally,

\[ F_P[u] = F(\bar{p}_1[u], \ldots, \bar{p}_m[u]), \quad \bar{p}_i[u] = \begin{cases} p_i[u] & \text{if } P \text{ has determined } p_i[u] \\ l_i & \text{otherwise} \end{cases} \tag{3} \]

Further, we focus on the current top-k objects by their potentials. Let K_P = {v_1, ..., v_k} be these current top-k objects ranked by their F_P scores. (To illustrate, in Example 7, K_P = {u_3}.) There are two situations, depending on whether any current top object is incomplete.

First, suppose K_P contains an incomplete object v_j, i.e., one that has not been fully evaluated (with only partial scores). As Example 7 argued for u_3 (an incomplete top-1), such v_j needs further accesses either way, by Definition 1: 1) if v_j is indeed in the final top-k, it needs complete evaluation; 2) else, it needs further accesses to lower its maximal-possible score, to be safely excluded from the top-k. Thus, the task for such an incomplete v_j is clearly unsatisfied.

Second, suppose all objects v_1, ..., v_k in K_P are complete. These current top-k with respect to P are now indeed the final top-k (i.e., K_P = K), and the query can halt with these answers. To see why, we make two observations: 1) every v_j is complete and thus has its exact score, i.e., F_P[v_j] = F[v_j]; 2) every object u ∉ K_P, by the current ranking, has its maximal-possible score lower than the above exact scores, i.e., F[u] ≤ F_P[u] < F[v_j]. It follows that those v_j are the top-k answers, fully evaluated. Meanwhile, with these two observations, Definition 1 will declare all scoring tasks (in either case) as satisfied. That is, checking from the task perspective, it is consistent that all tasks are fulfilled, and thus query processing can indeed halt. Theorem 1 states our results on identifying unsatisfied tasks.

Theorem 1 (Unsatisfied Scoring Tasks): Consider a top-k query Q = (F(p_1, ..., p_m), k) over D = {u_1, ..., u_n}. With respect to a set P of performed accesses, let K_P = {v_1, ..., v_k} be the current top-k objects ranked by F_P.
1. For every v_j ∈ K_P such that v_j has not been completely evaluated, its scoring task is unsatisfied.
2. If all v_j's are complete, then every scoring task w_1, ..., w_n is satisfied, and K_P is the top-k results.

Proof: (1) If v_j has not been completely evaluated, its scoring task is unsatisfied. No matter what K will eventually be, there are two possible situations.
If v_j ∈ K: as its scoring task must compute F[v_j], the task is not complete until we gather p_i[v_j] for every unevaluated predicate p_i of v_j; since v_j has not been completely evaluated, such p_i must exist, and thus the task is still unsatisfied (by Definition 1, Case 1).
If v_j ∉ K: suppose its scoring task is satisfied. It will indicate that there are at least k objects u (e.g., those in K) satisfying F_P[v_j] < F[u], which in turn satisfy F_P[v_j] < F_P[u], as F[u] ≤ F_P[u]. Meanwhile, as v_j ∈ K_P, there are at most k − 1 objects u with F_P[u] > F_P[v_j], a contradiction.
(2) If all v_j's are complete, F_P[v_j] = F[v_j] ≥ F_P[u] ≥ F[u] for every u ∉ K_P, and thus K_P = K.
With this, we can show that scoring task w_j is satisfied for every u_j. For u_j ∈ K: as every such u_j has been completely evaluated, w_j is satisfied (by Definition 1, Case 1). For u_j ∉ K: as F[v_k] ≥ F_P[u_j] (as shown above, with equality removed by tie breaking), w_j is thus satisfied (by Definition 1, Case 2).

We stress that Theorem 1 is generically useful. First, it is useful, by guaranteeing to identify some unsatisfied tasks if there exist any: Condition 2 gives a precise way to determine if there still exist any unsatisfied tasks. If so, Condition 1 will identify at least some of them (i.e., those for incomplete v_j). Second, it is rather generic: its treatment of logical tasks makes no assumptions on particular physical accesses. We can thus uniformly handle both random and sorted accesses (and beyond), despite the progressiveness and side-effects (Section 3.2). (As Section 2 discussed, some earlier works [5, 2] assume random-access-only scenarios.) These results provide a basis for constructing a specific framework (Section 6), by focusing on unsatisfied tasks, without compromising generality.

6. FRAMEWORK NC

This section develops a framework that is both general and focused, by refining TG (Section 4). Built upon our task decomposition (Section 5), Framework NC concentrates, at each iteration, on a small set of necessary choices, as induced by an unsatisfied task.

Framework NC(Q, D): Necessary Choices
Input: query Q = (F(p_1, ..., p_m), k), database D = {u_1, ..., u_n}
Output: K, top-k objects from D w.r.t. F
1) P ← φ;                                               // accesses-so-far
2) K_P ← {v_1, ..., v_k | top-k from D ranked by F_P[·]};
3) while (U = {v_j | v_j ∈ K_P; v_j is incomplete} ≠ φ)
4)     v_j ← any object in U;                           // e.g., the highest-ranked
5)     N_j ← {sa_i, ra_i(v_j) | p_i[v_j] is undetermined by P}; alternatives ← N_j;
6)     Select access A from alternatives;               // access selection
7)     perform A; update K_P; P ← P ∪ {A};
8) return K = K_P;

Figure 6: Framework NC.

Section 6.1 will first present the framework, before Section 6.2 discusses its generality.
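The loop of Figure 6 can be prototyped directly. The sketch below is a simplified illustration rather than the paper's implementation: it assumes in-memory sources, assumes that all object IDs are known up front (a real middleware only learns them through sorted accesses), and takes the Select strategy as a caller-supplied function. The names `Source`, `framework_nc`, `prefer_sorted`, and `prefer_random`, as well as the score values, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Access = Tuple[str, int, Optional[str]]  # (kind, predicate index, target object)

class Source:
    """In-memory stand-in for one predicate's Web source."""
    def __init__(self, scores: Dict[str, float]) -> None:
        self.scores = scores
        self.order = sorted(scores, key=lambda o: (-scores[o], o))
        self.depth = 0
        self.last_seen = 1.0  # bound on objects unseen by sorted access

    def sorted_access(self) -> Optional[Tuple[str, float]]:
        if self.depth == len(self.order):
            return None
        oid = self.order[self.depth]
        self.depth += 1
        self.last_seen = self.scores[oid]
        return oid, self.last_seen

    def random_access(self, oid: str) -> float:
        return self.scores[oid]

def framework_nc(sources: List[Source], F: Callable[..., float], k: int,
                 select: Callable[[List[Access]], Access]):
    """Iterate accesses, always choosing among the necessary choices only."""
    m = len(sources)
    objects = sorted(sources[0].scores)  # simplification: IDs known up front
    known: Dict[str, Dict[int, float]] = {o: {} for o in objects}
    performed: List[Access] = []  # accesses-so-far P

    def bound(o: str) -> float:
        # Maximal-possible score: unknown predicates take the last-seen score.
        return F(*[known[o].get(i, sources[i].last_seen) for i in range(m)])

    while True:
        current = sorted(objects, key=lambda o: (-bound(o), o))[:k]  # K_P
        incomplete = [o for o in current if len(known[o]) < m]       # U
        if not incomplete:
            return [(o, bound(o)) for o in current], performed
        v = incomplete[0]  # e.g., the highest-ranked incomplete object
        alternatives: List[Access] = []  # necessary choices N_j
        for i in range(m):
            if i not in known[v]:
                alternatives += [("sa", i, None), ("ra", i, v)]
        kind, i, target = select(alternatives)
        if kind == "sa":
            hit = sources[i].sorted_access()
            if hit is not None:
                known[hit[0]][i] = hit[1]
        else:
            known[target][i] = sources[i].random_access(target)
        performed.append((kind, i, target))

def make_sources() -> List[Source]:
    # Hypothetical scores for two predicates over three objects.
    return [Source({"u1": 0.65, "u2": 0.6, "u3": 0.7}),
            Source({"u1": 0.8, "u2": 0.9, "u3": 0.7})]

def prefer_sorted(alts: List[Access]) -> Access:
    return next(a for a in alts if a[0] == "sa")

def prefer_random(alts: List[Access]) -> Access:
    return next(a for a in alts if a[0] == "ra")

if __name__ == "__main__":
    for policy in (prefer_sorted, prefer_random):
        answer, accesses = framework_nc(make_sources(), min, 1, policy)
        print(policy.__name__, answer, len(accesses), "accesses")
```

Both selection policies reach the same answer while performing different access sequences, which is the sense in which the framework fixes the stopping condition and the candidate set but leaves the schedule, and hence the cost, to the optimizer.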
6.1 The Framework This framework hinges on the insight that query processing can focus on only unsatisfied tasks while still general enough to preserve potential optimality. Our motivating framework, TG, is rather unfocused since iterative accesses can be selected from the entire pool of supported ones. In contrast, Framework will first identify some unsatisfied task and then focus selection on those accesses for fulfilling. This insight is built on task decomposition (Section 5) that top- query processing is equivalent to fulfilling a set of (necessary and atomic) tasks. With this task view, during processing, when a set of accesses " has been performed, we can identify unsatisfied tasks, by Theorem 1. (When all tasks are satisfied, query processing can halt, as Theorem 1 also asserts.) For any unsatisfied, we can construct a set of accesses, specifically for satisfying, by collecting all and only accesses that can further process These accesses constitute the necessary choices for fulfilling. More precisely, will consist of any (random or sorted) accesses that can return (exact or bounding) scores about s unevaluated predicates. (As Theorem 1 states, for such unsatisfied, its object must be still incomplete.) Example 8 (Necessary Choices) Continue our running example. Example 7 identified that task is unsatisfied, for object, with a score state (.7,.9,.7), as Figure 5 shows. Note that is unsatisfied, since the accesses-so-far " has not gathered sufficient information for (for either case of Definition 1). To satisfy, we must know more of s scores in particular, for predicate, whose exact score is unknown. Thus, the following accesses can contribute 5C Sorted accesses on Performing 3$C can lower the upper bound of G As " (Example 7) has already one 3$CJ, the next 3$C will return with score.8 (Figure 3). This new lastseen score by 3$C will give a tighter bound for (from to ). 
Random access on Performing 5CJ * will return the exact score of for G, thus turning into completely evaluated,,.7). In fact, is now with score state (.7, G.7 satisfied. Thus, for satisfying, the set of possible choices is 3$C, *. 6

step | K_P   | alternatives                               | Select
1    | {u_3} | N_3 = {sa_1, sa_2, ra_1(u_3), ra_2(u_3)}   | sa_1
2    | {u_3} | N_3 = {sa_2, ra_2(u_3)}                    | ra_2(u_3)
Figure 7: Illustration of the framework.

Definition 2 (Necessary Choices): Given a set of performed accesses P, consider an unsatisfied scoring task, for object u_j. The necessary choices for this task with respect to P are N_j = {sa_i, ra_i(u_j) | p_i[u_j] is undetermined by P}.

As Figure 6 shows, Framework builds upon TG, with additional steps for identifying necessary choices. Theorem 1 guides this process: at any point, the framework maintains K_P, the current top-k objects with respect to accesses-so-far P, ranked by maximal-possible scores F_P. Some objects in K_P may still be incomplete, which variable U collects. As Theorem 1 specifies, there are two situations:

1. If U = ∅: As all top-k objects are complete, Theorem 1 asserts that there are no more unsatisfied tasks, which is thus the termination condition. The framework will break the while-loop (since U = ∅) and return K_P.

2. Otherwise: Since U ≠ ∅, there are incomplete top-k objects. Any such object v_j corresponds to an unsatisfied task, by Theorem 1. The framework arbitrarily picks any such v_j (say, the highest-ranked one) and constructs the necessary choices N_j (by Definition 2) as alternatives for selecting a further access.

Note that the framework essentially relies on Theorem 1 to isolate a set of necessary choices. Theorem 1 enables an effective way to search for necessary choices, by maintaining K_P, the current top-k objects. Thus, a search mechanism for finding unsatisfied tasks should return top-k objects when requested, e.g., a priority queue that orders objects by maximal-possible scores as priorities. Note that, initially, all objects have the same maximal-possible score (i.e., a perfect 1.0). This initial condition is simply a special case of ties: in principle, the framework will initialize K_P (in Step 2) with some deterministic tie-breaking order (Section 3). In practice, any tie-breaker (e.g., run-time order that does not require resorting) can be used; our optimization will hold for algorithms returning the same results.
However, for the sake of presentation, our examples will assume some OID as a tie-breaker, e.g., when and * tie and 7, then effectively, 7,. Observe that, at each iteration, there may be multiple incomplete in. We stress that can simply choose any such D to proceed. Each D designates an unsatisfied task Any such must be further carried out, and is thus equally necessary (Section 5.1). More precisely, an unsatisfied task will induce a set of necessary choices with a desired completeness property As Section 6.2 will discuss, with this completeness, any can guarantee the generality of. Example 9 illustrates how works. Example 9 (Framework ) Figure 7 shows the execution of an example algorithm! (for query of Figure 3) that can generate Initially, at Step 1 (Figure 7), as all the maximal-possible scores tie as 1.0, is set to (by the highest OID, our tiebreaker), which induces alternatives. According to,! then Select an access, 3$CG in this case, among the alternatives, which returns.7 (see Figure 3) and lowers to.7. At Step 2, as all the maximal-possible scores tie as.7, remains as the top in. However, now induces a smaller, with accesses only for its unevaluated predicate.! then Select 5C *, which returns.7 and completes with *,.7. Since with as the top- is now fully complete, according to,! will halt, with total accesses "! *- 3$C 5C *. 6.2 Generality and Specificity Our objectives toward an effective framework, as Section 4 motivated, are both generality and specificity. We next show that, unlike our motivating framework TG, is not only far more specific but also sufficiently general. First, we note that, by focusing on only necessary choices, is clearly more specific than TG (in which access selection must consider any arbitrary accesses). 
For instance, Section 4 motivated the non-specificity of TG with an example of alternatives- A - For the same setup, will have a far smaller choice set, according to Definition * 2 alternatives- E ) - (i.e., one 3$C and one 5C * for each ). Further, we stress that, although more specific, is still general enough for optimization. This generality results from the completeness property of necessary choices, which uses as alternatives. In particular, we define a set of alternatives as complete with respect to accesses-so-far ", if any algorithm! performed " that has must also perform at least one access from alternatives. Thus, alternatives - (as in TG) is trivially complete If " is not sufficient to determine query answers, any algorithm having done " must continue with at least one more access which by definition must be in, all supported accesses. In fact, while focuses on a much smaller alternatives, it is still complete. To see why, note that identifies a set of necessary choices which, by Definition 2, contains all accesses that can contribute to the unsatisfied task. Since is necessary (Section 5.1), at least one access in must be further executed, or cannot be satisfied and thus the query cannot be answered (For instance, for in Example 8, if neither 3$CJ nor 5CJ * is executed after ", will remain unsatisfied.) Thus, is complete, with respect to accesses-so-far ". This completeness holds for the necessary choices of any unsatisfied task since any such must be fulfilled, sooner or later. This completeness property ensures that is sufficiently general for optimization. That is, in our optimization (Section 7), we only need to consider the space of algorithms, denoted *, generated by Framework. For this purpose, we deem a space as sufficiently general, if it contains a comparable counterpart algorithm for every possible algorithm. That is, any arbitrary algorithm will find some counterpart in * with no more cost, as Theorem 2 below states. 
With this guarantee, it is sufficient to search only within for an optimal algorithm. Theorem 2 ( Generality) For any algorithm! with an access cost > with respect to the cost model (Eq. 1), there exists an algorithm! in * with cost, such that. Proof Consider any query processing by! (for some query over database ). We will show the generality of by constructing an algorithm! in Framework for the same processing, such that! costs no more than!. Let " be the total accesses that! has performed, i.e., "! 6* ". Since! follows the interative framework (Figure 7), let " be the accesses of! before the iteration; initially, " -. Similarly, let alternatives be alternatives of! at iteration. Our proof is based on the following two lemmas and for every iteration, which we show later. " ". alternatives " " * -. 7

8 ! > > Note that, by, algorithm! incurs no more access than!, when! halts at some iteration (denoted > as! ) " ". > Note, this immediately implies that!! * as well, * because our cost function (Eq. 1) is monotonic to accesses performed If! performs more times of every kind of access than, then! will have an overall higher cost, i.e., > "! * "! I* -! *! 6* To complete the proof, we now show by induction that and hold; we will also specify the behavior of! for each iteration, to show how it can be constructed in the framework. - is trivial, since initially " -. Consider We note that, by definition of the Framework, alternatives is complete that any algorithm (like! ) that has performed " must have performed in addition some access among alternatives. Thus, as! has performed " (trivially, since " - ), it must have performed access alternatives in addition. That is, is in both alternatives and " ", and thus holds. - As the induction hypothesis, assume for -, the lemmas hold. What should algorithm! do in each iteration? We now construct! for iteration If! exhausts ", which provides enough information to answer,! halts right before this iteration. Otherwise, requires that! select one access from alternatives to continue We will let! choose an access that is also in " " Such must exist by, ( alternatives " " *. -8 A First, holds Note that " ". Since " " (by the induction hypothesis on ) and " " (by the construction of! ), it follows that " " holds. Second, holds By (just proven above) that " ",! has performed ". By the completeness of alternatives,! must have performed, in addition to ", some access alternatives. That is, is in both alternatives and " ", and thus holds. In summary, we stress that, as an algorithm generating framework, defines an optimization space that is general yet specific. This space, *, consists of algorithms that conform to but implements Select differently. Our goal, in principle, is thus to instantiate an optimal algorithm! 
in this space, which depends on query- and data-specific factors. Section 7 will discuss optimization techniques for finding the algorithm in this space that minimizes the access cost, which refines Eq. 2 and which we refer to as Eq. 4.

7. SEARCH DYNAMIC OPTIMIZATION

In this section, we discuss how to actually optimize top-k queries, using the Framework of Section 6. As briefly discussed, with the optimization space defined, the query optimization problem is now that of identifying the cost-optimal algorithm of Eq. 4. For systematic optimization, we must address the following three tasks, each of which corresponds to its counterpart in Boolean query optimization:

1. Space reduction: While already much more focused than the space of arbitrary algorithms, the space is still too large for exhaustive search. We thus design a suite of systematic heuristics to reduce the space. Similarly, Boolean query optimization relies on systematic heuristics for effective search, such as focusing only on linear joins.

2. Search: Within the space identified, we design effective optimization schemes that focus search on promising algorithms. Similarly, Boolean optimization focuses its search on plans enumerated in particular ways, e.g., by dynamic programming.

3. Cost estimation: As a ground to compare algorithms in the space, the optimizer must be able to estimate the cost of each algorithm. Our cost estimation extends the insight of its Boolean counterpart, as we will discuss in Section 7.3.

[Figure 8: Illustration of SR/G heuristics.]

7.1 Space Reduction

While the framework helps optimization by inducing a focused algorithm space, the space is still too large for exhaustive search: at each iteration, the framework may Select any type of access on any unevaluated predicate of the top-k objects. We thus need to further focus within the space, with some systematic heuristics.
These heuristics contribute in two ways. First, they reduce the space significantly, while still retaining the promising algorithms for consideration. Second, they give order to the reduced space, so that algorithms can be systematically enumerated by varying a few configuration parameters. In particular, we use the following heuristics for optimization.

First, we choose to focus only on SR algorithms (for sorted-then-random), which perform all sorted accesses sa_i on predicate p_i, if done at all, before any random access ra_i. Lemma 1 states that, for any top-k algorithm, there is an SR-counterpart gathering the same score information, with no more cost.

Lemma 1 (SR-counterpart): For any algorithm in the space, there exists an SR-counterpart with no more cost.

Lemma 1 allows us to reduce our plan space by focusing only on the subset of SR algorithms, i.e., the SR-subset. However, how good is this heuristic? Will we miss the actual optimal algorithm by such reduction? By Lemma 1, we can conclude that the reduction has no loss of optimality as long as the SR-counterpart of the optimal algorithm is still in the space, a property we call SR-inclusion. We believe SR-subset reduction is at least a good heuristic with little loss of optimality, as SR-inclusion does hold in our empirical observations, though we do not have a formal proof.

Second, we assume that random access on every object follows the same global order of predicates. That is, when multiple random accesses exist in alternatives, we follow some particular global order (given by the optimizer; see Section 7.2) to choose which to perform. To illustrate, suppose the necessary choices contain random accesses on two predicates of the same object: given the global order, we first pick the random access on whichever of those predicates comes earlier in the order, i.e., the next unevaluated predicate of the object according to the order. This heuristic was first studied in [5] (which focuses only on random access probes, unlike our general optimization). As [5] reported, such global scheduling achieves comparable optimization results, while significantly reducing the complexity.
By adopting the above two heuristics, we propose Framework with SR/G (SR-subset and Global scheduling) heuristics. These heuristics customize the Select routine of the framework, as Figure 9 shows. Now the selection is more focused, guided by two

parameters, a depth configuration (one suggested sorted-access depth per predicate) and a global schedule of predicates, which will be determined by the optimizer (Section 7.2). In essence, Select chooses sorted access whenever there exists some sa_i that has not reached its suggested depth. Otherwise, it performs random access in alternatives, by picking the next unevaluated predicate (according to the schedule).

Procedure Select(alternatives, depth configuration, schedule):
    if there exists sa_i in alternatives that has not reached its suggested depth: select sa_i;
    else: select the ra_i in alternatives whose predicate is the next unevaluated one according to the schedule;
Figure 9: Select with SR/G heuristics.

Example 10 illustrates how these heuristics actually work with our running example. (For the sake of presentation, from here on the framework refers to the framework with SR/G heuristics.)

Example 10 (SR/G heuristics): Consider our running example. Figure 8 illustrates how SR/G heuristics guide the access selection of the framework for a particular depth configuration and schedule. At step 1, among the necessary choices in alternatives, Select focuses on the sorted accesses, as the suggested sorted-access depths have not been reached yet. (We arbitrarily pick one.) Similarly, at steps 2 and 3, Select chooses sorted access, until the last-seen score falls below the suggested depth after step 3. Then, at step 4, we perform a random access, which completes the evaluation of the top object. The framework can thus return this object as the top-1 answer with four accesses (three sorted accesses followed by one random access), as its score is higher than the maximal-possible scores of the rest.

In addition to reducing the search space, the SR/G heuristics enable the framework to enumerate algorithms by the two parameters, i.e., every SR algorithm can be identified by a (depth configuration, schedule) pair. Consequently, our optimization problem can now be restated as identifying the pair that yields the minimal-cost algorithm.

7.2 Search

Toward identifying the optimal algorithm, we approximate the problem by identifying the two parameters in turn:

Depth optimization: We first identify the optimal depth configuration with respect to some initial schedule.

Schedule optimization: We then identify the optimal schedule with respect to the identified depth configuration.
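The Select routine of Figure 9 can be sketched as below. This is an illustration under assumptions rather than the paper's code: the name `select_srg` and its argument layout are hypothetical, the suggested depth is taken to be a score threshold (sorted access on a predicate continues while its last-seen score is still above the threshold, following the wording of Example 10), and at least one random access is assumed to be available once no sorted access qualifies.

```python
from typing import List, Optional, Sequence, Tuple

# ("sa", i, None): sorted access on predicate i (0-based).
# ("ra", i, obj):  random access on predicate i of object obj.
Access = Tuple[str, int, Optional[str]]


def select_srg(
    alternatives: List[Access],
    last_seen: Sequence[float],   # last score returned by sorted access on each predicate
    depth: Sequence[float],       # suggested depth (score threshold) per predicate
    schedule: Sequence[int],      # global order of predicates for random access
) -> Access:
    # SR heuristic: sorted access first, while its suggested depth is not reached.
    for kind, i, obj in alternatives:
        if kind == "sa" and last_seen[i] > depth[i]:
            return (kind, i, obj)
    # G heuristic: otherwise, random access on the next unevaluated predicate
    # according to the global schedule.
    random_by_predicate = {i: (kind, i, obj)
                           for kind, i, obj in alternatives if kind == "ra"}
    for i in schedule:
        if i in random_by_predicate:
            return random_by_predicate[i]
    raise ValueError("no admissible access among alternatives")


# Hypothetical necessary choices for an object "u3" with two unevaluated predicates.
alts = [("sa", 0, None), ("ra", 0, "u3"), ("sa", 1, None), ("ra", 1, "u3")]
choice = select_srg(alts, last_seen=[1.0, 1.0], depth=[0.7, 1.0], schedule=[1, 0])
```

With a depth of 1.0 on a predicate, no sorted access is ever performed on it, which corresponds to a focused configuration; equal depths across predicates correspond to a parallel one.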
For schedule optimization, we can adopt [5], which similarly determines a global predicate scheduling, as explained in Section 7.1. Thus, in this section, we focus on depth optimization. As Example 11 will illustrate, depth optimization is specific to runtime factors, e.g., score functions, predicate score distributions, and cost scenarios.

Example 11 (Optimization Possibilities): To illustrate, we continue Example 10 with a different depth configuration. In fact, this configuration generates the algorithm illustrated in Figure 7: it starts with sa_1, as the suggested depth on p_1 has not been reached, but chooses ra_2(u_3) next, as the suggested depth on p_2 has already been reached. Observe from this example that different configurations imply different access costs: while the parallel configuration required four accesses to answer the query (Figure 8), the focused configuration requires only two accesses (Figure 7). However, note that this finding is specific to the query. For instance, when the scoring function F is avg (the average function) for the same query, the parallel configuration requires fewer accesses (4 accesses) than the focused one (6 accesses).

Consequently, we need search schemes that systematically adapt to the given query in exploring the space of depth configurations, i.e., an m-dimensional space. We first discuss an exhaustive search scheme, Naive, which will be used as a baseline for comparison (Section 9). We then enhance the scheme with more informed (either query-driven or generic) search.

(Scheme Naive) Naive simply explores the whole space by meshing it into a finite set of grid points. Then, for every grid point, it estimates the cost (see Section 7.3) of the corresponding algorithm and identifies the minimal-cost configuration among them. Though simple, Naive obviously suffers from scalability and performance limitations, especially as the space explodes for a large number of predicates. We thus enhance Naive to systematically focus on a promising subset of the space, as follows.

(Scheme Strategies) Strategies enhances the Naive approach by applying query-driven strategies in the search for the depth configuration.
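Scheme Naive, as described above, amounts to a grid search over depth configurations. The sketch below is a hedged illustration only: `naive_search` and `estimate_cost` are hypothetical names, depths are assumed to range over [0, 1] (matching the score range), and the toy cost function merely stands in for the simulation-based estimator of Section 7.3.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Depths = Tuple[float, ...]


def naive_search(
    estimate_cost: Callable[[Depths], float],
    m: int,
    steps: int = 5,
) -> Tuple[Optional[Depths], float]:
    """Evaluate every grid point in [0, 1]^m and keep the cheapest one."""
    grid = [j / steps for j in range(steps + 1)]   # 0.0, 1/steps, ..., 1.0
    best_depths: Optional[Depths] = None
    best_cost = float("inf")
    # (steps + 1) ** m grid points: this is why the scheme scales poorly in m.
    for depths in product(grid, repeat=m):
        cost = estimate_cost(depths)
        if cost < best_cost:
            best_depths, best_cost = depths, cost
    return best_depths, best_cost


# Toy stand-in for a cost estimator (an assumption, not the paper's estimator).
def toy_cost(depths: Depths) -> float:
    return (depths[0] - 0.8) ** 2 + (depths[1] - 0.6) ** 2


best, cost = naive_search(toy_cost, m=2, steps=5)
```

The number of cost estimations grows as (steps + 1)^m, which is the scalability limitation that motivates the more informed schemes.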
As illustrated in Example 11, a particular scoring function often implies a particular best strategy to narrow down the search, e.g., parallel configurations for avg and focused configurations for min. Thus, Scheme Strategies focuses its search on the configurations corresponding to the given strategy.

(Scheme HClimb) As an alternative to the query-specific Strategies scheme, one can apply a generic informed search to enhance the Naive scheme. For instance, one can apply a hill-climbing scheme: from a random point, HClimb simply moves towards a neighboring configuration with less estimated cost, until it reaches a minimum. The scheme is typically enhanced with multiple random starting points, to avoid being stuck at a local minimum. In particular, our experiments in Section 9 will adopt HClimb as the optimization scheme, as it was found to be the most effective in our experiments in the Appendix.

7.3 Cost Estimation

Finally, we discuss how to estimate the cost of algorithms in the space. To motivate, recall the cost estimation for Boolean queries. First, the optimizer estimates the selectivity of each predicate using some statistical samples, e.g., histograms. Second, it estimates their aggregate effect, from which the overall cost can be computed. The aggregate effect is computed analytically in Boolean queries, as predicates are composed by a known set of relational operators. For instance, in a simple conjunctive query, the aggregate selectivity is simply the product of the selectivities, assuming predicate independence.

For top-k queries, we extend the same intuition in the following ways. First, we generalize Boolean selectivity into the selectivity of probabilistic score distributions, which can be similarly estimated from statistical samples.
Second, we estimate the aggregate selectivity of predicates, which is challenging for top-k queries: as predicates are aggregated by an arbitrary function F, the aggregate effect cannot be quantified by analytic composition as in Boolean optimization, but only by simulation runs. Simulation essentially mimics the actual execution on sample objects. In particular, we perform a simulation run on the samples, transforming a top-k query on the database into a top-k query on the samples with a scaled retrieval size. The retrieval size is determined in proportion to the sample size. In principle, samples can be obtained from online sampling, or built offline (e.g., based on a priori knowledge on predicate score

distribution.) However, when samples are unavailable or too costly to obtain online, one can generate dummy samples based on an assumed distribution (e.g., uniform). Though such samples cannot represent actual score distributions, they help optimize for other important aspects of the setting. While our optimizer will certainly benefit from accurate samples, Section 9 will implement our optimization framework using dummy samples, to validate our framework in the worst-case scenario.

8. UNIFICATION AND CONTRAST

With general optimization, the framework should in principle unify algorithms for specific scenarios. We thus study how the framework (see footnote 4) in fact unifies specific algorithms, by generating similar behaviors, and further contrast them, by identifying those ungeneralizable behaviors.

As middleware algorithms generally assume no-wild-guesses [9], we first describe how the framework handles this restriction (while it can generally work with or without). In such settings, an algorithm cannot refer to an object (for random access) before knowing it from some sorted access. Thus the framework must distinguish between seen and unseen objects: an object will remain unseen until hit by some sorted access, when it becomes seen. We introduce a virtual object unseen to represent all unseen objects. Note that all such objects share the same maximal-possible score. This virtual object needs special handling, as Figure 10 shows. First, initially all objects are unseen, so the framework now initializes K_P with only the unseen object. Second, when this unseen object is at the top (e.g., step 1), its induced choices will contain only sorted accesses, since random access is not allowed for an unseen object, by the no-wild-guesses assumption. Third, objects hit by some sorted access will become seen (e.g., an object seen by a sorted access at step 1). They will then be handled as usual and may surface to the top-k (e.g., at step 2).

[Figure 10: Framework with no wild guesses.]

8.1 Algorithm

We now observe how the framework adapts to the scenarios of this algorithm.
As Figure 2 summarized, aims at access scenarios where sorted and random access have uniform unit costs, i.e., 2$3@? 2)5. In brief, works as follows Perform sorted accesses on predicates in parallel, or equaldepth 5. As an object is seen from any sorted access, perform 5C * exhaustively for every unevaluated predicate to compute its final score,. Add to, if it is one of the highest so far. Let threshold AB>C, *. As soon as has objects with scores no less than AB>C, stop and output. In essence, can be characterized by three behaviors (1) equal-depth-sorted-access At each iteration it performs sorted accesses to all predicates. (2) exhaustive-random-access It then does 4 For notational simplicity, we use interchangeably as an abstract framework and as the optimal algorithm generated. 5 Note that the depth of sorted access, in this context of, refers to the number of objects accessed, instead of the score reached. (a) scenario < (b) scenario < Figure 11 Illustration of and. exhaustive random accesses on every seen object. (3) early-stop It terminates as soon as the stop condition D, D A BC is satisfied. So, would adapt to uniform scenarios by dynamic optimization and generate similar behaviors? Unification In symmetric cases (which will be clear later), which s behaviors are optimized for, will indeed generate We illustrate with a scenario < with scoring function, -8CDFEG G4*, in which the scores of and G are uniformly distributed over and 243-2)5 -. To observe how adapts to <, Figure 11(a) shows a contour plot of > *P * with respect to.-. 4$*. identifies the minimal-cost, or the darkest cell marked by a rectangle, at around (.85,.83). To compare, the figure also marks the depth reaches (by an oval) at (.84,.84). 6 Observe that the two algorithms are indeed almost identical (1) Both perform equal-depth-sorted-access up to similar depths. 
(2) By accessing the same depths, they will both see the same set of objects Since does not use exhaustive random access, it will only perform less random accesses than, e.g., slightly outperforms (by 1%) in Figure 11(a). (3) The output of shares the same early-stop condition as Since, unseen - A B>C (by definition) and unseen, it follows that D, D, unseen -/AB>C. Contrast However, contrasts with by being able to adapt Even among uniform scenarios, in the asymmetric cases, s characteristic behaviors cannot adapt well. 1. Equal-depth-sorted-access is not desirable, in scenarios when the optimal depth is not equal across predicates e.g., for, 01 *, focused sorted access is more effective (Example 11). 2. Exhaustive-random-access is not desirable As contrasted above, by scheduling both sorted and random accesses, performs less random accesses. 3. Early-stop is not desirable, if performing deeper sorted access can trade those random accesses to follow and thus reduce the total cost i.e., trade-off exists between deeper sorted accesses and more random accesses. In fact, in such scenarios, will adapt beyond and thus generate a rather different algorithm. To contrast, Figure 11(b) shows scenario < with, - 01 (and otherwise the same as < ). Observe and differ significantly focuses sorted accesses - *, while performs equal-depth sorted with access up to 5, *. Observe also their cost difference is significant as well saves access cost by 30% from, by focusing sorted accesses. For a closer observation, Figure 12 compares the relative access costs of and (normalized to the total cost of as ML ) in various scenarios As symmetric cases, Figure 12(a) first considers scenario <, which is rather favorable to (as ex- 6 This figure can be viewed or printed in color, for better visibility. 10


More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Safe Stratified Datalog With Integer Order Does not Have Syntax

Safe Stratified Datalog With Integer Order Does not Have Syntax Safe Stratified Datalog With Integer Order Does not Have Syntax Alexei P. Stolboushkin Department of Mathematics UCLA Los Angeles, CA 90024-1555 aps@math.ucla.edu Michael A. Taitslin Department of Computer

More information

Monotone Paths in Geometric Triangulations

Monotone Paths in Geometric Triangulations Monotone Paths in Geometric Triangulations Adrian Dumitrescu Ritankar Mandal Csaba D. Tóth November 19, 2017 Abstract (I) We prove that the (maximum) number of monotone paths in a geometric triangulation

More information

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions.

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions. CS 787: Advanced Algorithms NP-Hardness Instructor: Dieter van Melkebeek We review the concept of polynomial-time reductions, define various classes of problems including NP-complete, and show that 3-SAT

More information

3.4 Deduction and Evaluation: Tools Conditional-Equational Logic

3.4 Deduction and Evaluation: Tools Conditional-Equational Logic 3.4 Deduction and Evaluation: Tools 3.4.1 Conditional-Equational Logic The general definition of a formal specification from above was based on the existence of a precisely defined semantics for the syntax

More information

CS264: Homework #1. Due by midnight on Thursday, January 19, 2017

CS264: Homework #1. Due by midnight on Thursday, January 19, 2017 CS264: Homework #1 Due by midnight on Thursday, January 19, 2017 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. See the course site for submission

More information

Estimating the Quality of Databases

Estimating the Quality of Databases Estimating the Quality of Databases Ami Motro Igor Rakov George Mason University May 1998 1 Outline: 1. Introduction 2. Simple quality estimation 3. Refined quality estimation 4. Computing the quality

More information

Höllische Programmiersprachen Hauptseminar im Wintersemester 2014/2015 Determinism and reliability in the context of parallel programming

Höllische Programmiersprachen Hauptseminar im Wintersemester 2014/2015 Determinism and reliability in the context of parallel programming Höllische Programmiersprachen Hauptseminar im Wintersemester 2014/2015 Determinism and reliability in the context of parallel programming Raphael Arias Technische Universität München 19.1.2015 Abstract

More information

Progress Towards the Total Domination Game 3 4 -Conjecture

Progress Towards the Total Domination Game 3 4 -Conjecture Progress Towards the Total Domination Game 3 4 -Conjecture 1 Michael A. Henning and 2 Douglas F. Rall 1 Department of Pure and Applied Mathematics University of Johannesburg Auckland Park, 2006 South Africa

More information

The Inverse of a Schema Mapping

The Inverse of a Schema Mapping The Inverse of a Schema Mapping Jorge Pérez Department of Computer Science, Universidad de Chile Blanco Encalada 2120, Santiago, Chile jperez@dcc.uchile.cl Abstract The inversion of schema mappings has

More information

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY KARL L. STRATOS Abstract. The conventional method of describing a graph as a pair (V, E), where V and E repectively denote the sets of vertices and edges,

More information

A CSP Search Algorithm with Reduced Branching Factor

A CSP Search Algorithm with Reduced Branching Factor A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il

More information

Maximal Independent Set

Maximal Independent Set Chapter 0 Maximal Independent Set In this chapter we present a highlight of this course, a fast maximal independent set (MIS) algorithm. The algorithm is the first randomized algorithm that we study in

More information

On the Hardness of Counting the Solutions of SPARQL Queries

On the Hardness of Counting the Solutions of SPARQL Queries On the Hardness of Counting the Solutions of SPARQL Queries Reinhard Pichler and Sebastian Skritek Vienna University of Technology, Faculty of Informatics {pichler,skritek}@dbai.tuwien.ac.at 1 Introduction

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Recursively Enumerable Languages, Turing Machines, and Decidability

Recursively Enumerable Languages, Turing Machines, and Decidability Recursively Enumerable Languages, Turing Machines, and Decidability 1 Problem Reduction: Basic Concepts and Analogies The concept of problem reduction is simple at a high level. You simply take an algorithm

More information

Principles of AI Planning. Principles of AI Planning. 8.1 Parallel plans. 8.2 Relaxed planning graphs. 8.3 Relaxation heuristics. 8.

Principles of AI Planning. Principles of AI Planning. 8.1 Parallel plans. 8.2 Relaxed planning graphs. 8.3 Relaxation heuristics. 8. Principles of AI Planning June th, 8. Planning as search: relaxation heuristics Principles of AI Planning 8. Planning as search: relaxation heuristics alte Helmert and Bernhard Nebel Albert-Ludwigs-Universität

More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

The Structure of Bull-Free Perfect Graphs

The Structure of Bull-Free Perfect Graphs The Structure of Bull-Free Perfect Graphs Maria Chudnovsky and Irena Penev Columbia University, New York, NY 10027 USA May 18, 2012 Abstract The bull is a graph consisting of a triangle and two vertex-disjoint

More information

Decreasing the Diameter of Bounded Degree Graphs

Decreasing the Diameter of Bounded Degree Graphs Decreasing the Diameter of Bounded Degree Graphs Noga Alon András Gyárfás Miklós Ruszinkó February, 00 To the memory of Paul Erdős Abstract Let f d (G) denote the minimum number of edges that have to be

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

Multi-objective Query Processing for Database Systems

Multi-objective Query Processing for Database Systems Multi-objective Query Processing for Database Systems Wolf-Tilo Balke Computer Science Department University of California Berkeley, CA, USA balke@eecs.berkeley.edu Abstract Query processing in database

More information

3.7 Denotational Semantics

3.7 Denotational Semantics 3.7 Denotational Semantics Denotational semantics, also known as fixed-point semantics, associates to each programming language construct a well-defined and rigorously understood mathematical object. These

More information

A Reduction of Conway s Thrackle Conjecture

A Reduction of Conway s Thrackle Conjecture A Reduction of Conway s Thrackle Conjecture Wei Li, Karen Daniels, and Konstantin Rybnikov Department of Computer Science and Department of Mathematical Sciences University of Massachusetts, Lowell 01854

More information

1 Connected components in undirected graphs

1 Connected components in undirected graphs Lecture 10 Connected components of undirected and directed graphs Scribe: Luke Johnston (2016) and Mary Wootters (2017) Date: October 25, 2017 Much of the following notes were taken from Tim Roughgarden

More information

PCP and Hardness of Approximation

PCP and Hardness of Approximation PCP and Hardness of Approximation January 30, 2009 Our goal herein is to define and prove basic concepts regarding hardness of approximation. We will state but obviously not prove a PCP theorem as a starting

More information

Advanced Combinatorial Optimization September 17, Lecture 3. Sketch some results regarding ear-decompositions and factor-critical graphs.

Advanced Combinatorial Optimization September 17, Lecture 3. Sketch some results regarding ear-decompositions and factor-critical graphs. 18.438 Advanced Combinatorial Optimization September 17, 2009 Lecturer: Michel X. Goemans Lecture 3 Scribe: Aleksander Madry ( Based on notes by Robert Kleinberg and Dan Stratila.) In this lecture, we

More information

Module 4. Constraint satisfaction problems. Version 2 CSE IIT, Kharagpur

Module 4. Constraint satisfaction problems. Version 2 CSE IIT, Kharagpur Module 4 Constraint satisfaction problems Lesson 10 Constraint satisfaction problems - II 4.5 Variable and Value Ordering A search algorithm for constraint satisfaction requires the order in which variables

More information

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach +

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Abdullah Al-Hamdani, Gultekin Ozsoyoglu Electrical Engineering and Computer Science Dept, Case Western Reserve University,

More information

Chapter 14 Global Search Algorithms

Chapter 14 Global Search Algorithms Chapter 14 Global Search Algorithms An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Introduction We discuss various search methods that attempts to search throughout the entire feasible set.

More information

ACONCURRENT system may be viewed as a collection of

ACONCURRENT system may be viewed as a collection of 252 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 3, MARCH 1999 Constructing a Reliable Test&Set Bit Frank Stomp and Gadi Taubenfeld AbstractÐThe problem of computing with faulty

More information

However, this is not always true! For example, this fails if both A and B are closed and unbounded (find an example).

However, this is not always true! For example, this fails if both A and B are closed and unbounded (find an example). 98 CHAPTER 3. PROPERTIES OF CONVEX SETS: A GLIMPSE 3.2 Separation Theorems It seems intuitively rather obvious that if A and B are two nonempty disjoint convex sets in A 2, then there is a line, H, separating

More information

CS 161 Lecture 11 BFS, Dijkstra s algorithm Jessica Su (some parts copied from CLRS) 1 Review

CS 161 Lecture 11 BFS, Dijkstra s algorithm Jessica Su (some parts copied from CLRS) 1 Review 1 Review 1 Something I did not emphasize enough last time is that during the execution of depth-firstsearch, we construct depth-first-search trees. One graph may have multiple depth-firstsearch trees,

More information

The 4/5 Upper Bound on the Game Total Domination Number

The 4/5 Upper Bound on the Game Total Domination Number The 4/ Upper Bound on the Game Total Domination Number Michael A. Henning a Sandi Klavžar b,c,d Douglas F. Rall e a Department of Mathematics, University of Johannesburg, South Africa mahenning@uj.ac.za

More information

FOUR EDGE-INDEPENDENT SPANNING TREES 1

FOUR EDGE-INDEPENDENT SPANNING TREES 1 FOUR EDGE-INDEPENDENT SPANNING TREES 1 Alexander Hoyer and Robin Thomas School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332-0160, USA ABSTRACT We prove an ear-decomposition theorem

More information

Exact and Approximate Generic Multi-criteria Top-k Query Processing

Exact and Approximate Generic Multi-criteria Top-k Query Processing Exact and Approximate Generic Multi-criteria Top-k Query Processing Mehdi Badr, Dan Vodislav To cite this version: Mehdi Badr, Dan Vodislav. Exact and Approximate Generic Multi-criteria Top-k Query Processing.

More information

Framework for Design of Dynamic Programming Algorithms

Framework for Design of Dynamic Programming Algorithms CSE 441T/541T Advanced Algorithms September 22, 2010 Framework for Design of Dynamic Programming Algorithms Dynamic programming algorithms for combinatorial optimization generalize the strategy we studied

More information

Partitions and Packings of Complete Geometric Graphs with Plane Spanning Double Stars and Paths

Partitions and Packings of Complete Geometric Graphs with Plane Spanning Double Stars and Paths Partitions and Packings of Complete Geometric Graphs with Plane Spanning Double Stars and Paths Master Thesis Patrick Schnider July 25, 2015 Advisors: Prof. Dr. Emo Welzl, Manuel Wettstein Department of

More information

Constraint Satisfaction Problems

Constraint Satisfaction Problems Constraint Satisfaction Problems CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2013 Soleymani Course material: Artificial Intelligence: A Modern Approach, 3 rd Edition,

More information

MA651 Topology. Lecture 4. Topological spaces 2

MA651 Topology. Lecture 4. Topological spaces 2 MA651 Topology. Lecture 4. Topological spaces 2 This text is based on the following books: Linear Algebra and Analysis by Marc Zamansky Topology by James Dugundgji Fundamental concepts of topology by Peter

More information

CS422 - Programming Language Design

CS422 - Programming Language Design 1 CS422 - Programming Language Design Denotational Semantics Grigore Roşu Department of Computer Science University of Illinois at Urbana-Champaign 2 Denotational semantics, alsoknownasfix-point semantics,

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,

More information

Maximal Independent Set

Maximal Independent Set Chapter 4 Maximal Independent Set In this chapter we present a first highlight of this course, a fast maximal independent set (MIS) algorithm. The algorithm is the first randomized algorithm that we study

More information

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

The Threshold Algorithm: from Middleware Systems to the Relational Engine

The Threshold Algorithm: from Middleware Systems to the Relational Engine IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.?, NO.?,?? 1 The Threshold Algorithm: from Middleware Systems to the Relational Engine Nicolas Bruno Microsoft Research nicolasb@microsoft.com Hui(Wendy)

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 5.1 Introduction You should all know a few ways of sorting in O(n log n)

More information

Solution for Homework set 3

Solution for Homework set 3 TTIC 300 and CMSC 37000 Algorithms Winter 07 Solution for Homework set 3 Question (0 points) We are given a directed graph G = (V, E), with two special vertices s and t, and non-negative integral capacities

More information

Empirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee

Empirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) Empirical risk minimization (ERM) Recall the definitions of risk/empirical risk We observe the

More information

Network monitoring: detecting node failures

Network monitoring: detecting node failures Network monitoring: detecting node failures 1 Monitoring failures in (communication) DS A major activity in DS consists of monitoring whether all the system components work properly To our scopes, we will

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

More information

1 Overview, Models of Computation, Brent s Theorem

1 Overview, Models of Computation, Brent s Theorem CME 323: Distributed Algorithms and Optimization, Spring 2017 http://stanford.edu/~rezab/dao. Instructor: Reza Zadeh, Matroid and Stanford. Lecture 1, 4/3/2017. Scribed by Andreas Santucci. 1 Overview,

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

Algorithms, Games, and Networks February 21, Lecture 12

Algorithms, Games, and Networks February 21, Lecture 12 Algorithms, Games, and Networks February, 03 Lecturer: Ariel Procaccia Lecture Scribe: Sercan Yıldız Overview In this lecture, we introduce the axiomatic approach to social choice theory. In particular,

More information

On partial order semantics for SAT/SMT-based symbolic encodings of weak memory concurrency

On partial order semantics for SAT/SMT-based symbolic encodings of weak memory concurrency On partial order semantics for SAT/SMT-based symbolic encodings of weak memory concurrency Alex Horn and Daniel Kroening University of Oxford April 30, 2015 Outline What s Our Problem? Motivation and Example

More information

Fundamentals of Operations Research. Prof. G. Srinivasan. Department of Management Studies. Indian Institute of Technology, Madras. Lecture No.

Fundamentals of Operations Research. Prof. G. Srinivasan. Department of Management Studies. Indian Institute of Technology, Madras. Lecture No. Fundamentals of Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture No. # 13 Transportation Problem, Methods for Initial Basic Feasible

More information

RankSQL: Query Algebra and Optimization for Relational

RankSQL: Query Algebra and Optimization for Relational UIUC Technical Report: UIUCDCS-R-2004-2464, UILU-ENG-2004-1765. July 2004 (Revised March 2005) RankSQL: Query Algebra and Optimization for Relational Top-k Queries Chengkai Li 1 Kevin Chen-Chuan Chang

More information

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and Computer Language Theory Chapter 4: Decidability 1 Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

More information

One of the most important areas where quantifier logic is used is formal specification of computer programs.

One of the most important areas where quantifier logic is used is formal specification of computer programs. Section 5.2 Formal specification of computer programs One of the most important areas where quantifier logic is used is formal specification of computer programs. Specification takes place on several levels

More information

Exploring a Few Good Tuples From Text Databases

Exploring a Few Good Tuples From Text Databases Exploring a Few Good Tuples From Text Databases Alpa Jain, Divesh Srivastava Columbia University, AT&T Labs-Research Abstract Information extraction from text databases is a useful paradigm to populate

More information

Device-to-Device Networking Meets Cellular via Network Coding

Device-to-Device Networking Meets Cellular via Network Coding Device-to-Device Networking Meets Cellular via Network Coding Yasaman Keshtkarjahromi, Student Member, IEEE, Hulya Seferoglu, Member, IEEE, Rashid Ansari, Fellow, IEEE, and Ashfaq Khokhar, Fellow, IEEE

More information

arxiv:submit/ [math.co] 9 May 2011

arxiv:submit/ [math.co] 9 May 2011 arxiv:submit/0243374 [math.co] 9 May 2011 Connectivity and tree structure in finite graphs J. Carmesin R. Diestel F. Hundertmark M. Stein 6 May, 2011 Abstract We prove that, for every integer k 0, every

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

6.001 Notes: Section 8.1

6.001 Notes: Section 8.1 6.001 Notes: Section 8.1 Slide 8.1.1 In this lecture we are going to introduce a new data type, specifically to deal with symbols. This may sound a bit odd, but if you step back, you may realize that everything

More information

Dominance Constraints and Dominance Graphs

Dominance Constraints and Dominance Graphs Dominance Constraints and Dominance Graphs David Steurer Saarland University Abstract. Dominance constraints logically describe trees in terms of their adjacency and dominance, i.e. reachability, relation.

More information

arxiv: v1 [cs.ma] 8 May 2018

arxiv: v1 [cs.ma] 8 May 2018 Ordinal Approximation for Social Choice, Matching, and Facility Location Problems given Candidate Positions Elliot Anshelevich and Wennan Zhu arxiv:1805.03103v1 [cs.ma] 8 May 2018 May 9, 2018 Abstract

More information

Milind Kulkarni Research Statement

Milind Kulkarni Research Statement Milind Kulkarni Research Statement With the increasing ubiquity of multicore processors, interest in parallel programming is again on the upswing. Over the past three decades, languages and compilers researchers

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University September 30, 2016 1 Introduction (These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan.

More information

CS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem

CS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem CS61: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem Tim Roughgarden February 5, 016 1 The Traveling Salesman Problem (TSP) In this lecture we study a famous computational problem,

More information

An algorithm for Performance Analysis of Single-Source Acyclic graphs

An algorithm for Performance Analysis of Single-Source Acyclic graphs An algorithm for Performance Analysis of Single-Source Acyclic graphs Gabriele Mencagli September 26, 2011 In this document we face with the problem of exploiting the performance analysis of acyclic graphs

More information

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize.

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize. Cornell University, Fall 2017 CS 6820: Algorithms Lecture notes on the simplex method September 2017 1 The Simplex Method We will present an algorithm to solve linear programs of the form maximize subject

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Connected Components of Underlying Graphs of Halving Lines

Connected Components of Underlying Graphs of Halving Lines arxiv:1304.5658v1 [math.co] 20 Apr 2013 Connected Components of Underlying Graphs of Halving Lines Tanya Khovanova MIT November 5, 2018 Abstract Dai Yang MIT In this paper we discuss the connected components

More information

Evaluating Top-k Queries Over Web-Accessible Databases

Evaluating Top-k Queries Over Web-Accessible Databases Evaluating Top-k Queries Over Web-Accessible Databases AMÉLIE MARIAN Columbia University, New York NICOLAS BRUNO Microsoft Research, Redmond, Washington and LUIS GRAVANO Columbia University, New York A

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Abstract We present two parameterized algorithms for the Minimum Fill-In problem, also known as Chordal

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht

More information