Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation


DEIM Forum 2018 I4-4

Yuzuru OKAJIMA and Koichi MARUYAMA
NEC Solution Innovators, Ltd. Shinkiba, Koto-ku, Tokyo, Japan

Abstract Random sample generation from a database can be helpful to exploratory data analysis because of its capability of fast approximate aggregation of large data. For efficient run-time random sample generation, random clustering, a technique to randomly sort table rows on disk in advance, has been extensively used in previous research. However, this technique can randomize rows of a table only for a single unit. Thus, if data analysts require sampling for other units than the one used for clustering, traditional single-unit random clustering is no longer of help and causes a significant slowdown due to many random disk accesses. In this paper, we propose a random clustering technique for multiple units rather than for a single unit. This technique can randomize table rows for multiple units on disk. After clustering, a data analyst can select any one of the units, and then samples for the selected unit can be efficiently read from disk until one satisfies the given filtering criteria and size condition. Experiments show that the new random clustering technique for multiple units generates samples significantly faster than traditional single-unit random clustering.

Key words Random sampling from databases, Exploratory data analysis, Online aggregation

1. Introduction

Random sampling from a database has been studied for a long time, and it is gaining even more significance owing to the emerging demand for large data analyses. Calculating exact aggregates over massive data requires long execution times and vast amounts of resources. Sampling can reduce such costs by calculating approximate aggregates over a small amount of data extracted from the original tables. Run-time sample generation has been used in approximate query processing (AQP) as well as sample precomputing.
In precomputing approaches, samples are precomputed and stored apart from the original data so that approximate aggregates can be calculated from the samples. On the other hand, run-time generation approaches do not store such precomputed samples. They extract rows in random order at query time and calculate approximate answers from them. An advantage of run-time approaches is their high adaptability to ad-hoc queries. Run-time approaches can generate a sample of appropriate size for estimating the answer to a given query by continuously expanding the sample size until it becomes sufficient. This property has made run-time sample generation a key component of online aggregation [8], which is a framework for presenting a running estimate based on a sample until the user is satisfied with it.

The time efficiency of run-time sample generation depends on the physical data layout on disk. Without any assumption, randomly sampled rows may be distributed over a large part of the disk-resident original table, wherein the random disk accesses substantially slow the sampling process. To avoid this problem, the first online aggregation paper [8] proposed the random clustering technique, which randomly sorts rows in a disk-resident table in a pre-processing phase. If rows are randomly clustered, a sample can be efficiently obtained through a sequential scan because rows in continuous blocks can be considered a random sample, and this sample is easily expanded by scanning more blocks. Since then, random clustering has come to be viewed in the online aggregation literature, e.g., [7], [9], [13], as a key technique for accelerating the run-time sample generation process.

AQP approaches based on sampling have been developed to present approximate answers by automatically choosing or generating samples in the background. However, simply giving approximate answers is not sufficient for data analysts familiar with advanced statistical analysis tools.
Most statistical analysis tools today have more sophisticated analysis and visualization functions than the aggregate functions executable in database systems. As analysts would want to take full advantage of their favorite tools, they would prefer simply getting samples according to the requirements they give explicitly for further analysis in their own tools, instead of only getting approximate answers.

Generating samples according to explicitly given requirements is a challenging task, as is AQP. In exploratory data analysis, analysts need to perform analyses iteratively and from different perspectives; thus, the sampling unit of concern may be different in each trial. However, random sampling using different sampling units causes a severe slowdown even if the data are randomly clustered. This is because the traditional random clustering technique can only randomize a table according to a single unit (in most cases, a row), and so we call the traditional technique single-unit random clustering. If analysts choose other units for sampling than the one used for clustering, traditional single-unit random clustering is no longer of help and causes a significant slowdown due to many random disk accesses. We call this the unit mismatch problem. This problem is inherent to single-unit random clustering, and any online aggregation system based on traditional random clustering may suffer from this problem when the aggregation needs to use different units.

In this paper, to avoid the unit mismatch problem, we propose a random clustering technique for multiple sampling units rather than a single unit. Our multi-unit random clustering technique can randomize a table for several sampling units in advance so that samples of different sizes for any one of the units can be read from disk quickly. If a data analyst gives her requirements as to the target sampling unit, the filtering criteria and the size condition, we can iterate the sample generation by increasing the sample size until the sample satisfies the given requirements.

Experiments show that our multi-unit random clustering generates samples for multiple units significantly faster than traditional single-unit random clustering because the traditional method is affected by the unit mismatch problem.
By iterative sampling, samples satisfying the given requirements were efficiently generated from the database clustered using our technique even though the requirements involved sampling of distinct values in a foreign key (typical examples that suffer from the unit mismatch problem). The source code of the sampling system is available online.

The rest of this paper is organized as follows. In Section 2, we discuss the unit mismatch problem. Section 3 explains the multi-unit clustering technique, the iterative sampling algorithm that returns samples according to analytical requirements, and the system that executes the iterative sampling. In Section 4, we experimentally evaluate our approach. We discuss related work in Section 5 and give our conclusions in Section 6.

Figure 1 Example database for illustrating the unit mismatch problem. PK represents a primary key, and FK represents a foreign key.

2. Unit Mismatch Problem

The unit mismatch problem is a problem of many random disk accesses that occur when the sampling unit differs from the unit of random clustering. To illustrate this problem, let us consider the example database shown in Figure 1. Each row in table lineitem represents a line item of a transactional order and has a primary key l_linekey and foreign keys corresponding to the primary keys of three other tables that represent orders, parts, and suppliers related to the line items. lineitem and the other three tables have many-to-one relationships: an order (part or supplier) may be related to more than one row in lineitem. In addition, we assume that all tables are so large that sampling is needed to analyze them. This means that this database has four entity types that can be chosen as the sampling unit: a line item, an order, a part, and a supplier.

Let us see what happens when all the tables are randomly clustered in the typical way, i.e., when all rows are randomly shuffled in advance.
If the analyst wants to estimate the average sales per line item by sampling from table lineitem, the sampling can be done efficiently by making a sequential scan of the first rows in lineitem. However, if the analyst next wants to estimate the average total sales per order, she needs to randomly choose a small number of orders, collect all line items related to the sampled orders, and then average the sum of the prices for the orders. The orders can be efficiently sampled by making a sequential scan of the first rows in table orders. However, the rows related to each order are randomly distributed over table lineitem because the rows in lineitem are randomly shuffled, and there is no guarantee that rows having the same l_orderkey value can be found in adjacent disk blocks. Thus, random sampling considering an order as the sampling unit is slowed down by many random disk accesses to lineitem.

In general, by using traditional random clustering, a table can be clustered for only a single unit, and many random disk accesses can be avoided only if the sampling unit matches the unit of the clustering. If one of the other units is chosen as

the sampling unit, random disk accesses are not reduced. Thus, the purpose of this paper is to overcome this problem.

3. Multi-unit Clustering

3.1 Clustering Procedure

We explain how rows can be randomly clustered for multiple sampling units. Large tables generally have several attributes such that the distinct values in each of them represent numerous entities such as IP addresses or URLs and can be used as sampling units. We call such attributes unit keys, and our multi-unit clustering technique clusters rows in a table so that we can efficiently sample rows considering the distinct values of the chosen unit keys as the sampling units. In Figure 1, for example, l_linekey, l_orderkey, l_partkey and l_suppkey are good choices for the unit keys of lineitem.

Our technique leverages compound clustering indexes, which are supported by many database systems. First, we designate several attributes as the unit keys. Then, we set compound clustering keys in the tables by appending several attributes derived from the given unit keys. We use a pairwise independent hash function $h(v)$ that maps each distinct value $v$ of the unit keys to $[0, 2^L - 1]$. We assign each distinct value $v$ of the unit keys a level: $\mathrm{level}(v) = L - \lfloor \lg h(v) \rfloor - 1$ if $h(v) > 0$, and $\mathrm{level}(v) = L$ if $h(v) = 0$. The minimum level is 0, and the maximum is $L$. The probability that the level of $v$ is at least $l \in [0, L]$ is $\Pr(\mathrm{level}(v) \ge l) = 2^{-l}$. This level function is the same as the one used in distinct sampling [6] to keep the sample size small. We use it to cluster the original table in preparation for sampling using different units.

If the table has $U$ unit keys, we append $U$ level attributes that store the levels for the unit keys and append a level_sum attribute that stores the sum of the $U$ levels. Then, we cluster the table by setting a compound clustering key in it. The first component of the key is the level_sum attribute, and the successive components are the level attributes.
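The level assignment and the compound clustering key described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the hash form $((a v + b) \bmod p) \bmod 2^L$ is the one reported later in the experimental setting, while the prime `P`, the seeding scheme, and the helper names `make_hash` and `clustering_key` are hypothetical and not taken from the authors' system.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; an arbitrary choice for this sketch


def make_hash(seed, L):
    """Return h(v) = ((a*v + b) mod p) mod 2^L with random a and b."""
    rng = random.Random(seed)
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda v: ((a * v + b) % P) % (1 << L)


def level(h, L):
    """level = L - floor(lg h) - 1 for h > 0, and L for h == 0."""
    if h == 0:
        return L
    # For h > 0, h.bit_length() - 1 equals floor(lg h).
    return L - (h.bit_length() - 1) - 1


def clustering_key(row, unit_keys, hashes, L, original_key=()):
    """Compound key: (level_sum, level_0, ..., level_{U-1}, original key...)."""
    levels = [level(hashes[k](row[k]), L) for k in unit_keys]
    return (sum(levels), *levels, *original_key)


if __name__ == "__main__":
    L = 31
    unit_keys = ["o_orderkey", "o_custkey"]
    hashes = {k: make_hash(i, L) for i, k in enumerate(unit_keys)}
    rows = [{"o_orderkey": i, "o_custkey": i % 7} for i in range(20)]
    # Sorting by the compound key mimics clustering the table on disk.
    rows.sort(key=lambda r: clustering_key(r, unit_keys, hashes, L))
    print(rows[:3])
```

Because the level depends only on the position of the highest set bit of the hash value, exactly $2^{L-l}$ of the $2^L$ possible hash values have a level of at least $l$, which gives the probability $2^{-l}$ stated above.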
If the table was already assigned a clustering key, the original clustering key is appended as the last component of the new clustering key. For example, if table orders has one clustering key o_orderdate and three unit keys o_orderkey, o_custkey and o_productkey, we set a new clustering key composed of (level_sum, o_orderkey_level, o_custkey_level, o_productkey_level, o_orderdate) after appending the level_sum attribute and the three level attributes.

3.2 Extract Samples of Different Sizes

Let us see how samples of different sizes can be extracted from a clustered table. First, one of the unit keys of the table is chosen as the target unit key that represents the sampling unit. Then, the target level $l \in [0, L]$ is chosen to determine the sample size. All distinct values of the target unit key whose levels are at least $l$ are sampled. A sample table $s$ of a target table $t$ is a subset of $t$ that consists of the rows in $t$ that have the sampled values in the target unit key of $t$. For example, if the analyst wants to extract a sample from table lineitem in Figure 1 considering an order as the sampling unit, then the target unit key is l_orderkey and sample table l_sample is defined as follows:

    WITH l_sample AS (SELECT * FROM lineitem WHERE l_orderkey_level >= l)

l_orderkey_level stores the levels of l_orderkey. The sample for $l = L$ is the smallest sample, and the sample size increases exponentially as $l$ decreases. Finally, the sample for $l = 0$ contains all the rows of the original table.

The distinct values of the target unit key in a sample table are selected on the basis of the pairwise independent hash function $h(v)$, and thus these selected values can be seen as a random sample of the distinct values of the target unit key in the original table. Thus, the data analyst can estimate the properties of the target unit in the original table by only analyzing a sample table derived from it.
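The nesting of sample tables across target levels can be checked on a toy table. The sketch below is an assumption-laden illustration: it uses the order key itself as the hash value (an identity stand-in for $h$, chosen so that the result is deterministic) and a hypothetical helper `sample_table` that plays the role of the `WHERE l_orderkey_level >= l` predicate.

```python
def level(h, L):
    """Level of a hash value h in [0, 2^L - 1]."""
    return L if h == 0 else L - h.bit_length()


def sample_table(rows, level_attr, l):
    """Counterpart of: SELECT * FROM t WHERE <level_attr> >= l."""
    return [r for r in rows if r[level_attr] >= l]


L = 4
# 16 orders with two line items each; the order key doubles as its hash value.
lineitem = [
    {"l_linekey": 2 * o + j, "l_orderkey": o, "l_orderkey_level": level(o, L)}
    for o in range(16)
    for j in range(2)
]

# Number of sampled rows for each target level, from the smallest sample up.
sizes = {l: len(sample_table(lineitem, "l_orderkey_level", l))
         for l in range(L, -1, -1)}

if __name__ == "__main__":
    print(sizes)
```

In this toy case the sample doubles exactly each time the target level is lowered because the 16 keys cover every hash value once; with a real hash function the doubling holds only in expectation. Each sampled order always brings both of its line items with it, which is what makes per-order aggregates computable from the sample.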
For example, the average total sales per order in lineitem can be estimated using the average total sales per order in l_sample because all distinct orders are sampled with the same probability.

We will see that the sample tables of different target levels can be read from disk efficiently. To discuss the efficiency, we call a set of rows sharing the same set of levels in a clustered table a bucket. A bucket can be viewed as an I/O unit. All rows in a bucket share the same prefix in their clustering keys, and thus, they are contiguously clustered. If a row in a bucket is included in a sample table, all other rows in the bucket are also included.

The single-unit random clustering suffers from the unit mismatch problem because rows can be sorted in only one random order while each row is related to multiple units. Thus, our approach uses random bucketing rather than ordering. The rows related to the units of the same level are divided into multiple buckets on disk. Given the target unit at query time, our algorithm restores these rows by selecting buckets related to the target level.

Sample extraction is certainly efficient if the target table has the target unit key as its only unit key. In that case, all rows are sorted only by the levels of that key, and collecting rows above a certain level is efficiently done by making a sequential scan. However, if the table has extra unit keys other than the target unit key, the rows having the same level of the target unit key are divided into many smaller buckets because of the existence of the levels of the other unit keys. Therefore, we need to estimate the effect of the number of unit keys on the efficiency of our sample generation. For a table having $N$ rows, the expected size of a bucket with levels $(l_0, l_1, \ldots, l_{U-1})$ is

$N \prod_{u=0}^{U-1} 2^{-1-l_u} = N \, 2^{-U - \sum_{u=0}^{U-1} l_u}.$
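The expected bucket size above is easy to evaluate directly. The following sketch is a plain transcription of the formula; the function names and the block capacity argument are hypothetical, and the formula is used as stated, i.e., treating each level $l_u$ as occurring with probability $2^{-1-l_u}$ (which ignores the special case of the maximum level $L$).

```python
def expected_bucket_size(n_rows, levels):
    """N * prod_u 2^(-1 - l_u) = N * 2^(-(U + sum of levels))."""
    return n_rows * 2.0 ** -(len(levels) + sum(levels))


def is_small_bucket(n_rows, levels, block_rows):
    """True if the expected bucket size is below one block of block_rows rows."""
    return expected_bucket_size(n_rows, levels) < block_rows


if __name__ == "__main__":
    # A table of 1024 rows with two unit keys.
    print(expected_bucket_size(1024, (0, 0)))  # largest bucket
    print(expected_bucket_size(1024, (1, 2)))  # a smaller bucket
```

The size depends on the levels only through their sum, so sorting by the level sum first groups buckets of equal expected size together.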

This is a monotonic function of the level sum. Our technique clusters rows in ascending order of the level sum, i.e., in descending order of their expected bucket size. Thus, the smallest buckets are probabilistically clustered in the blocks at the bottom of the table.

We assume that each disk block stores $B$ rows. Buckets having at least $B$ rows can be efficiently read from disk because they are stored in at least one contiguous block. On the other hand, buckets with fewer than $B$ rows may reduce the disk-read efficiency. A bucket is called a mini-bucket if the expected number of its rows is smaller than $B$. Here, we show that mini-buckets actually occupy a small fraction of the whole table and do not significantly reduce efficiency if the table has many rows and a few unit keys.

Consider a row $r$ in a table $t$ of $N$ rows with $U$ unit keys. Denote by $S$ the set of all rows in mini-buckets of $t$. The probability that $r \in S$ is bounded as follows:

$\frac{\Gamma(U, \ln(N/B))}{(U-1)!} \le \Pr(r \in S) < \frac{\Gamma(U, \ln(N/B) - U \ln 2)}{(U-1)!} = P_{ub}(U, N/B)$

where $\Gamma(a, x)$ is the upper incomplete gamma function. The upper bound $P_{ub}(U, N/B)$ is plotted for varying $U$ and $N/B$ in Figure 2.

Figure 2 Upper bound of the probability that a row is in a mini-bucket

This graph shows that the probability is reasonably small if $N/B$ is large and $U$ is small, for example for $N/B = 10^6$ and $U = 4$. The probability is equal to the expected ratio of the number of rows in the mini-buckets to $N$. Thus, the mini-buckets occupy fewer than $(N/B) \, P_{ub}(U, N/B)$ blocks in expectation. Thus, our algorithm can collect buckets of the same level with a small overhead caused by reading the blocks storing mini-buckets.

The expected size of the smallest sample table, which is obtained at $l = L$, is $N 2^{-L}$. Thus, $L$ should be at least $O(\log N)$, which makes the expected size of the smallest sample constant.
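The bounds on the mini-bucket probability can be evaluated numerically without a special-function library. The sketch below relies on the standard identity $\Gamma(U, x)/(U-1)! = e^{-x} \sum_{k=0}^{U-1} x^k / k!$, which holds for positive integers $U$; the function names are hypothetical, and returning 1 when the argument is not positive is a choice made for this sketch (there the bound is vacuous anyway).

```python
import math


def reg_upper_gamma(U, x):
    """Gamma(U, x) / (U-1)! for a positive integer U and x >= 0."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(U))


def p_lower(U, ratio):
    """Lower bound on Pr(r in S); ratio stands for N/B (assumed > 1)."""
    return reg_upper_gamma(U, math.log(ratio))


def p_upper(U, ratio):
    """Upper bound P_ub(U, N/B); ratio stands for N/B."""
    x = math.log(ratio) - U * math.log(2)
    return reg_upper_gamma(U, x) if x > 0 else 1.0


if __name__ == "__main__":
    for U in (1, 2, 4, 8):
        print(U, p_lower(U, 1e6), p_upper(U, 1e6))
```

Evaluating `p_upper(4, 1e6)` with this sketch gives a value just under 0.005, which illustrates how small the mini-bucket fraction stays when there are few unit keys; the bound grows quickly as `U` increases for a fixed `N/B`.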
If $L$ is larger than $O(\log N)$, the choice of $L$ does not affect the efficiency, because the samples for $l > O(\log N)$ are very small, are clustered in the blocks storing mini-buckets, and do not cause many random accesses.

3.3 Iterative Sampling

We showed that sample tables of different target levels can be efficiently extracted from a clustered table. However, the sample table probably still includes data irrelevant to the analyses, and the data analyst needs to filter out such data by the filtering criteria. If the remaining data after filtering are too small for statistical analysis, the analyst needs to get larger samples by setting lower target levels. To reduce this burden, we give an algorithm that automatically chooses samples satisfying her requirements, i.e., the filtering criteria and size condition, as well as the target sampling unit. This algorithm iteratively extracts samples from multiple tables by decreasing the target level until the requirements are met.

We define how the analyst can represent her requirements as the input to the system. First, the target sampling unit is represented by choosing one of the unit keys for each target table to be sampled. For example, if sampling of orders is needed, o_orderkey for table orders and l_orderkey for lineitem are chosen as the target unit keys. Next, we introduce the notion of a filtered table to describe the filtering criteria and size condition. A filtered table is a table derived from sample tables and, if needed, other tables. The filtering criteria are described as a selection from the sample tables to the filtered table. If the analyst wants to exclude a certain kind of unit, she defines the filtered table so that rows related to such units are excluded. The size condition can be described as an aggregation over the filtered table because the filtered table only contains the units that pass the filtering criteria.

Given these requirements, we return a sample of sufficient size by decreasing the target level until the given requirements are satisfied.
The pseudo-code is given in Algorithm 1. We are given a query that contains $T$ pairs of clustered tables and their unit keys $(t_0, u_0), (t_1, u_1), \ldots, (t_{T-1}, u_{T-1})$, filtering criteria $f$, size condition $c$, and main query $q$. First, we initialize the target level $l$ to $L$. For each clustered table $t_i$, we define a sample table $s_i$ as the set of all rows in $t_i$ whose levels of $u_i$ are at least $l$. Next, we ask the database system if the size condition $c$ is satisfied by submitting a query with the definitions of the sample tables and the filtering criteria (checkcondition in Algorithm 1). The database system creates the sample tables, filters them with the given criteria, and returns a truth value indicating whether the filtered tables satisfy the size condition. If the condition is not satisfied, we decrement $l$ by one, redefine the sample tables, and submit a size condition query again. The size condition check is repeated until the condition is satisfied or $l$ reaches 0. Finally, we submit the main query $q$ to the database system with the final definitions of the sample tables (runmainquery). If the sampling process reaches the final step $l = 0$, the sample tables are the same as their

original tables and the main query is exactly answered.

Algorithm 1 Sampling(pairs of a target table and its unit key $(t_0, u_0), \ldots, (t_{T-1}, u_{T-1})$, filtering criteria $f$, size condition $c$, main query $q$)
    for l = L to 0 do
        for i = 0 to T - 1 do
            define sample table s_i as the set of all rows in t_i whose level of u_i is at least l
        if checkcondition(c, f, s_0, ..., s_{T-1}) then
            break
    return runmainquery(q, f, s_0, ..., s_{T-1})

Figure 3 Overview of our sampling architecture

The distinct values of the target unit keys are randomly sampled by the hash function. Thus, if one of the values is included in the sample of one table, all rows related to the same value must be included in the samples of other tables. This assures that we can safely join the sample tables on the sampled values of the target unit keys without lacking related rows. This capability of joining on sampled values is a major advantage of our algorithm.

3.4 Sampling System

We here present a sampling system that can bring out the strengths of our clustering technique. This system receives sampling queries from data analysts and runs the iterative sampling algorithm according to them. An overview is shown in Figure 3. To simplify the explanation, we assume that we are given a database system that already stores the data the analyst wants to examine. Our sampling system works as a proxy server between the client application the analyst operates and the database system. It first clusters the data of the tables in the database system (the clustering phase) and then waits for sampling queries to be submitted by the analyst (the sampling phase).

In the clustering phase, the analyst sends a clustering request to the sampling system through the client application. In this request, the analyst designates several attributes in large tables as unit keys. Then, the sampling system clusters the data of the tables in the database system by using multi-unit clustering.
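The control flow of Algorithm 1 can be sketched against an in-memory SQLite database. This is an illustration under assumptions, not the authors' proxy: sample and filtered tables are expressed as common table expressions, the helper names `with_prefix`, `iterative_sampling`, and `demo` are hypothetical, and the toy data use the customer key itself as the hash value so that the outcome is deterministic.

```python
import sqlite3


def level(h, L):
    """Level of a hash value h in [0, 2^L - 1]."""
    return L if h == 0 else L - h.bit_length()


def with_prefix(targets, filtered_cte, l):
    """Build 'WITH s_0 AS (...), ..., <filtered tables> ' for target level l."""
    ctes = [f"{name} AS (SELECT * FROM {table} WHERE {level_col} >= {int(l)})"
            for table, level_col, name in targets]
    if filtered_cte:
        ctes.append(filtered_cte)
    return "WITH " + ", ".join(ctes) + " "


def iterative_sampling(conn, targets, filtered_cte, condition, main_query, L):
    """Lower the target level from L to 0 until the size condition holds."""
    for l in range(L, -1, -1):
        prefix = with_prefix(targets, filtered_cte, l)
        # Counterpart of checkcondition in Algorithm 1.
        if conn.execute(prefix + "SELECT " + condition).fetchone()[0]:
            break
    # Counterpart of runmainquery; at l = 0 the samples are the full tables.
    return l, conn.execute(prefix + main_query).fetchall()


def demo():
    L = 3
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, "
                 "o_custkey_level INTEGER, price INTEGER)")
    rows = []
    for cust in range(8):          # toy data: 8 customers, 2 orders each
        for j in range(2):
            rows.append((2 * cust + j, cust, level(cust, L), 10 * cust))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return iterative_sampling(
        conn,
        targets=[("orders", "o_custkey_level", "o_sample")],
        filtered_cte="filtered AS (SELECT * FROM o_sample WHERE price > 0)",
        condition="2 <= (SELECT COUNT(DISTINCT o_custkey) FROM filtered)",
        main_query="SELECT AVG(total) FROM (SELECT SUM(price) AS total "
                   "FROM filtered GROUP BY o_custkey)",
        L=L,
    )


if __name__ == "__main__":
    print(demo())
```

In the demo the size condition asks for at least two distinct customers after filtering. Levels 3 and 2 yield zero and one qualifying customer, so the loop stops at level 1, where customers 1, 2, and 3 pass the filter and the main query averages their per-customer totals.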
After the clustering finishes, the sampling system waits for sampling queries to be submitted by the analyst. Once it receives a query, the sampling system runs the iterative sampling algorithm and returns the result to the analyst. The clustering process generally takes a long time, because all rows are sorted on disk. However, once the clustering is done, subsequent sampling queries can be answered in shorter amounts of time.

We extend the SQL grammar so that the analyst can describe her requirements. Figure 4 shows our extended SQL grammar, and Figure 5 shows an example query. The requirements are represented by SAMPLE, WITH and UNTIL clauses, and they are inserted before the main query block. The SAMPLE clause is used to choose which clustered tables to sample and the target unit key for each of them. The names of the sample tables are specified in AS clauses. If they are omitted, the name of the original clustered table is used to refer to its sample table in the following clauses. The WITH clause is used to define filtered tables derived from the sample tables. The filtered tables can be used later in the UNTIL clause and the main query. The WITH clause can be omitted if no filtered tables are necessary. The UNTIL clause specifies the size condition to be satisfied by the sample and filtered tables. Once the condition is satisfied, the main query following the sampling requirements is executed using the sample and filtered tables.

The example query in Figure 5 is based on the database schema of the TPC-H benchmark. It calculates the average total price per customer who lived in Japan and ordered in 2015.

4. Experimental Evaluation

In the experiments, we compared our multi-unit random clustering with traditional single-unit random clustering.

4.1 Setting

The database system used in the experiments was Amazon Redshift c2.xlarge single node. We used the TPC-H benchmark datasets with a scale factor of 300.
To simplify the explanation, we modified table lineitem so that it would have a primary key attribute l_linekey that stores row IDs, because the original lineitem does not have a single-attribute primary key. After that, we created three identical copies of the TPC-H database in the same system and clustered these three databases using different clustering techniques. We set $L = 31$ because it is large enough to keep the smallest sample small. All unit keys had integer values, and we used a pairwise independent hash function of the form $h(v) = ((a v + b) \bmod p) \bmod 2^L$.

In the first database, MU, each table was clustered using

multi-unit random clustering. We designated all primary keys and foreign keys as unit keys for all tables. As a result, lineitem was assigned four unit keys: l_linekey, l_orderkey, l_partkey, and l_suppkey. Next, level and level_sum attributes were appended to all tables, and the tables were clustered in the manner explained in Section 3.1.

In the second database, MU*, each table was clustered using multi-unit random clustering in the same way as MU except that the level sum was not included in the clustering key. This means the buckets in the tables of MU* were not arranged in order of expected size. Thus, unlike MU, MU* did not have the benefit of clustering small buckets.

The third database, SU, was a database that shows the effect of the traditional single-unit random clustering technique that considers a row as the sampling unit. SU was basically the same as MU, but we changed the level attributes *_level to hash attributes *_hash that store raw hash values $h(v)$ instead of $\mathrm{level}(v)$. Each table was clustered using the hash attribute of the primary key. The tables in this database can be viewed as having been randomly clustered because the rows were shuffled according to the hash values.

    query ::= [ sampling_requirements ] main_query
    sampling_requirements ::= sample_clause [ with_clause ] until_clause
    sample_clause ::= SAMPLE clustered_table [AS sample_table] BY target_unit_key {, clustered_table [AS sample_table] BY target_unit_key}
    until_clause ::= UNTIL size_condition

Figure 4 SQL grammar with sampling requirements

    SAMPLE orders AS o_sample BY o_custkey, customer AS c_sample BY c_custkey    (1)
    WITH filtered AS (SELECT * FROM o_sample, c_sample, nation WHERE o_custkey = c_custkey AND c_nationkey = n_nationkey AND n_nation = 'JAPAN' AND o_orderdate BETWEEN ... AND ...)    (2)
    UNTIL 1000 <= (SELECT COUNT(DISTINCT o_custkey) FROM filtered)    (3)
    SELECT AVG(sum) FROM (SELECT SUM(price) AS sum FROM filtered GROUP BY o_custkey)    (4)

Figure 5 Example query that calculates the average total sales in 2015 per customer who lived in Japan
These three databases contained exactly the same data except the newly appended attributes for clustering. In the following experiments, we executed identical queries over these databases. The databases returned the same results, but had different query performances because their tables were clustered using different techniques.

4.2 Experiment 1: Sample Table Generation

We evaluated the efficiency of extracting a sample table from a clustered table for different target levels by directly querying the database system with traditional SQL queries. We chose lineitem as the target table from which the sample tables were generated, because lineitem is the largest table in the TPC-H dataset and it has four unit keys, the largest number among the tables. Queries were executed with a cold cache; i.e., all selected rows were read from disk. We submitted the following queries to the three databases.

    A  SELECT COUNT(DISTINCT unitkey) FROM lineitem WHERE unitkey_level >= l
    B  SELECT COUNT(DISTINCT unitkey) FROM lineitem WHERE unitkey_hash < 2^(L-l)

where $l$ is an integer in $[0, L]$. A is for MU and MU*, and B is for SU. Both queries return the number of units in the sample table, i.e., the sample size. $l$ is a parameter to control the sample size. Upon receiving these queries for the same $l$, the three databases must return the same sample size because the rows satisfying unitkey_level >= l in MU and MU* are the same as the rows satisfying unitkey_hash < 2^(L-l) in SU.

The results are shown in Figure 6. We varied $l$ from $L$ to 0 and plotted the query time versus the sample size. The graph for l_linekey exemplifies the case in which the sampling unit matches the unit of single-unit random clustering. For this key, the three databases showed almost the same performance. For the other three keys, however, SU performed significantly worse than it did for l_linekey, while MU remained as efficient as it was for l_linekey.
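The claim that queries A and B select the same rows follows from the definition of the level, and it can be confirmed exhaustively for a small $L$. The sketch below is only a check of that equivalence; the function names are hypothetical.

```python
def level(h, L):
    """Level of a hash value h in [0, 2^L - 1]."""
    return L if h == 0 else L - h.bit_length()


def in_sample_by_level(h, l, L):
    """Predicate of query A: unitkey_level >= l."""
    return level(h, L) >= l


def in_sample_by_hash(h, l, L):
    """Predicate of query B: unitkey_hash < 2^(L - l)."""
    return h < (1 << (L - l))


def predicates_agree(L):
    """Check both predicates on every hash value and every target level."""
    return all(
        in_sample_by_level(h, l, L) == in_sample_by_hash(h, l, L)
        for h in range(1 << L)
        for l in range(L + 1)
    )


if __name__ == "__main__":
    print(predicates_agree(10))
```

The equivalence holds because a positive hash value has at most $L - l$ significant bits exactly when it is smaller than $2^{L-l}$, and the value 0 satisfies both predicates for every $l$.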
This result shows how the unit mismatch problem degrades the efficiency of single-unit random clustering and that multi-unit random clustering can avoid this problem. The difference between SU and MU was large when the sample size was moderate, but was small in the minimum and maximum limit because the minimum sample required a constant query processing cost and the maximum sample required a full scan of the table. MU performed more efficiently than MU* for l_partkey and l_suppkey. This difference shows the effect of using the level sum as the first element of the clustering key, which arranges the smallest buckets in contiguous blocks.

4.3 Experiment 2: Iterative Sampling

In Experiment 2, we evaluated the efficiency of the iterative sampling algorithm by submitting sampling queries to our sampling system. We designed the sampling queries on the basis of three TPC-H queries: Q4, Q13, and Q17. These

queries are typical examples that suffer from the unit mismatch problem because their aggregation units are numerically distinct values in foreign keys. We rewrote these queries in extended SQL by appending sampling requirements. The target sampling unit and the filtering criteria were determined according to the nature of the queries. For all queries, we set the size condition so as to stop sampling when $l = \theta$, where $\theta$ is an integer scaling parameter in $[0, L]$. For each $\theta$, the sampling system ran size condition queries from $l = L$ to $l = \theta$, then performed the main query for $l = \theta$. For each $\theta$, we compared the performance of our iterative sampling in the MU database with the performance of directly running the main query for the final sample with $l = \theta$ in the SU database without running condition queries. To perform sampling in SU, the main query was executed with tables restricted by unitkey_hash < 2^(L-l) instead of unitkey_level >= l in the same manner as Experiment 1.

Figure 6 Query times required to generate sample tables of varying sizes from the three databases with respect to the four unit keys in lineitem.

Figure 7 Query times required to answer sampling queries based on TPC-H queries Q4, Q13, and Q17 for varying sample sizes.

The results are shown in Figure 7. Total-MU represents the costs to run iterative sampling on the MU database, i.e., the total costs to run the condition queries and the main queries. Main-SU represents the costs to run the main queries for the final sample in the SU database. Exact represents the costs to get exact answers by running the original TPC-H queries.
Although the costs of iteratively running condition queries were included in Total-MU, Total-MU was still significantly more efficient than Main-SU except when the sample size was very large. For Q17, a large number of blocks were read in Main-SU even when the sample size was small because the database system chose to join tables to avoid inefficient random accesses. Nevertheless, Total-MU still worked well for Q17 because the sampled rows were clustered well. These results show that samples can be efficiently extracted from the tables clustered by multi-unit random clustering.

5. Related Work

Sample generation. There are a number of techniques for randomly sampling rows from disk-resident files or databases. They can be classified as to whether the disk access method is a sequential scan or random access. Reservoir sampling [12] is a typical example of sequential techniques. Among random techniques, the techniques proposed by Olken and Rotem generate samples using an indexing structure such as a B+-tree [10] or a hash file [11]. A drawback of the sequential-scan-based techniques is that they need to process the whole data to get a random sample. Random techniques can avoid having to make a whole scan, but still require, at maximum, as many random block accesses as the sample size. To reduce I/Os, several studies randomly collect blocks instead of rows, e.g., [5]. However, rows in the same block may have a strong correlation with each other and cannot be seen as an independent random sample. The most closely related approach to ours is distinct sampling [6], which generates samples by sequential scan considering distinct values of a target attribute as the sampling units and uses them to answer queries asking for the number of distinct values. Distinct sampling can deal with different sampling units, but needs to scan the whole data to generate samples, and it only deals with distinct value queries.

AQP based on precomputed samples. AQP systems typically store precomputed synopses including samples, histograms, and wavelets for approximate aggregation. AQUA [2] stores join synopses [3] for foreign key joins and congressional samples [1] for group-by queries. BlinkDB [4] is an AQP system storing stratified samples.

Online Aggregation. Online aggregation [8] is an AQP framework based on run-time sample generation. It presents to the user a running estimate with an error bound based on the samples retrieved so far. The first online aggregation paper [8] considered aggregation in a single table. This approach has been extended to multi-table joins [7], large tables [9] and nested queries [13].
The single-unit random clustering technique has been used for online aggregation in many studies [7], [9], [13], but it cannot work efficiently if the sampling unit does not match the unit of random clustering. Our clustering technique is a solution to this problem.

6. Conclusion

We proposed a random clustering technique that enables fast sample generation from disk-resident tables for multiple sampling units. It is a solution to the unit mismatch problem inherent to single-unit random clustering, a traditional technique used for run-time sample generation. Our technique can help data analysts by generating a sample until it satisfies their requirements. Experimental results showed that our multi-unit clustering worked equally efficiently for different sampling units, whereas single-unit random clustering worked well only for a single unit and performed significantly worse for other units because of the unit mismatch problem.

References

[1] S. Acharya, P. B. Gibbons, and V. Poosala. Congressional samples for approximate answering of group-by queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD '00, New York, NY, USA. ACM.
[2] S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua approximate query answering system. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, SIGMOD '99, New York, NY, USA. ACM.
[3] S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, SIGMOD '99, New York, NY, USA. ACM.
[4] S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: Queries with bounded errors and bounded response times on very large data. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 29-42, New York, NY, USA. ACM.
[5] S. Chaudhuri, G. Das, and U. Srivastava. Effective use of block-level sampling in statistics estimation. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD '04, New York, NY, USA. ACM.
[6] P. B. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
[7] P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, SIGMOD '99, New York, NY, USA. ACM.
[8] J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, SIGMOD '97, New York, NY, USA. ACM.
[9] C. Jermaine, S. Arumugam, A. Pol, and A. Dobra. Scalable approximate query processing with the DBO engine. ACM Trans. Database Syst., 33(4):23:1-23:54, Dec.
[10] F. Olken and D. Rotem. Random sampling from B+ trees. In Proceedings of the 15th International Conference on Very Large Data Bases, VLDB '89, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
[11] F. Olken, D. Rotem, and P. Xu. Random sampling from hash files. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, SIGMOD '90, New York, NY, USA. ACM.
[12] J. S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37-57, Mar.
[13] K. Zeng, S. Agarwal, A. Dave, M. Armbrust, and I. Stoica. G-OLA: Generalized on-line aggregation for interactive analysis on big data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, New York, NY, USA. ACM.


More information

A Stochastic Process on the Hypercube with Applications to Peer to Peer Networks

A Stochastic Process on the Hypercube with Applications to Peer to Peer Networks A Stochastic Process on the Hypercube with Applications to Peer to Peer Networs [Extene Abstract] Micah Aler Department of Computer Science, University of Massachusetts, Amherst, MA 0003 460, USA micah@cs.umass.eu

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

EFFICIENT STEREO MATCHING BASED ON A NEW CONFIDENCE METRIC. Won-Hee Lee, Yumi Kim, and Jong Beom Ra

EFFICIENT STEREO MATCHING BASED ON A NEW CONFIDENCE METRIC. Won-Hee Lee, Yumi Kim, and Jong Beom Ra th European Signal Processing Conference (EUSIPCO ) Bucharest, omania, August 7-3, EFFICIENT STEEO MATCHING BASED ON A NEW CONFIDENCE METIC Won-Hee Lee, Yumi Kim, an Jong Beom a Department of Electrical

More information

Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations

Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations Table of contents Faster Visualizations from Data Warehouses 3 The Plan 4 The Criteria 4 Learning

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

Exploring Context with Deep Structured models for Semantic Segmentation

Exploring Context with Deep Structured models for Semantic Segmentation 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en Hengel, Ian Rei between an image patch an a large backgroun image region. Explicitly moeling

More information

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots Aaptive Loa Balancing base on IP Fast Reroute to Avoi Congestion Hot-spots Masaki Hara an Takuya Yoshihiro Faculty of Systems Engineering, Wakayama University 930 Sakaeani, Wakayama, 640-8510, Japan Email:

More information

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember 107 IEICE TRANS INF & SYST, VOLE88 D, NO5 MAY 005 LETTER An Improve Neighbor Selection Algorithm in Collaborative Filtering Taek-Hun KIM a), Stuent Member an Sung-Bong YANG b), Nonmember SUMMARY Nowaays,

More information

NEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES WITH THE USE OF THE IPAN99 ALGORITHM 1. INTRODUCTION 2.

NEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES WITH THE USE OF THE IPAN99 ALGORITHM 1. INTRODUCTION 2. JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 13/009, ISSN 164-6037 Krzysztof WRÓBEL, Rafał DOROZ * fingerprint, reference point, IPAN99 NEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information

Exploring Context with Deep Structured models for Semantic Segmentation

Exploring Context with Deep Structured models for Semantic Segmentation APPEARING IN IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2017. 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en

More information

Nearest Neighbor Search using Additive Binary Tree

Nearest Neighbor Search using Additive Binary Tree Nearest Neighbor Search using Aitive Binary Tree Sung-Hyuk Cha an Sargur N. Srihari Center of Excellence for Document Analysis an Recognition State University of New York at Buffalo, U. S. A. E-mail: fscha,sriharig@cear.buffalo.eu

More information

Authenticated indexing for outsourced spatial databases

Authenticated indexing for outsourced spatial databases The VLDB Journal (2009) 8:63 648 DOI 0.007/s00778-008-03-2 REGULAR PAPER Authenticate inexing for outsource spatial atabases Yin Yang Stavros Papaopoulos Dimitris Papaias George Kollios Receive: February

More information

Just-In-Time Software Pipelining

Just-In-Time Software Pipelining Just-In-Time Software Pipelining Hongbo Rong Hyunchul Park Youfeng Wu Cheng Wang Programming Systems Lab Intel Labs, Santa Clara What is software pipelining? A loop optimization exposing instruction-level

More information

A Multi-class SVM Classifier Utilizing Binary Decision Tree

A Multi-class SVM Classifier Utilizing Binary Decision Tree Informatica 33 (009) 33-41 33 A Multi-class Classifier Utilizing Binary Decision Tree Gjorgji Mazarov, Dejan Gjorgjevikj an Ivan Chorbev Department of Computer Science an Engineering Faculty of Electrical

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information