k-balanced Sorting and Skew Join in MPI and MapReduce

Size: px
Start display at page:

Download "k-balanced Sorting and Skew Join in MPI and MapReduce"

Transcription

1 k-balance Sorting an Skew Join in MPI an MapReuce Silu Huang, Aa Wai-Chee Fu Department of Computer Science an Engineering, Chinese University of Hong Kong Abstract We consier algorithms for sorting an skew equi-join operations for computer clusters. The propose algorithms achieve the best known theoretical workloa balancing guarantee, an exhibit close to optimal balancing in our experiments. Our empirical stuies also show that the propose sorting algorithm is up to 30% faster than the state-of-the-art algorithm. I. INTRODUCTION A Computer cluster consists of a set of computers or noes connecte to each other by a high spee local area network (LAN). Cluster computing has emerge as a commonly use infra-structure for efficient big ata computation because of the elasticity of the cluster size an low cost CPUs. We consier the esign of parallel algorithms for the basic ata management problems of sorting an join operation on two tables with skew key istributions, with the moels of MPI an MapReuce for computer clusters. One important property of the algorithms is the balancing of the workloa among the machines in the cluster. We eal with algorithms where the runtime for each machine epens in the same way on the sizes of its input or output, which are assume to be big. Thus, we efine the workloa of a machine as the amount of input to or the output from the machine, whichever is bigger. Given t machines, we boun the workloa W i on each machine to within k times that of the bigger of the average input an output sizes. W i k(max(n in,n out )/t) () where N in is the input size, an N out is the output size. For our algorithms, the running time of each machine epens mainly on the workloa. We say that the algorithm is k- balance. We propose sorting an skew join algorithms that are -balance. Our experiments show that they achieve close to perfect workloa istribution in all our test cases. II. SORTING We are given a set S of n objects, where each object is a real number. Our goal is to sort the n objects with a computer cluster. For simplicity the objects themselves are the sort keys. Let there be t machines in the cluster, namely, M,M,...,M t. For simplicity we assume that n is a multiple of t, an let m = n/t. This assumption can be easily remove by paing some ummy objects to S. We assume that initially the n objects are evenly istribute to the t machines, so that each machine is assigne m objects. A. Terasort - a ranomize algorithm Terasort is a parallel algorithm propose to sort ata in the size range of terabytes [9]. There are 3 rouns in Terasort: () a ranom sample set is collecte from the input. () From the sample set, range bounaries are etermine for t contiguous but isjoint ranges that partition the ata accoring to the sort key values. (3) The objects that fall into a particular range are sent to a corresponing machine. Each machine,m i, will then sort the objects receive, S i, so that combining the results of all machines gives a sorte result of the given ataset. An interesting an useful result is erive in [] showing that if the sampling probability ρ is set to /mln(nt), then with high probability, the number of objects istribute to each machine is O(m). From their proof of this result, we can erive that Terasort is 3-balance with high probability. To show that the algorithm is k-balance for a small k with high probability, we woul boun the loa istribution, S i, by km for a small k, with high probability. To this en, we first make some changes to the above in the ranomization step (Roun ). We replace the step of sampling each object by Algorithm S below. Algorithm S always returns exactly ln(nt) objects. Algorithm S [6], [7]: Given objects o,...,o m, initially no object is selecte. Next consier objects one by one from o to o m, when consiering object o k, let j be number of objects alreay selecte, select object o k with probability ( ln(nt) j)/(m k +). From Theorem below, Terasort with Algorithm S is close to 5-balance with high probability. Theorem. Given n 4t as input size for Terasort with Algorithm S, S i 5m+ with probability at least /n. Proofs for the theorems of this paper can be foun in [5]. B. sorting - a eterministic algorithm Our propose parallel algorithm is calle (Sort- Map-Merge Sorting). The iea is that if we evenly ivie the ata into subsets S i, an sort each S i first, we may erive better range bounaries than ranom sampling. We have implemente the algorithm on MPI. Note that if implemente on Haoop, the sorting by Haoop can be turne off in the last merging step. In the first roun, each machine samples s+ objects as follows. The m = n/t objects S i in each machine M i are sorte an ivie into s equi-epth (equi-frequency) intervals. Let the objects receive by M i in sorte orer be o,o,...,o m.m i pickss+ sample objectsλ i,0,λ i,,...,λ i,s, where λ i,0 = o, an λ i,j is the j m/s -th smallest object

2 Sorting - a eterministic algorithm Roun : S is evenly istribute among t machines. Each machine M i hanle a subset S i S, where S i = n/t = m. On each M i, sort subset S i locally an pick λ i,0, λ i,,...,λ i,s an sen to machine M, where λ i,0 is the smallest object in S i an for j > 0, λ i,j is the j m/s -th smallest object in S i. Roun : M receives {λ i,j, i t,0 j s}. M selects global bounary numbers b 0,b,...b t. Each interval [b i,b i+ ) is calle a bucket. The selection is obtaine by Algorithm. b 0,b,...,b t are sent to all machines. Roun 3: Every M i sens the objects in [b k,b k ) from its local storage to M k, for each k t. Every M i merges objects receive in sorte orer. Fig.. sorting Algorithm in S i. Thus, λ i, = o m/s, λ i, = o m/s,..., λ i,s = o m. s+ is the sampling size, an s is a multiple of t. Let s = rt, where r is a small integer. The sample objects are sent to machine M. In Roun, M collects all the sample objects from every machine an then computes t + global key bounaries b 0,b,...,b t, so that each interval [b i,b i+ ) forms a bucket β i+ an the intervals partition the ata set. Each ata objects belongs to one bucket. The algorithm to compute the bounaries will be escribe in the next subsection. The bounaries are sent to all machines. In Roun 3, each machine istributes the sorte ata S i accoring to the bucket bounaries, so that ata belonging to bucket β i go to machine M i. M i merges the ata coming from other machines to form the sorte list for bucket β i. The sorte lists from all machines form the sorte result set. The pseuocoe for is given in Figure. ) Algorithm : computing bucket bounaries: Algorithm is use to compute the global bounary values of b 0,...,b t in Roun of the Algorithm. The input to this algorithm consists of the local bounary values λ i,j from each machine M i. In this computation, we apply linear interpolation for each interval [λ i,j,λ i,j+ ) on each M i. Since there are m/s objects in the interval by construction, we compute the mean value µ i,j = (m/s)/(λ i,j+ λ i,j ) for Algorithm. We also set µ i,s = 0, i t. Each interval [b i,b i+ ), 0 i < t is calle a bucket. We use the term bucket ensity for the number of objects in a bucket, enote by D[b k,b k+ ), 0 k < t. Note that b k is not necessarily an input object, where 0 k t. The selection ensures that the estimate bucket ensity base on µ i,j, i t, j s, for D[b k,b k+ ) is equal to m, where k < t. A priority queue Q is maintaine for storing triplets of the form λ, i, µ, which are sorte by the first value λ as the key. In the triplet λ,i,µ, λ an µ correspon to a certain pair of (λ i,j, µ i,j ) values from M i. Variable cur keeps count for the estimate ensity of the current bucket until it reaches m, in which case, a new bounary b[k] is etermine. The value of cf is for the combine frequency istribution over the value omain. An example is shown in Figure. Here we have machines, given 40 objects, each machine samples 3 objects, namely, (,3,4) an (,6,7), respectively, from the istribute objects. Algorithm processes the λ values in the orer of,, 3, 4, 6, 7. When 4 is processe, the cur value excees m = 0, the value 3.5 is compute as the bucket bounary at Line 9. Algorithm : Computing Bucket Bounaries Input : λ i,j,µ i,j, i t,0 j s Output : Global bounaries b[k], 0 k t Initialize: Create an empty priority queue Q; i t: pastf[i] = 0; next[i] = 0; push λ i,0,i,µ i,0 into priority queue Q; cf = 0;pre = 0;cur = 0;k = 0;flag = 0; while Q o 3 λ,i,µ TopAnPop(Q); /*λ an µ from M i */ 4 if flag == 0 then 5 b[k] = λ,k ++,flag = ; /*first bounary*/ 6 if (λ pre) cf+cur < m then 7 cur+ = (λ pre) cf; /*keep count*/ 8 else 9 b[k] = (m cur)/cf+pre,k++; /* new bucket */ 0 cur = (λ pre) cf+cur m; /* keep count for new bucket*/ pre = λ; /*upate previous bounary*/ cf = cf pastf[i]+µ; /* upate cf */ 3 pastf[i] = µ; /*µ will be obsolete for M i*/ 4 if!next[i] then 5 push λ i,next[i],i,µ i,next[i] into Q,next[i]++; 6 b t = λ return b[k],0 k t; cf 0 bounary Fig.. Example of Combine Frequency Distribution (cf): n = 40, t =, s =, m = 0, samples from M = (,3,4), samples from M = (,6,7) Each while loop hanles one sample value λ ij. There are at most t elements in Q, hence each while loop costs O(logt) time. The total time complexity is O(st log t) because of t(s+ ) rouns of the while loop. We shoul point out that the complexity of Algorithm is insignificant compare to the problem size. Utilization of computer cluster is justifie only when the problem size is big, an from previous works such as [], the size is in terms of billions of recors an is 0 GB or more. Thus, the value of t is very small in comparison to n. In our experiments, the runtime for Roun, incluing Algorithm, is foun to be

3 negligible for all test cases. ) Analysis: In the first roun of, all machines are assigne equal workloa. In Roun, the workloa is the t(s + ) samples which is small compare to the input size, n. Hence, we nee only analyze the workloa istribution at Roun 3. We aim for a boun on the maximum workloa of a machine when compare to the even workloa. Theorem. At Roun 3 of sorting, the workloa of each machine is boune by (+/r+t /n)m. For example, if n 5M, r =, an t = 50, then the workloa for each machine is boune above by m, an it is -balance. If n 75M, r = 6, an t = 50, then this boun becomes.3m, an is.3-balance. Two k-balance algorithms are comparable when they have similar operations at each machine. an Terasort both involve only sorting an istribution of ata as the major steps. In comparison, Terasort has up to 60% imbalance empirically, which is higher than the above boun for. The global bounaries are relate to quantiles of an orere sequence of ata values. The φ-quantile is the element α with rank φn, wheren is the given number of values [], [4]. We aopt a two phase approach as in []. However, our problem is for bounaries instea of ranks, which allows us to apply linear interpolation in the computation. III. SKEW JOIN In the recent evelopment of the Apache Pig system on top of MapReuce, it has been note that ata skew in join is a significant an challenging problem [3]. In this section, we focus on the problem of join when ata is skew. We consier the problem of skew join for two tables S an T with an equality join conition of S.ρ = T.ρ for a certain join key ρ. As in [8], we moel the join result by means of a S T join-matrix Γ as shown in Figure 3(b). In this matrix, S an T are sorte by the join key into orere lists S = s,s,...,s S, an T = t,t,...,t T. In Figure 3(b), the key values for s,...,s S are b,,,,,f, corresponingly. The matrix entry Γ(i,j) is true (shae) iff s i.ρ = t j.ρ. The join result for a certain join key k form a shae rectangular region in Γ, we call this region the join result for k, or simply result(k). For example, in Figure 3(b), the join result for key, enote by result(), is the shae rectangle of size 4 3. Suppose k is a join key, we say that the size of the join result for k is M N if M an N are the number of tuples with key k from S an T, respectively. For example, in Figure 3(b), the join result for key has size 4 3, which is the cross prouct of tuples to 5 from S an to 4 from T. Next we efine the skew factor to inicate how large the join result size is compare with the total size of S an T, where size is measure by the number of tuples. Definition (Join Skew Factor σ). The skew factor of the join, S T, of two tables S an T is given by σ if S T = σ( S + T ). A. - A Deterministic Algorithm In this section we introuce a eterministic algorithm for hanling the skew join problem The major iea for is the partitioning of ata base on statistical information. ) Statistics Collection: In Algorithm, we first collect statistics from the two tabless ant. For this purpose, we apply a parallel sorting algorithm such as Terasort or for each of S an T, allowing for repeate keys. After sorting, each M i contains sorte portions or buckets Pi S an Pi T of S an T, respectively. All occurrences of the same join key will be collecte at one single machine. Then each machine calculates the sizes of the join results for ifferent join keys, an the total join result size that will be generate from Pi S an Pi T. The result sizes are measure in number of tuples. Base on such statistics, a task istribution algorithm is applie on all the join tasks. Let W be the total join result size. A join result of a key with a size greater than W/t is calle a big join result, otherwise, it is calle a small join result. Note that the biggest size of a small join result is W/t. We ecie on the task istribution by first consiering the big join results, followe by the consieration of the small join results. Although the statistics collection requires a sorting of the input atasets, this overhea is insignificant when compare to the overall runtime, because for skew join the input size is small when compare to the result size. ) Big Join Results: We consier the big join results one at a time, in an arbitrary orer. Let B be a big join result with a size of M N, where (j )W/t < MN jw/t. We apply a result-to-machine mapping metho for B with the number of machines set to j. Without loss of generality, let the machines assigne be M,...,M j. The result of the mapping is that each machine M i will be mappe to a rectangular region in the join result B. Each rectangular region is efine by a quaruple ls i,hi s,li t,hi t, whereli s,hi s are two tuple i s in table S, where ls i < h i s, an lt,h i i t are two tuple i s in table T, where lt i < h i t. A tuple in table S with i in [ls,h i i s] is assigne to M i. Similarly, a tuple in T with i in [lt i,hi t ] is assigne to M i. For example, in Figure 3 (b), suppose we ivie the join result horizontally into equal size rectangles. The top rectangle is efine by,3,,4. Suppose this rectangle is assigne to machinem. Then tuples an 3 of S, an tuples, 3, an 4 of T will be assigne to M. We ivie the M N result tuples among j machines by partitioning the longer sie of the rectangle B into j intervals as evenly as possible. Without loss of generality, assume M N. Then M is ivie into j intervals. Each of the j intervals an the sie of size N of region B form a rectangle in B. HenceB is partitione intoj such rectangles. We call these rectangles the mapping rectangles. There are two possible cases for the size of MN: ) MN = jw/t. In this case, the j mapping rectangles are of the same size, W/t. The output of each mapping rectangle are assigne to one of j machines that have

4 S T 3 4 Interval Interval (a) Interval Interval S T a e g (b) b f Fig. 3. (a) machine matrix A for 4 machines (t = 4), a = b =. (b) join matrix Γ an ranomize tuple-to-interval mapping not been assigne any big join result so far. We sen the N tuples on the T sie of B an tuples along interval i, i j, on the S sie of B, to M i. ) MN < jw/t. Since we partition the longer sie of B (with M tuples) as even as possible, each interval has either M/j or M/j tuples. Thus, the smallest mapping rectangle R min has a size smaller than W/t. For each of the j mapping rectangles other than R min, the corresponing tuples are processe as in Case () above, so that their output are assigne to j machines. For R min, it is treate as a small join result, which is to be processe as escribe in the next subsection. We call R min a resiual join result. 3) Small Join Results: After the big join results are assigne to the machines, we eal with the result-to-machine mapping for the small join results. The small join results inclue those resiual join results. We consier small join results for ifferent join keys one by one, each time we assign the next join result to the machine with a smallest assigne workloa. The work assignment resembles the greey bin-packing algorithm an hence we have the following result. Theorem 3. Let the total join results size bew. With, the total size of the join results generate by any machine is at most W/t. B. RanJoin- A Ranomize Algorithm In this subsection, we introuce our ranomize algorithm, RanJoin, for hanling skew join. In this algorithm we assign tuples to machines in a ranomize approach. A ranomize algorithm is propose in [8] which maps square regions of the join result to machines. However, since the join result may not be a perfect square, the mapping oes not ensure equal probabilities of assignments, an results in a maximum workloa imbalance factor of 4. Here we propose another mapping technique where each tuple has equal expecte probabilities for the assignments. Our metho is base on a conceptual machine matrix. ) Machine Matrix A: Let the number of machines be t, we etermine two integers a an b such that firstly, a b = t an seconly, among all a, b satisfying a b = t, a T +b S is minimize. We shall see that a b = t is a sufficient conition for our workloa balancing guarantee. The minimization of a T + b S can lea to some minor improvement for loa balancing relate to the join input size to each reucer. The reason for this choice will be explaine later. With the values of a an b, we form a a b matrix A calle the machine matrix. For matrix A, we call the first imension S an the secon imension T. Each A[i,j] is assigne a unique machine. We say that A[i, j] lies on interval i of S an interval j of T. Example. Fig.3(a) shows the machine matrix A given 4 machines. The two imensions of S an T each consists of intervals, i.e., a = b =. Machines M, M, M 3, an M 4 are assigne to A[,], A[,], A[,], an A[,], respectively. ) Tuple-to-Interval Mapping: We assign tuples to machines by a ranomize algorithm. For each tuple in S we ranomly select an integer i in,...,a an map the tuple to interval i of S in the machine matrix A. For each tuple in T, We ranomly select an integer j in,...,b an map the tuple to interval j of T in A. Then each tuple is assigne to the machines as follows: if an S tuple x is mappe to interval i of S in matrix A, then x is sent to each of the b machines assigne to A[i,], A[i,],..., A[i,b]. If a T tuple y is assigne to interval j of T in A, then y is sent to each of the a machines assigne to A[,j], A[,j],..., A[a,j]. Each machine computes the cross-prouct of all the S tuples an T tuples that it has receive for a single join key. Hence, the join result for tuples x an y, if any, will be uniquely generate by the machine assigne to A[i,j]. Example. : In Figure 3(b), we show the join matrix for the tables S an T. Each table contains 6 tuples. We show that tuples,3,4,5 of S are ranomly assigne interval numbers,,,. Then the secon tuple of S will be mappe to the first interval on S in matrix A in Figure 3(a), an it will be sent to machines M an M. The join result in the join matrix for the arker shae area will be generate by machine M. From the above tuple-to-interval mapping, each tuple in S is assigne to b machines, an each tuple in T is assigne to a machines. By selecting a an b that minimize a T +b S we minimize the total input size to the machines in the number of tuples. The following theorem establishes that RanJoin is - balance with high probability. Theorem 4. If the join results for each join key is either an empty set or a set with size M N where M/a 300 an N/b 300, then the probability that the workloa of any machine is less than twice the even workloa is more than IV. EXPERIMENTAL RESULTS Our experiments for the parallel algorithms have been conucte on a 6 machine cluster with a master machine an 5 slave machines. The master is a Dell R70 Server with

5 (a) workloa (input size) (b) run time Fig. 4. Sorting for real ataset LIDAR, = maximum workloa / optimal workloa Dual 6-core Xeon E60.0GHz, 9GB RAM an 4x 3TB SAS Har Disk. Each slave machine is a Dell R60 Server - Dual 6-core Xeon E60.0GHz, with 48GB RAM an x 300GB SAS Har Disk. All machines are connecte by a GBethernet switch. We have installe Haoop (version..) on the cluster for MapReuce algorithms. There are 6xx5 = 80 cores in the slaves, we can activate up to 80 workers in parallel for Haoop mappers or reucers. We have implemente the sorting algorithms ( an Terasort) base on MPI, an the join algorithms RanJoin an base on Haoop MapReuce. We have set the DFS fs.replication factor to 3. The fs.block.size is set to 64MB. Other Haoop parameters are set to the efault values. The computer cluster consists of 5 worker machines each with 8 cores that share har isks. We shall call the parallel computational units processes instea of machines in our experiments. We evaluate our algorithms by two measurements: the workloa istribution an the runtime. The workloa is measure by the input size for sorting an by the result size for join. The sizes are given in the number of tuples unless otherwise specifie. We examine the which is given by the ratio of the maximum workloa on a machine versus the even workloa. The runtime is given by the longest runtime taken by any process. A. Results for Sorting We evaluate the sorting algorithms of an Terasort on a real ataset LIDAR an also on a synthetic ataset. For, we set the value of r to so that each process samples t objects. We vary the from 5 to 80, an measure both the workloa istribution an the runtime performance. Real Data: We use the real ataset LIDAR for experiments on sorting, which has been use for the sorting experiments in []. LIDAR contains 8.7 billion recors, each of which is a 3D point representing a location in North Carolina. We sort the recors by the first imension. The ataset size is 3GB. The input ata is istribute sequentially to the machines. Synthetic Data : We have generate 4 sets of ranom ata, with.8 billion objects, 5.4 billion objects, 9 billion objects an 8 billion objects. The sizes of these atasets are 9.9 GB, Downloaable from (a) workloa (input size) (b) run time (sec) Fig. 5. Comparing an Terasort for Ranom Dataset with 8 billion objects (99.3 GB) ataset size(billion) (a) workloa (input size) ataset size(billion) (b) run time (sec) Fig. 6. Sorting results for Ranom Datasets of ifferent sizes with 0 processes 59.9 GB, 99.8 GB an 99.3 GB, respectively. The key of each ata object in a ataset is a ranomly generate number in the range of [, 0 6 ]. We generate unique objects in each machine. The results of are shown in Figures 4(a), 5(a), an 6(a). In all cases, istributes the workloa very evenly an the imbalance is close to the optimal value of. has comparably much larger workloa imbalance, in most cases the maximum workloa of a process is above.5 of the optimal loa. Similar results are reporte in []. The imbalance affects the performance in runtime. However, the effect is mitigate by the fact that we have a star cluster with a bottleneck at the master noe, as shown in Figures 4(b), 5(b), an 6(b). The improvement in runtime by is expecte to be more significant in a ecentralize setting. The figures also show that achieves almost linear speeup. B. Results on Skew Join For the Skew Join experiments the ataset consists of two input tables S an T. We aopt two ifferent methos to form a ataset with skew join keys. The first metho is to generate tables with attributes rawn from the Zipf istribution, maintaining the same istribution for both tables so that each key has the same frequency in both of the input tables. We vary the Zipf skew parameter θ between 0 (skew) an (uniform), i.e., Z(r) /r ( θ), where r is a frequency rank, Z(r) is the frequency of the item with rank r. The secon kin of skew ata is generate as escribe in []. For a table with n tuples, the join key has a omain of [n,n). The special join key n appears in a fixe number of tuples, while the remaining tuples are ranomly assigne a

6 RanJoin RanJoin (a) θ = 0.7, S = T =5M (b) θ = 0.3, S = T =.5M output size = 47GB output size = 59GB ( tuples, σ=940) ( tuples, σ=3900) Fig. 7. Workloa istribution of RanJoin an for Zipf istributions: (θ = : uniform key istribution) θ= θ=0.7 θ=0.3 θ= (a) RanJoin Fig θ= θ=0.7 θ=0.3 θ= (b) Running time for Zipf skew atasets (in sec) RanJoin.. RanJoin (a) M = 0 5, N = 0 4 (b) M = 0 5 N = 0 4 Fig. 9. Workloa istribution for scalar skew ata. join key from [n,n). The output tuple size is 95 bytes. The skew key k 0 = n is generate in both tables S an T, an it occurs M times in S an N times in T. By ajusting M an N we can control the expecte output join sizes. This kin of test ata is calle scalar skew in [] an is also use in the stuy in [0]. Zipf istribute ataset: We aim to compare the effect of skewness on similar join output size. However, Zipf istributions woul vary the output size for the same input size. Therefore we vary the input table sizes accoringly. Following the esign of [8] for skew key istribution, each tuple contains a 4 byte join key with a omain of [000,999]. Scalar skew ataset : We teste on two sets of scalar skew ata. As in [], we fix an output size an vary the values of M an N to examine the effect of ifferent key skewness in the two given tables. For the first ataset, we set M = 0 5, an N = 0 4. For the secon set, we set M = 0 5 an N = 0 4. The output size of the join of S an T for both atasets is 90GB. In both atasets, S = T =.5M, an the skew factor σ is 600. ) Runtime Analysis: The total runtimes for Zipf skew ata are shown in Figure 8. The results for scalar skew ata are similar. It can be seen that there is almost linear speeup up to 5 processes. ue to the highly even workloa istribution. The speeup effect beyon 5 processes is iscounte by the overhea in the file replication of Haoop HDFS. The file replication factor is 3, hence, with 34 har isks, there is goo speeup effects with up to 5 processes. ) Workloa Imbalance: The results of workloa istribution are shown in Figures 7 an 9. For the scalar skew ataset, RanJoin i not istribute the workloa as evenly when the number of processors is large, this is because the values of M/a an N/b are too small to satisfy the conition in Theorem 4. achieves near optimal results in all cases. V. CONCLUSION We stuy the problems of sorting an skew join in a computer cluster environment. We propose algorithms that achieve the best known theoretical guarantees on even workloa istribution. Extensive empirical stuy shows that our sorting algorithm performs better than the state-of-the-art metho of. All our algorithms achieve near optimal workloa istribution in all test cases. ACKNOWEDGEMENTS This research was supporte by GRF CUHK433 an Direct grant The authors woul like to thank the reviewers for helpful comments, James Cheng for the use of the computer cluster, Yanyan Xu, Yi Lu, Wenqing Lin an Yingyi Bu for sharing their experiences with MPI an Haoop, the authors of [] for their source coe an a pointer to a ataset, an the authors of [8] for explaining their source coe. REFERENCES [] K. Alsabti, S. Ranka, an V. Singh. A one-pass algorithm for accurately estimating quantiles for isk-resient ata. In VLDB, 997. [] D. J. DeWitt, J. F. Naughton, D. A. Schneier, an S. Seshari. Practical skew hanling in parallel joins. In VLDB, 99. [3] A. F. Gates, O. Natkovich, S. Chopra, P. Kamath, S. M. Narayanamurthy, C. Olston, B. Ree, S. Srinivasan, an U. Srivastava. Builing a highlevel ataflow system on top of map-reuce: The pig experience. In VLDB, 009. [4] M. Greenwal an S. Khanna. Space-efficient online computation of quantile summaries. In SIGMOD, 00. [5] S. Huang an A. W.-C. Fu. (a, k)-minimal sorting an skew join in mpi an mapreuce. In CoRR (arxiv), 04. [6] T. Jones. A note on sampling a tape file. Commun. ACM, 5(6):343, 96. [7] D. E. Knuth. The Art of Computer Programming, Volume Seminumerical Algorithms 3r E. Aison Wesley, 997. [8] A. Okcan an M. Rieewai. Processing theta-joins using mapreuce. In SIGMOD, 0. [9] O. O Malley. Terabyte sort on apache haoop. In Technical Report, Yahoo, 008. [0] E. Omiecinski. Performance analysis of a local balancing hash-join algorithm for a share memory multiprocessor. In VLDB, 99. [] Y. Tao, W. Lin, an X. Xiao. Minimal mapreuce algorithms. In SIGMOD, 03. [] C. Walton, A. Dale, an R. Jenevein. A taxonomy an performance moel of ata skew effects in parallel joins. In VLDB, 99.

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

Online Appendix to: Generalizing Database Forensics

Online Appendix to: Generalizing Database Forensics Online Appenix to: Generalizing Database Forensics KYRIACOS E. PAVLOU an RICHARD T. SNODGRASS, University of Arizona This appenix presents a step-by-step iscussion of the forensic analysis protocol that

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization 1 Offloaing Cellular Traffic through Opportunistic Communications: Analysis an Optimization Vincenzo Sciancalepore, Domenico Giustiniano, Albert Banchs, Anreea Picu arxiv:1405.3548v1 [cs.ni] 14 May 24

More information

Lecture 1 September 4, 2013

Lecture 1 September 4, 2013 CS 84r: Incentives an Information in Networks Fall 013 Prof. Yaron Singer Lecture 1 September 4, 013 Scribe: Bo Waggoner 1 Overview In this course we will try to evelop a mathematical unerstaning for the

More information

Overlap Interval Partition Join

Overlap Interval Partition Join Overlap Interval Partition Join Anton Dignös Department of Computer Science University of Zürich, Switzerlan aignoes@ifi.uzh.ch Michael H. Böhlen Department of Computer Science University of Zürich, Switzerlan

More information

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing Inexing the Eges A simple an yet efficient approach to high-imensional inexing Beng Chin Ooi Kian-Lee Tan Cui Yu Stephane Bressan Department of Computer Science National University of Singapore 3 Science

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs IEEE TRANSACTIONS ON KNOWLEDE AND DATA ENINEERIN, MANUSCRIPT ID Distribute Line raphs: A Universal Technique for Designing DHTs Base on Arbitrary Regular raphs Yiming Zhang an Ling Liu, Senior Member,

More information

Skyline Community Search in Multi-valued Networks

Skyline Community Search in Multi-valued Networks Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

Study of Network Optimization Method Based on ACL

Study of Network Optimization Method Based on ACL Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 01 01 01 01 01 00 01 01 Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University

More information

Chapter 9 Memory Management

Chapter 9 Memory Management Contents 1. Introuction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threas 6. CPU Scheuling 7. Process Synchronization 8. Dealocks 9. Memory Management 10.Virtual Memory

More information

Design of Policy-Aware Differentially Private Algorithms

Design of Policy-Aware Differentially Private Algorithms Design of Policy-Aware Differentially Private Algorithms Samuel Haney Due University Durham, NC, USA shaney@cs.ue.eu Ashwin Machanavajjhala Due University Durham, NC, USA ashwin@cs.ue.eu Bolin Ding Microsoft

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information

Learning Subproblem Complexities in Distributed Branch and Bound

Learning Subproblem Complexities in Distributed Branch and Bound Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University

More information

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques Politehnica University of Timisoara Mobile Computing, Sensors Network an Embee Systems Laboratory ing Techniques What is testing? ing is the process of emonstrating that errors are not present. The purpose

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

Optimal Oblivious Path Selection on the Mesh

Optimal Oblivious Path Selection on the Mesh Optimal Oblivious Path Selection on the Mesh Costas Busch Malik Magon-Ismail Jing Xi Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 280, USA {buschc,magon,xij2}@cs.rpi.eu Abstract

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Aitional Divie an Conquer Algorithms Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Divie an Conquer Closest Pair Let s revisit the closest pair problem. Last

More information

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES OLIVIER BERNARDI AND ÉRIC FUSY Abstract. We present bijections for planar maps with bounaries. In particular, we obtain bijections for triangulations an quarangulations

More information

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2 This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp. 167 181. Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks 1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex boies is har Navin Goyal Microsoft Research Inia navingo@microsoftcom Luis Raemacher Georgia Tech lraemac@ccgatecheu Abstract We show that learning a convex boy in R, given ranom samples

More information

A shortest path algorithm in multimodal networks: a case study with time varying costs

A shortest path algorithm in multimodal networks: a case study with time varying costs A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via

More information

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm NASA/CR-1998-208733 ICASE Report No. 98-45 Parallel Directionally Split Solver Base on Reformulation of Pipeline Thomas Algorithm A. Povitsky ICASE, Hampton, Virginia Institute for Computer Applications

More information

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 1, NO. 4, APRIL 01 74 Towar Efficient Distribute Algorithms for In-Network Binary Operator Tree Placement in Wireless Sensor Networks Zongqing Lu,

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

Probabilistic Medium Access Control for. Full-Duplex Networks with Half-Duplex Clients

Probabilistic Medium Access Control for. Full-Duplex Networks with Half-Duplex Clients Probabilistic Meium Access Control for 1 Full-Duplex Networks with Half-Duplex Clients arxiv:1608.08729v1 [cs.ni] 31 Aug 2016 Shih-Ying Chen, Ting-Feng Huang, Kate Ching-Ju Lin, Member, IEEE, Y.-W. Peter

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information

1 Surprises in high dimensions

1 Surprises in high dimensions 1 Surprises in high imensions Our intuition about space is base on two an three imensions an can often be misleaing in high imensions. It is instructive to analyze the shape an properties of some basic

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

k-nn Graph Construction: a Generic Online Approach

k-nn Graph Construction: a Generic Online Approach k-nn Graph Construction: a Generic Online Approach Wan-Lei Zhao arxiv:80.00v [cs.ir] Sep 08 Abstract Nearest neighbor search an k-nearest neighbor graph construction are two funamental issues arise from

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER FFICINT ON-LIN TSTING MTHOD FOR A FLOATING-POINT ADDR A. Droz, M. Lobachev Department of Computer Systems, Oessa State Polytechnic University, Oessa, Ukraine Droz@ukr.net, Lobachev@ukr.net Abstract In

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

AnyTraffic Labeled Routing

AnyTraffic Labeled Routing AnyTraffic Labele Routing Dimitri Papaimitriou 1, Pero Peroso 2, Davie Careglio 2 1 Alcatel-Lucent Bell, Antwerp, Belgium Email: imitri.papaimitriou@alcatel-lucent.com 2 Universitat Politècnica e Catalunya,

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stolyar Abstract Backpressure-base aaptive routing

More information

arxiv: v1 [math.co] 15 Dec 2017

arxiv: v1 [math.co] 15 Dec 2017 Rectilinear Crossings in Complete Balance -Partite -Uniform Hypergraphs Rahul Gangopahyay Saswata Shannigrahi arxiv:171.05539v1 [math.co] 15 Dec 017 December 18, 017 Abstract In this paper, we stuy the

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stoylar arxiv:15.4984v1 [cs.ni] 27 May 21 Abstract

More information

Principles of B-trees

Principles of B-trees CSE465, Fall 2009 February 25 1 Principles of B-trees Noes an binary search Anoe u has size size(u), keys k 1,..., k size(u) 1 chilren c 1,...,c size(u). Binary search property: for i = 1,..., size(u)

More information

6.823 Computer System Architecture. Problem Set #3 Spring 2002

6.823 Computer System Architecture. Problem Set #3 Spring 2002 6.823 Computer System Architecture Problem Set #3 Spring 2002 Stuents are strongly encourage to collaborate in groups of up to three people. A group shoul han in only one copy of the solution to the problem

More information

Authenticated indexing for outsourced spatial databases

Authenticated indexing for outsourced spatial databases The VLDB Journal (2009) 8:63 648 DOI 0.007/s00778-008-03-2 REGULAR PAPER Authenticate inexing for outsource spatial atabases Yin Yang Stavros Papaopoulos Dimitris Papaias George Kollios Receive: February

More information

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters Available online at www.scienceirect.com Proceia Engineering 4 (011 ) 34 38 011 International Conference on Avances in Engineering Cluster Center Initialization Metho for K-means Algorithm Over Data Sets

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Image compression predicated on recurrent iterated function systems

Image compression predicated on recurrent iterated function systems 2n International Conference on Mathematics & Statistics 16-19 June, 2008, Athens, Greece Image compression preicate on recurrent iterate function systems Chol-Hui Yun *, Metzler W. a an Barski M. a * Faculty

More information

Supporting Fully Adaptive Routing in InfiniBand Networks

Supporting Fully Adaptive Routing in InfiniBand Networks XIV JORNADAS DE PARALELISMO - LEGANES, SEPTIEMBRE 200 1 Supporting Fully Aaptive Routing in InfiniBan Networks J.C. Martínez, J. Flich, A. Robles, P. López an J. Duato Resumen InfiniBan is a new stanar

More information

d 3 d 4 d d d d d d d d d d d 1 d d d d d d

d 3 d 4 d d d d d d d d d d d 1 d d d d d d Proceeings of the IASTED International Conference Software Engineering an Applications (SEA') October 6-, 1, Scottsale, Arizona, USA AN OBJECT-ORIENTED APPROACH FOR MANAGING A NETWORK OF DATABASES Shu-Ching

More information

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Questions? Post on piazza, or  Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)! EE122 Fall 2013 HW3 Instructions Recor your answers in a file calle hw3.pf. Make sure to write your name an SID at the top of your assignment. For each problem, clearly inicate your final answer, bol an

More information

Message Transport With The User Datagram Protocol

Message Transport With The User Datagram Protocol Message Transport With The User Datagram Protocol User Datagram Protocol (UDP) Use During startup For VoIP an some vieo applications Accounts for less than 10% of Internet traffic Blocke by some ISPs Computer

More information

Analysis of half-space range search using the k-d search skip list. Here we analyse the expected time for half-space

Analysis of half-space range search using the k-d search skip list. Here we analyse the expected time for half-space Analysis of half-space range search using the k- search skip list Mario A. Lopez Brafor G. Nickerson y 1 Abstract We analyse the average cost of half-space range reporting for the k- search skip list.

More information

THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM

THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM International Journal of Physics an Mathematical Sciences ISSN: 2277-2111 (Online) 2016 Vol. 6 (1) January-March, pp. 24-6/Mao an Shi. THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM Hua Mao

More information

Characterizing Decoding Robustness under Parametric Channel Uncertainty

Characterizing Decoding Robustness under Parametric Channel Uncertainty Characterizing Decoing Robustness uner Parametric Channel Uncertainty Jay D. Wierer, Wahee U. Bajwa, Nigel Boston, an Robert D. Nowak Abstract This paper characterizes the robustness of ecoing uner parametric

More information

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset Vol.3, Issue.1, Jan-Feb. 013 pp-504-508 ISSN: 49-6645 A PSO Optimize Layere Approach for Parametric Clustering on Weather Dataset Shikha Verma, 1 Kiran Jyoti 1 Stuent, Guru Nanak Dev Engineering College

More information

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract The Reconstruction of Graphs Dhananay P. Mehenale Sir Parashurambhau College, Tila Roa, Pune-4030, Inia. Abstract In this paper we iscuss reconstruction problems for graphs. We evelop some new ieas lie

More information

Coupling the User Interfaces of a Multiuser Program

Coupling the User Interfaces of a Multiuser Program Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have evelope a new moel for coupling the user-interfaces

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 0 0 0 0 0 0 0 0 on-uniform Sensor Deployment in Mobile Wireless Sensor etworks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University Boca Raton,

More information

Pairwise alignment using shortest path algorithms, Gunnar Klau, November 29, 2005, 11:

Pairwise alignment using shortest path algorithms, Gunnar Klau, November 29, 2005, 11: airwise alignment using shortest path algorithms, Gunnar Klau, November 9,, : 3 3 airwise alignment using shortest path algorithms e will iscuss: it graph Dijkstra s algorithm algorithm (GDU) 3. References

More information

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

Image Segmentation using K-means clustering and Thresholding

Image Segmentation using K-means clustering and Thresholding Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science an Center of Simulation of Avance Rockets University of Illinois at Urbana-Champaign

More information

Data Mining: Clustering

Data Mining: Clustering Bi-Clustering COMP 790-90 Seminar Spring 011 Data Mining: Clustering k t 1 K-means clustering minimizes Where ist ( x, c i t i c t ) ist ( x m j 1 ( x ij i, c c t ) tj ) Clustering by Pattern Similarity

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my Calculus I course that I teach here at Lamar University. Despite the fact that these are my class notes, they shoul be accessible to anyone wanting to learn Calculus

More information

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway State Inexe Policy Search by Dynamic Programming Charles DuHaway Yi Gu 5435537 503372 December 4, 2007 Abstract We consier the reinforcement learning problem of simultaneous trajectory-following an obstacle

More information

Data Mining: Concepts and Techniques. Chapter 7. Cluster Analysis. Examples of Clustering Applications. What is Cluster Analysis?

Data Mining: Concepts and Techniques. Chapter 7. Cluster Analysis. Examples of Clustering Applications. What is Cluster Analysis? Data Mining: Concepts an Techniques Chapter Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.eu/~hanj Jiawei Han an Micheline Kamber, All rights reserve

More information

Latency-Sensitive Data Allocation for Cloud Storage

Latency-Sensitive Data Allocation for Cloud Storage Latency-Sensitive Data Allocation for Clou Storage Song Yang 1, Philipp Wieer 1, Muzzamil Aziz 1, Ramin Yahyapour 1, 2 an Xiaoming Fu 2 1 Gesellschaft für wissenschaftliche Datenverarbeitung mbh Göttingen

More information

Multi-camera tracking algorithm study based on information fusion

Multi-camera tracking algorithm study based on information fusion International Conference on Avance Electronic Science an Technolog (AEST 016) Multi-camera tracking algorithm stu base on information fusion a Guoqiang Wang, Shangfu Li an Xue Wen School of Electronic

More information

Distributed Planning in Hierarchical Factored MDPs

Distributed Planning in Hierarchical Factored MDPs Distribute Planning in Hierarchical Factore MDPs Carlos Guestrin Computer Science Dept Stanfor University guestrin@cs.stanfor.eu Geoffrey Goron Computer Science Dept Carnegie Mellon University ggoron@cs.cmu.eu

More information

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity Worl Applie Sciences Journal 16 (10): 1387-1392, 2012 ISSN 1818-4952 IDOSI Publications, 2012 A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Base on Gravity Aliasghar Rahmani Hosseinabai,

More information

Just-In-Time Software Pipelining

Just-In-Time Software Pipelining Just-In-Time Software Pipelining Hongbo Rong Hyunchul Park Youfeng Wu Cheng Wang Programming Systems Lab Intel Labs, Santa Clara What is software pipelining? A loop optimization exposing instruction-level

More information

FINDING OPTICAL DISPERSION OF A PRISM WITH APPLICATION OF MINIMUM DEVIATION ANGLE MEASUREMENT METHOD

FINDING OPTICAL DISPERSION OF A PRISM WITH APPLICATION OF MINIMUM DEVIATION ANGLE MEASUREMENT METHOD Warsaw University of Technology Faculty of Physics Physics Laboratory I P Joanna Konwerska-Hrabowska 6 FINDING OPTICAL DISPERSION OF A PRISM WITH APPLICATION OF MINIMUM DEVIATION ANGLE MEASUREMENT METHOD.

More information

NAND flash memory is widely used as a storage

NAND flash memory is widely used as a storage 1 : Buffer-Aware Garbage Collection for Flash-Base Storage Systems Sungjin Lee, Dongkun Shin Member, IEEE, an Jihong Kim Member, IEEE Abstract NAND flash-base storage evice is becoming a viable storage

More information

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,

More information

Provisioning Virtualized Cloud Services in IP/MPLS-over-EON Networks

Provisioning Virtualized Cloud Services in IP/MPLS-over-EON Networks Provisioning Virtualize Clou Services in IP/MPLS-over-EON Networks Pan Yi an Byrav Ramamurthy Department of Computer Science an Engineering, University of Nebraska-Lincoln Lincoln, Nebraska 68588 USA Email:

More information

A Classification of 3R Orthogonal Manipulators by the Topology of their Workspace

A Classification of 3R Orthogonal Manipulators by the Topology of their Workspace A Classification of R Orthogonal Manipulators by the Topology of their Workspace Maher aili, Philippe Wenger an Damien Chablat Institut e Recherche en Communications et Cybernétique e Nantes, UMR C.N.R.S.

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks mproving Spatial Reuse of EEE 82.11 Base A Hoc Networks Fengji Ye, Su Yi an Biplab Sikar ECSE Department, Rensselaer Polytechnic nstitute Troy, NY 1218 Abstract n this paper, we evaluate an suggest methos

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for

More information

Learning Polynomial Functions. by Feature Construction

Learning Polynomial Functions. by Feature Construction I Proceeings of the Eighth International Workshop on Machine Learning Chicago, Illinois, June 27-29 1991 Learning Polynomial Functions by Feature Construction Richar S. Sutton GTE Laboratories Incorporate

More information

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember 107 IEICE TRANS INF & SYST, VOLE88 D, NO5 MAY 005 LETTER An Improve Neighbor Selection Algorithm in Collaborative Filtering Taek-Hun KIM a), Stuent Member an Sung-Bong YANG b), Nonmember SUMMARY Nowaays,

More information

A Highly Scalable Parallel Boundary Element Method for Capacitance Extraction

A Highly Scalable Parallel Boundary Element Method for Capacitance Extraction A Highly Scalable Parallel Bounary Element Metho for Capacitance Extraction The MIT Faculty has mae this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

arxiv: v1 [cs.cr] 22 Apr 2015 ABSTRACT

arxiv: v1 [cs.cr] 22 Apr 2015 ABSTRACT Differentially Private -Means Clustering arxiv:10.0v1 [cs.cr] Apr 01 ABSTRACT Dong Su #, Jianneng Cao, Ninghui Li #, Elisa Bertino #, Hongxia Jin There are two broa approaches for ifferentially private

More information

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure

More information

A Framework for Dialogue Detection in Movies

A Framework for Dialogue Detection in Movies A Framework for Dialogue Detection in Movies Margarita Kotti, Constantine Kotropoulos, Bartosz Ziólko, Ioannis Pitas, an Vassiliki Moschou Department of Informatics, Aristotle University of Thessaloniki

More information

A Cost Model For Nearest Neighbor Search. High-Dimensional Data Space

A Cost Model For Nearest Neighbor Search. High-Dimensional Data Space A Cost Moel For Nearest Neighbor Search in High-Dimensional Data Space Stefan Berchtol University of Munich Germany berchtol@informatikuni-muenchene Daniel A Keim University of Munich Germany keim@informatikuni-muenchene

More information

Solutions to Tutorial 1 (Week 8)

Solutions to Tutorial 1 (Week 8) The University of Syney School of Mathematics an Statistics Solutions to Tutorial 1 (Week 8) MATH2069/2969: Discrete Mathematics an Graph Theory Semester 1, 2018 1. In each part, etermine whether the two

More information

New Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance

New Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance New Version of Davies-Boulin Inex for lustering Valiation Base on ylinrical Distance Juan arlos Roas Thomas Faculta e Informática Universia omplutense e Mari Mari, España correoroas@gmail.com Abstract

More information

Analytical approximation of transient joint queue-length distributions of a finite capacity queueing network

Analytical approximation of transient joint queue-length distributions of a finite capacity queueing network Analytical approximation of transient joint queue-length istributions of a finite capacity queueing network Gunnar Flötterö Carolina Osorio March 29, 2013 Problem statements This work contributes to the

More information

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost Secure Network Coing for Distribute Secret Sharing with Low Communication Cost Nihar B. Shah, K. V. Rashmi an Kannan Ramchanran, Fellow, IEEE Abstract Shamir s (n,k) threshol secret sharing is an important

More information

Uninformed search methods

Uninformed search methods CS 1571 Introuction to AI Lecture 4 Uninforme search methos Milos Hauskrecht milos@cs.pitt.eu 539 Sennott Square Announcements Homework assignment 1 is out Due on Thursay, September 11, 014 before the

More information