Debiasing Crowdsourced Batches

Size: px
Start display at page:

Download "Debiasing Crowdsourced Batches"

Transcription

1 Debiasing Crowdsourced Batches Honglei Zhuang, Aditya Paraeswaran, Dan Roth and Jiawei Han Departent of Coputer Science University of Illinois at Urbana-Chapaign {hzhuang3, adityagp, danr, ABSTRACT Crowdsourcing is the de-facto standard for gathering annotated data. While, in theory, data annotation tasks are assued to be attepted by workers independently, in practice, data annotation tasks are often grouped into batches to be presented and annotated by workers together, in order to save on the tie or cost overhead of providing instructions or necessary background. Thus, even though independence is usually assued between annotations on data ites within the sae batch, in ost cases, a worker s judgent on a data ite can still be affected by other data ites within the batch, leading to additional errors in collected labels. In this paper, we study the data annotation bias when data ites are presented as batches to be judged by workers siultaneously. We propose a novel worker odel to characterize the annotating behavior on sall data batches, and present how to train the worker odel on annotation data sets. We also present a debiasing technique to reove the effect of such annotation bias fro adversely affecting the accuracy of labels obtained. Our experiental results on both synthetic data and realworld data deonstrate the effectiveness of our proposed ethod. 1. INTRODUCTION Crowdsourcing provides an efficient ethod to annotate data on a large scale for various achine learning tasks by eploying a assive workforce drawn fro global Internet users. Popular online crowdsourcing platfors include Aazon Mechanical Turk 1 and CrowdFlower 2. However, while crowdsourcing is relatively cheap copared to eploying experts, getting large quantities of training data annotated by crowds (say thousands, or illions of data ites) can be rather expensive. A key echanis, often eployed in practice for reducing costs, is batching, i.e., grouping ultiple data ites (to be annotated together) into one single task as a batch. Batching can save significant onetary costs, since the necessary instructions and background for copleting the task needs to be provided just once for the entire batch. Thus, the worker will spend less tie on reviewing these Perission to ake digital or hard copies of all or part of this work for personal or classroo use is granted without fee provided that copies are not ade or distributed for profit or coercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific perission and/or a fee. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX...$ instructions, and ore tie on annotating data ites, and therefore will be able to annotate ore data ites within the sae tie. For instance, consider a scenario where a worker has to judge whether a coent is relevant to a docuent. Here, aking a judgent for each coent requires reading through the entire docuent. Instead, with batching, the worker only needs to read the entire docuent once, and then ake a judgent for all the coents in the batch. In fact, even fro the workers point of view, it is also ore attractive to label batches of data ites as they can save tie on switching between different tasks. Furtherore, once they start on a batch, they no longer have to fight for other high paying or attractive tasks. However, even though batching is an attractive option in practice due to its cost and tie savings, having workers annotate batches can lead to severe correlation between annotations within batches. For exaple, say we have a task of annotating whether a review of the ovie The Iitation Gae crawled fro IMDb is positive. As illustrated in Figure 1(a), if we only show one review to be judged as part of each crowdsourcing unit task, workers will have to spend soe tie reading the instructions and possibly looking up the ovie before they can ake a single judgent on a review. Although judgents are likely to be independent, this way of assigning work is too costly to be practical. Instead, if we asseble ultiple reviews of the sae ovie into a batch, as shown in Figure 1(b), workers can ake ultiple judgents after they read the instructions. Nevertheless, in this case, the annotation of different reviews ight interfere with each other. For exaple, when the review Average In The Extree does not see like a positive review per se (Cf. top right in Figure 1(a)), when grouped with the review Stack of Lies, it looks uch ore like a positive review (Cf. top in Figure 1(b)). Siilarly, when the review Good enough but historically sketchy looks quite positive by itself (Cf. botto left in Figure 1(a)), it does not look as positive as a strongly effusive review siply saying Great ovie, as shown in the botto of Figure 1(b). Thus, overall these effects ight be undesirable and isleading as it is inconsistent with the case when workers ake independent judgents. Therefore, it is challenging to ascertain true labels of data ites in batches. So far, there has been little to no work in exploring the the possible annotation error introduced by grouping data ites into batches. Although batching data ites has been adopted in any crowdsourced tasks such as sorting [17, object recognition [32 or clustering [10, and anecdotally very widely used in practice, the assuption is often that the annotations are collected independently, which is not the case. While there is liited work on judging data ites in sequence [18, 26, 27, it is not directly applicable to our setting where a batch of data ites are presented and annotated in parallel. Our previous research [36 also noticed this specific

2 the ovie The Iitation Gae the ovie The Iitation Gae the ovie The Iitation Gae ý Stack of Lies ý Average In The Extree ý Stack of Lies þ Average In The Extree the ovie The Iitation Gae the ovie The Iitation Gae the ovie The Iitation Gae þ Good enough but historically sketchy þ Great ovie, worth a watch ý Good enough but historically sketchy þ Great ovie, worth a watch (a) Data ites judged independently (b) Data ites judged in a batch Figure 1: Exaple of correlation between annotations on data ites in the sae batch. Workers are asked to label whether a review on the ovie The Iitation Gae crawled fro IMDb is positive. Assign each review-ovie pair to workers separately can be costly, while assigning a batch of reviews together with a ovie to workers ight affect workers judgents. type of annotation bias, but instead of focusing on debiasing, we exploited the bias to develop an active learning algorith aiing to iprove a certain classifier perforance. We defer the detailed discussion of the related work to Section 7. There are several research challenges in solving this proble. First, how do we odel workers behavior when they ake judgents in batches? Second, how do we leverage the odel to debias the crowdsourced annotation of data batches? We ake the following contributions in answering these questions: 1. Proposing an interpretable worker annotation odel on sall batches of data. We propose a novel worker odel for binary annotation behavior with data ites presented as batches. The odel incorporates independent judgents and batch judgents based on ranking. Different fro the factor graph odel in our previous work [36, we focus on obtaining the true labels of data ites instead of iproving classifier perforance. 2. Debiasing annotation data obtained as batches. Based on our proposed worker odel, we provide an algorith to debias the inferred labels when they are collected fro data ites in sall batches. 3. Conducting experients on a real-world crowdsourcing platfor. We conduct experients on both synthetic and real-world crowdsourcing data sets to verify the effectiveness of our proposed odel and debiasing strategies. Experiental results show the effectiveness of our debiasing ethod over other baselines. The rest of this paper is organized as follows: Section 2 introduces the background of crowdsourcing, and foralizes the research proble; Section 3 proposes the worker odel for annotating sall batches of data; Section 4 presents a strategy to debias batch annotations; Section 5 describes experiental results; Section 6 discusses extensions of our proposed ethod; Section 7 presents related work and Section 8 concludes. 2. PRELIMINARIES In this section, we forally define the concepts and notations we use in this paper; we then give a foralize the proble of debiasing crowdsourced batches. 2.1 Basic Concepts and Terinology First we need to foralize several basic concepts in a crowdsourcing platfor. Suppose we are given a set of data ites X = {xi }, where i = 1,..., n. Each data ite is associated with a label yi Y, where Y = {yi }n i=1. In following discussion, we focus on a binary classification task, where Y = {0, 1}, but our fraework generalizes to ulti-class or rating cases sealessly. According to a standard foralization in learning theory for binary classification, we suppose each (xi, yi ) is generated fro a joint probability distribution PX Y. We define ηi to be the conditional probability P (yi = 1 xi ). In a job or task subitted to a crowdsourcing platfor, we can asseble several data ites into a batch. Each batch bj is represented by a set of indices of data ites in the batch, denoted as {bj1,..., bjk }, where k is the size of a batch. To be strict, data ites in the batch should be represented by xj = {xbj1,..., xbjk }. However, for siplicity, we denote data ites in the batch specified by bj as {xj1,..., xjk }. Siilarly, we define yj = {yj1,..., yjk } to be true labels associated with data ites in xj, where yjl is the true label of xjl according to Y for 1 l k. In CrowdFlower language, a batch corresponds to a single unit, where a worker has to judge the entire unit at the sae tie; in Mechanical Turk language, a batch corresponds to a single HIT (short for Huan Intelligence Task). Usually, data ites in the sae batch ight share the sae context, background, or the sae instruction, in order to reduce the overhead. For exaple, if one is asked to judge whether a review about a restaurant is positive or negative, it ight save tie for workers if reviews of the sae restaurant are grouped into the sae batch, as they only need to read the description of the restaurant once before they can ake ultiple judgents on different reviews. As we asseble data ites into batches, each worker has to judge the entire batch as a single judgent. Given a batch bj, the judgent provided by a worker can be represented as yj0 =

3 (y j1,..., y jk), where y jl Y is the annotation of data ite corresponding to x jl, provided by the worker. Noting that the worker annotation y j can be different fro the true label y j. We refer to worker annotation as annotation, while the ground-truth label is referred to as siply the label. In CrowdFlower, as a judgent can only be ade based on a unit, workers are not allowed to subit partial results on a batch (as with Mechanical Turk). However, one can always add an unknown option for every data ite, so that the workers can provide partial results on a batch. For siplicity, we consider no partial judgents in the rest of the paper. Now, we are in a position to give a foral definition for a batch of data ites: DEFINITION 1 (BATCH). Given a data set (X, Y ), a batch of data ites with size k extracted fro the given data set can be represented as (b j, x j, y j, y j), where b j = (b j1,..., b jk ) is a set of indices for X and Y ; x j = {x j1,..., x jk } is a set of all the data ites, indexed by b j (i.e. x jl = x bjl ); y j = {y j1,..., y jk } consists of the corresponding true labels of data ites in x j, also indexed by b j; y j = {y j1,..., y jk} is the worker annotation on the set of the batch. Additionally, a set of batches can be defined as: DEFINITION 2. Given a data set (X, Y ), a set of batches extracted fro the given data set is denoted as A = (B, X B, Y B, Y B), where B = {b j} consists of the indices of each batch; X B = {x j} is the set of data ite batches, with their corresponding true labels Y B = {y j} and worker annotations Y B = {y j}. Rearks. 1) Notice that a data ite x i X ay certainly appear in ultiple batches in A. If data ites in two different batches share the sae index as indicated by corresponding ite in b j, they refer to the sae data ite in X; 2) For the sake of fully utilizing the workforce of crowds, without loss of generality, we focus on the scenario when all batches have the identical size k. However, our odel generalizes to the case when batches have different sizes; 3) In soe real world crowdsourcing platfors, a batch can actually be judged by ultiple workers, which eans there could be ultiple y j s associated to a single (b j, x j, y j) for instance, this is referred to as ultiple assignents on Mechanical Turk. However, for the purposes of debiasing, it is equivalent to regard a single batch as ultiple identical batches, and associate each batch with a unique judgent ade by different workers. 2.2 Proble Definition Based on the concepts described thus far, we can foralize the proble of debiasing crowdsourced batches as the following: PROBLEM 1 (DEBIASING CROWDSOURCED BATCHES). Suppose we have a labeled data set (X L, Y L) with Y L known, as well as its extracted batches and their crowdsourced annotation (B L, X BL, Y BL, Y B L ). If we are then given another unlabeled data set X U, as well as its extracted batches and crowdsourced annotation (B U, X BU, Y B U ), the objective is to infer the true labels Y U associated with X U fro the crowdsourced annotation. Notice that our proble forulation as described above requires as input labeled and annotated data ites for training purposes. In practice, the labeled data for training can be collected fro the test questions with ground-truth labels, inserted by the crowdsourcing platfor for the purpose of quality control and onitoring of workers. The usage of test questions is standard practice: As an exaple, in CrowdFlower, all workers have to attept a certain nuber Notation Table 1: Notation description. Description X Set of all the data ites {x i} n i=1 Y Set of all true labels associated with data ites in X b j A set of data ite indices {b jl } k l=1 x j A data ite batch {x jl } k l=1 where x jl is extracted fro the data ite in X with index specified by b jl y j A label batch consists of true labels {y jl } k l=1 associated with data ites in x j y j Worker annotation collected fro a crowdsourcing platfor for data ites in x j B Set of all the batches {b j} i=1 X B Set of all the data ite batches Y B Set of all the true labels associated with data ite batches in X B Y B Set of all the worker annotation fro crowds on data ite batches in X B of test questions with correct labels and need to achieve an accuracy over a certain threshold (e.g. 70%) before they can proceed to work on the regular task(s). Also, additional hidden data ites with known labels can be inserted into the regular tasks to onitor the accuracy of workers. In our setting, worker behavior on these test questions or labeled data can additionally be used for training purposes. Also notice that in this version of our proble forulation, we assue identical worker behavior. In practice, it is ore often to assue identical worker behavior, as there are usually not enough work done by each worker to ascertain individual behavior. Also, it is straightforward to extend our odel when different workers have different behavior when working on tasks. 3. CROWDSOURCING WORKER ANNO- TATION MODEL ON BATCHES In this section, we first describe our odel for workers annotation behavior on a sall batch of data ites; then we introduce how to train the odel based on a training data set. Our key intuition is the follows: when a worker judges a batch of data ites, she can either: 1) choose to judge data ites independently as if they are presented alone; or 2) to rank all the data ites according to their relative inherent values and annotate the top several ites as positive, leaving the rest in the batch as negative. Plackett-Luce odel. Before we delve into our odel, we first recap a probability odel for generating rankings based on scores associated with ites, naely the classical Plackett-Luce odel [15, 21 introduced in the 70s. Without loss of generality, suppose we are given a set of ites x 1,..., x k. Each ite x i is associated with a certain score s(x i) > 0. Here the score s(x i) odels the tendency of ranking x i higher in a randoly generated ranking and can be viewed as a easure of the inherent goodness of the ite. A ranking of these ites can be represented as a bijection π : {i} k i=1 {x i} k i=1, that aps the i to the data ite at the i-th position in the ranking. The corresponding ranking list can be represented as π(1) π(k). In Plackett-Luce odel, the probability of generating a ranking π is: P (π) = k i=1 s(x i) k r=i s(xr) (1) The equation above can be interpreted as the following process:

4 Initially, we have a pool A of all the data ites. Each tie one picks an ite x i fro a pool A of data ites with a probability proportional to its score, naely: s(x i) P (picking x i fro A) = x s(xr) r A This ite is then reoved fro the pool A and placed at the next position in the ranking. Repeat this operation until A becoes epty. The probability of generating a ranking list according to this process is equivalent to the probability described in the Plackett- Luce odel. Worker odel. We now introduce our worker odel for annotating sall batches of data ites. Again, without loss of generality, suppose we are given a batch x j where x jl = x l, naely the given data ite batch can be denoted as x j = {x 1,..., x k }. Also, recall that for each data ite x i, we denote P (y i = 1 x i) as its inherent score η i, which is not explicitly known. When a worker starts to work on a certain batch of data ites, they ay choose to use one of two strategies: Independent judging. If the worker is aking judgents based on the absolute value of η i for each data ite, we suppose the worker judges each data ite x i x j independently by drawing the annotation y i = 1 with probability η i and y i = 0 with probability (1 η i). Relative judging. If the worker is aking judgents based on coparing data ites within the sae batch without knowing absolute value of η i, we suppose the worker chooses to first rank all the data ites based on their probability of being positive, then annotates the top-τ ites in the ranking as positive, leaving the other ites annotated as negative. To be precise, the worker generates a ranking π for k ites in the batch according to the Plackett-Luce odel, with the scoring function defined as s(x i) = η i. Then the worker draws an integer 0 τ k fro a certain distribution, where p τ denotes the probability of drawing the integer τ. For data ites ranked as top-τ in the ranking, denoted as x i {π 1 (1),..., π 1 (p)} (could be epty if p = 0), the worker annotates the as y i = 1, while other data ites not within the top-τ of the ranking π are annotated as y i = 0. To cobine these two different scenarios, we suppose the worker is able to judge independently with a certain probability 0 < λ < 1, while with probability (1 λ) the worker can only ake relative judgents. The intuition of this odel is to capture two behavior pattern of workers. In the independent judging scenario, workers can reain independent in judging different data ites in the sae batch, with each data ite being judged based on its inherent score η i. Nevertheless, soeties workers ight judge data ites within a batch by coparison. In the relative judging scenario, workers still have the ability to judge the relative relationships between data ites in the sae batch, which is captured by the Plackett-Luce odel for generating the ranking. In order to deterine the labels of data ites, they have an expectation of label distribution, which is reflected by the distribution of generating τ, as it characterizes the probability of having τ positives within k data ites. For instance, if workers expect there to be few positive ites, then the probability of τ being low is high, while if workers expect the batches to be balanced, then the probability of τ being close to k/2 is high coparing to other values of τ. However, this distribution does not necessarily reflect the correct label distribution. When they try to apply their expectation of the label distribution on the batch, bias ight occur. We suarize the process of generating annotation for a batch of data ites in our proposed odel as below: 1. Toss a coin Z Bernoulli(λ). If Z = 1, go to Step 2; otherwise go to Step For each x i, generate y i Bernoulli(η i). Output the results and exit. 3. Generate a ranking π based on Plackett-Luce odel for data ites x i in the batch. 4. Draw τ Mult(p τ ). 5. For the top-τ ites in ranking π, generate y i = 1; otherwise generate y i = 0. Output the results and exit. Model learning. The paraeters that need to be deterined in this worker odel include: the probability of aking independent judgents λ, and the distribution of the nuber of positive annotation when aking relative judgents, represented by p 0,..., p k, where 0 p τ 1 and p τ = 1. We assue these two paraeters are fixed for each new application of our techniques. However, for different applications, these paraeters ight be different for instance, these paraeters for content oderation ay be different fro the sae paraeters for spa identification or sentient analysis. Suppose we are given a set of n L ites X L with their true labels Y L, or ore ideally, their inherent scores {η i} xi X L. If the inherent score of a data ite η i is not given, but only y i {0, 1} is known, as a heuristic, we can estiate η i by η i = (y i +ɛ)/(1+2ɛ) where ɛ is a sall constant, which is set to 10 3 in our experients. Then, we for the into L batches B L, send the to the crowds, and obtain their annotation fro workers, denoted as Y B L. For siplicity, we write x bjt as x jt, and siilar to y jt and η jt. For each batch b j B L, we denote the set of ites annotated by workers as positive as Xj 1 = {x jt y jt = 1}, and the set of ites annotated as negative as Xj 0 = {x jt y jt = 0}. We train the odel by axiu likelihood estiation. The likelihood of the obtained annotation can be written as: L [ k L = λ η y jt jt (1 ηjt)(1 y jt) +(1 λ) p τj P (Xj 1 Xj 0 ) }{{} t=1 }{{} relative judging independent judging (2) where τ j = Xj 1 is the nuber of positive annotation in batch b j; P (Xj 1 Xj 0 ) denotes the probability of generating any rankings π that rank ites in Xj 1 higher than any ites in Xj 0, naely: P (Xj 1 Xj 0 ) = P (π) π R(X j 1,X0 j ) where R(X 1, X 0) = {π π 1 (x 0) > π 1 (x 1), x 1 X 1, x 0 X 0}; and P (π) is defined by the Plackett-Luce odel, as presented in (1). Applying an EM-algorith, where at E-step, we can have ˆλ j = ˆλ k t=1 ηy jt jt (1 ηjt)(1 y jt) ˆλ k t=1 ηy jt jt (1 ηjt)(1 y jt) + (1 ˆλ)ˆp τj P (X 1 j X0 j ) (3)

5 And at M-step, we update the paraeters ˆλ and ˆp τ by ˆλ = 1 L ˆλ j, ˆp τ = 1 L (1 ˆλ j)1 { X 1 L Ẑ j =τ} (4) where Ẑ = L (1 ˆλ j). 4. DEBIASING ANNOTATION In this section, we introduce our ethod that debiases annotations collected for sall batches of data given the trained worker odel. More precisely, given a set of n U unlabeled data ites X U, assebled into U batches represented by B U, as well as their annotations obtained fro the crowds Y U, how do we infer their true labels Y U? The basic idea is, based on the given worker odel, we infer η i for each x i X U. Then, we siply apply the Bayes classifier to deterine the inferred label, which yields ŷ i = 1 if η i > 0.5, or ŷ i = 0 if η i 0.5. We again adopt a axiu likelihood estiation techique. The log-likelihood of the obtained annotation is: U [ log L(η) = log ˆλ x jt1 X 1 j η jt1 x jt0 X 0 j +(1 ˆλ)ˆp τj P (Xj 1 Xj 0 ) (1 η jt0) Notice that ˆλ and ˆp τj are paraeters learned fro Section 3, and P (Xj 1 Xj 0 ) is also a function of η i s. Siilar to the previous section, we apply an EM-algorith here by first calculating ˆλ j for each batch at the E-step according to (3) but replacing λ and p τ by the value we learned during the training step. Then we have: U [ log L(η) ˆλ j x jt1 X 1 j log η jt1 + x jt0 X 0 j log(1 η jt0) (5) U + (1 ˆλ [ j) log ˆp τj + log P (Xj 1 Xj 0 ) (6) where the second ter includes log P (Xj 1 Xj 0 ), which is hard to optiize. We apply the idea of the EM-algorith again here. We use notation R j to represent R(Xj 1, Xj 0 ). For each π R j, we can calculate its conditional probability given Xj 1 Xj 0, denoted as ˆq π by: ˆq π = P (π Xj 1 Xj 0 P (π; ˆη) ; ˆη) = P (π; ˆη) which is the E-step. According to Jensen s inequality we have: log P (X 1 X 0) = log P (π) (7) ˆq π log P (π) (8) where the last inequality yields the objective function we want to optiize. The correctness of EM-algorith guarantees the convergence of optiizing this function. Furtherore, according to the inorization-axiization (MM) algorith used in [11, we obtain the lower bound for log P (π), which is defined by the Plackett-Luce odel, by: k 1 [ k log P (π) = log η π 1 (t) log t=1 s=t η π 1 (s) k 1 [ k s=t log η π 1 (t) η π 1 (s) k s=t ˆη π 1 (s) t=1 where ˆη i is the estiated paraeter of last iteration. By cobining (6), (8) and (9), we obtain the objective function to optiize as: U [ Q(η) = ˆλ j log η jt1 + log(1 η jt0) x jt1 X 1 j U + (1 ˆλ j) ˆq π x jt0 X 0 j (9) k 1 [ k s=t log η π 1 (t) η π 1 (s) k s=t ˆη π 1 (s) t=1 (10) Notice that Q(η) is actually a lower-bound of the original loglikelihood function. Moreover, for two EM-step and one MM-step we apply in deriving the Q-function, it is proven that by iproving Q(η) fro this iteration Q(ˆη), the iproveent of the loglikelihood is no less than the iproveent we achieve on the Q- function. Therefore optiizing Q(η) can also optiize the loglikelihood. Take the derivative, we obtain Q(η) η i = j M 1 (i) 1 η i U (1 ˆλ j) j M 0 (i) ˆq π ˆλ j 1 η i [ X 1 j t=1 1 {π 1 (i) t} k s=t ˆη π 1 (s) (11) where M 1(i) and M 0(i) are defined as M y(i) = {j : x i X y j } for y {0, 1}. The updating rule can be obtained by solving Q(η)/ η i = 0, naely where ˆη i = T1 + T2 + T3 (T 1 + T 2 + T 3) 2 4T 1T 3 2T 3 (12) T 1 = M 1(i), T 2 = U T 3 = (1 ˆλ j) ˆq π j M 0 (i) [ X 1 j t=1 ˆλ j 1 {π 1 (i) t} k s=t ˆη π 1 (s) By iteratively updating the scores to optiize the likelihood of the annotation on test data, we can obtain the inferred η i of each ite. Based on this, we can deterine the inferred binary label for each data ite by assigning y i = 1 if ˆη i > 0.5, or y i = 0 otherwise. Notice that we do not further tune the threshold in this step, as the scores we learned here are expected to be a reasonable estiate of the true η i s. Therefore, if the inherent scores are known, learning theory guarantees us that by using Bayes classifier (naely to take 0.5 as threshold) is supposed to achieve the best expected perforance in ters of square loss. 5. EXPERIMENTAL RESULTS In this section, we conduct experients on a synthetic data set and a real data set to verify the effectiveness of our proposed worker odel and debiasing technique.

6 Input: Data batches B U = {b j }, crowdsourced annotation Y U = {y j }; training data batches B L, crowdsourced annotation Y L, true labels Y L. Output: Inferred labels Y U. // Training work odel; 1 ˆλ j 0.5; Initialize ˆp τ by rando values; 2 repeat 3 Update ˆλ j bfor 1 j L by (3); 4 Update ˆλ and ˆp τ by (4) 5 until L converged; // Calculate debiased labels; 6 repeat 7 Update η i for 1 i n U by (12); 8 until Q(η) converged; 9 y i 1 {ηi >0.5} for 1 i n U ; 10 Output Y U. Algorith 1: Debiasing crowdsourced annotation on batches of data ites. 5.1 Experiental Data Sets We first introduce the data sets we used in this experients. A suary of the data sets we use in our experients is provided in Table 2. Synthetic data set. We construct synthetic data sets following the worker annotation odel we propose in Section 3. Suppose we have n ites in X, we first generate their inherent scores η i for each x i X fro a Beta distribution Beta(α, β), then generate the true labels Y by drawing y i fro a Bernoulli distribution paraeterized by η i for each i. In our synthetic data set, we set α = 2 and β = 4 to siulate the case when negative data ites overwhel positive data ites. Then, we generate batches of size k by sapling without replaceent for each batch. Notice that by the phrase without replaceent we ean there are no identical data ites within the sae batch, while the sae ite can still appear in ultiple batches as we do replace the ites back into the pool after a batch is generated. Thereby we obtain the set of batches B. For each b j in B, we generate the workers annotation y j fro our proposed batch annotation odel. The probability of aking independent judgents λ is set as 0.5. The distribution of deterining nuber of positive annotations p τ is also assigned to be: p τ (τ + 1) ρ where ρ is positive constant, set as 2 in our experients. Coents data set. We utilize a real world crowdsourcing data set for annotating coents, which is used in [36. The original crowdsourcing task was to identify inappropriate coents on LinkedIn posts published by copanies or LinkedIn influencers. Inappropriate coents are defined as coents containing prootional, profane, blatant soliciting, rando greeting coents, as well as coents with only web links and contact inforation. In order to collect annotation of coents, for each post, k coents are sapled and sent to CrowdFlower as a batch (unit). Workers are also provided with a codebook (i.e., a sequence of instructions) explaining how to annotate the data ites. Each coent is regarded as a data ite and can be annotated as positive (inappropriate coent) or negative (acceptable coent). Each batch is annotated by 5 or ore workers. In order to provide test questions and track the perforance of each worker, soe of the batches are annotated by 9 trained LinkedIn eployees (experts) with the sae codebook and inter- Table 2: Data set statistics. Data set k n L L n U U Synthetic 5 1,000 10,000 5,000 50,000 Coents , ,267 face as used for crowd workers. The average Cohen s kappa for all expert pairs is For this experients, we only adopt the batches with all of their data ites annotated by both crowds and experts as we can use the experts annotation as ground truth (aggregated by ajority voting). Out of these batches, the 1,099 batches that are annotated before a worker actually starts on the job are utilized as training data set B L. while the other 5,267 batches are utilized as the test data set B U to infer the 651 data ites the 5,267 batches covered. 5.2 Experiental Setup Methods evaluated. We copare the perforance of our proposed ethod with several baselines: Majority Voting (MV). For each data ite in the test data set, siply deterine its inferred label by the way it is annotated by the ajority of workers. This aggregation strategy is often used in practice [28. Majority Voting with Tuned Threshold (). Instead of siply applying ajority voting, we calculate the ratio of positive annotation on each ite as a score, and tune the threshold for deterining the binary inferred label. Based on a given training set of annotation and true labels, we find the threshold yielding the best F 1-score on training data set, and apply the sae threshold on the test data set. Plackett-Luce Model (PL). A strategy is to fit the Plackett- Luce odel on the test data by inferring the scores s(x i) associated with each data ites. We apply a Bayesian regularization on the inferred scores to confine it to be 0 s(x i) 1. We then infer a positive label to each data ite with an inferred score s(x i) > 0.5 and a negative label otherwise. * Batch Annotation Model (). The debiasing strategy proposed in Section 3 and 4. Evaluation ethodology. For baselines without training, we directly apply the on the test data set; for our proposed ethod as well as, we first train the worker odel on the training data set, then apply the debiasing strategy based on the trained worker odel on the test data set. We copare the inferred labels to the ground-truth and evaluate the perforance in ters of accuracy, precision, recall and F 1-score. Trials and setup. For our proposed odel, in training phase, we initialize λ to be 0.5 and p τ to be a rando value between 0 and 1; in debiasing phase, we initialize the all the inferred scores as 0.1. For training the worker odel, we set a fixed nuber of iteration as 100. Our experiental results presented later show the odel converges within a nuber of iterations uch fewer than 100. For debiasing, we calculate the log-likelihood of the odel and stop when the relative change of log-likelihood is within Experiental Results Now we present the experiental results. We first verify the learning algorith of our odel on the synthetic data set, then present the learned odel paraeters on a real data set; we also

7 p τ Generative paraeters Learned paraeters τ (a) Paraeter coparison of p τ s Log-likelihood (10^3) Iteration nuber (b) Convergence analysis p τ Learned paraeters τ (a) Paraeter analysis of p τ s Log-likelihood Iteration nuber (b) Convergence analysis Figure 2: Learning worker odel fro the synthetic training data set. Figure 4: Learning worker odel fro the coents training data set. ˆλ λ λ (a) Estiation error of λ ˆp τ p τ ρ (b) Estiation error of p τ Figure 3: Analysis of estiation error of paraeters in the worker odel under different configurations. evaluate the effectiveness of our debiasing strategy on both synthetic data set and real data set, which deonstrates an iproveent in ters of F 1-score; finally we conduct a study on different configurations of experients as a guideline for setting up a batched crowdsourcing task. Worker odel learning. We first verify the effectiveness of learning our proposed worker odel. On our synthetic data set, the true value of probability of aking independent judgents λ is set to 0.5. We learn the odel fro the synthetic training data and obtain the inferred ˆλ as , which reasonably recovers the original value. We also copare the original odel paraeters p τ s to the inferred paraeters in Figure 2(a). The black dashed line represents the original paraeters used for generating synthetic annotation data, while the red solid line shows the inferred paraeters of worker odel, which sees as a precise fit of the original paraeter. We also show the curve of log-likelihood of the training data set, which sees to converge within 20 iterations. To further confir the robustness of our learning ethod, we odify the configuration of synthetic data generation, and train the worker odel on different data sets to check if they can recover the original paraeters. We still take the sae configuration of n L = 1, 000 and L = 10, 000. The estiation error analysis is shown in Figure 3. Figure 3(a) shows the difference between the inferred paraeter ˆλ and the true paraeter λ, given the annotation data generated by λ varying fro 0.1 to 0.9. It can be observed that the error is reasonable sall, basically within 0.1. Figure 3(b) shows the l 2 nor of the difference between the estiated distribution ˆp τ and the true distribution p τ, when p τ is generated with respect to different ρ varying fro 1 to 3. In ost of the settings, the error is below , which is fairly low. Although we only instantiate p τ s using a power-law distribution, as the learning ethod does not confine the learned distribution to be paraetric, it can be directly applied to any other type of distributions. Learned odel on real data set. Given the effectiveness of our learning ethod verified, we apply the worker odel trying to fit the data set of annotating inappropriate coents. The learned probability of a worker aking independent judgents ˆλ is The learned distribution for deterining the nuber of positive annotations in a batch is presented in Figure 4(a). It shows that a worker tends to annotate the entire batch as negative (i.e. acceptable coent) with a probability over 0.6, while picking only 1 of the as positive (i.e. inappropriate coent) also occurs with a relative high probability around The workers see to be reluctant to annotate ore than 1 coents in a size-5 batch. This is coherent with ost people s intuition that inappropriate coents are rare coparing to the entire set of coents. The convergence analysis is shown in Figure 4(b). The odel converges within 50 iterations. Perforance coparison. We proceed to evaluate the perforance of different aggregation strategies on both data sets. The overall perforance results are shown in Table 3. In both data sets, our proposed debiasing strategy is a clear winner in ters of F 1- score, and also achieves the best accuracies. In synthetic data set, ajority voting, without tuning the threshold (default set to 0.5), fails to identify ost of the positive data ites, and therefore achieves an extreely low recall. Only after the threshold is tuned on a training data set can it achieve a reasonable F 1-score of 71%. PL-odel, in contrast, achieves a relatively low precision of 52%. Our proposed ethod is able to achieve the best overall perforance in ters of F 1-score and accuracy, and the precision and recall achieved by our ethod are also relatively balanced. Notice that we do not directly apply any threshold tuning for our ethod and siply takes the threshold as 0.5. In coents data set, the naïve ajority voting strategy again obtains a poor recall below 80%. After tuning the threshold, its recall rises to around 85%, but still lower than our proposed ethod. The scores learned by PL-odel yield a coparable recall to ajority voting with tuned threshold, but fail to achieve a high precision. Our proposed ethod achieves a coparable precision of 93% and a higher recall of 87%, and therefore beat all the other baselines in ters of F 1-score (90%). Batch nuber vs. ite nuber n. An interesting question to study is, for a certain nuber of ites, how any (rando) batches of data ites does one need to label to obtain an aggregated result accurate enough. We study this question by generating synthetic data sets with different settings of nuber of batches L and U while nuber of data ites n L and n U are fixed. In this experients, we set n L = 1, 000 and n U = 5, 000, and generate synthetic data sets with L/n L = U /n U = 2, 5, 10, and 20. We then apply all the strategies on these data sets. To iniize randoness, for each setting we repeat the data generation and application of debiasing strategies for 10 ties, then report the

8 Table 3: Perforance coparison of Majority Voting (MV), Plackett-Luce Model (PL) and Batch Annotation Model (). All results are shown as percents. Data set Method Acc. Prc. Rcl. F 1 MV Synthetic PL MV Coents PL Accuracy MV 0.70 PL /n (a) Accuracy with different /n ratio F 1 -score MV PL /n (b) F 1-score with different /n ratio Figure 5: Perforance of debiasing strategies on synthetic data sets generated by setting both L/n L and U /n U as 2, 5, 10, 20 respectively. average perforance. Results are shown in Figure 5. As we can observe, under all the different settings, the proposed ethod consistently outperfors other baselines, in ters of both accuracy and F 1-score. Majority voting with tuned threshold () is able to achieve coparable results to our proposed ethod when /n are large enough (e.g. /n = 20). However, when /n is relatively sall, our proposed ethod can achieve uch better results than ost of other baselines. When /n = 2, it achieves an accuracy approxiately 9% higher than, and an F 1-score around 5% ore than. An exception is the naïve ajority voting strategy that achieves the best accuracy when /n = 2. This is due to the skewed distribution of data labels, and by siply labeling all the data ites as negative can get an accuracy of approxiately 80%. In coparison, the F 1-score of MV is only around 30%. Another observation that we can ake about Figure 5(b) is that the perforance of ajority voting drops as /n increases. This result indicates when workers are biased and no debiasing techniques are applied, increasing the quantity of labels collected does not help. Size of training data set. As our ethod requires a sall set of training data, there ight be soe concerns about how large a training data set is sufficient. We test the perforance of two ethods that rely on training data sets and our proposed ethod on synthetic data sets and the coents data set. For the synthetic data set, we keep the size of test data set as n U = 5, 000 and U = 50, 000, and vary the size of training data set by setting n L as 10, 20, 50, 100, 200, 500, and 1, 000, while setting L as 10n L. For each configuration, we generate synthetic data sets 10 ties and utilize the average perforance on these 10 data sets to evaluate the debiasing perforance. For the coents data set, we randoly saple L batches fro the training data set, where Accuracy n L (a) Accuracy with different sizes of training data set F 1 -score n L (b) F 1-score with different sizes of training data set Figure 6: Perforance of debiasing strategies on synthetic data sets generated by different size of training data set n L ( L = 10n L), while the size of testing data set reains n U = 5, 000 and U = 50, 000. Accuracy L (a) Accuracy with different sizes of training data set F 1 -score L (b) F 1-score with different sizes of training data set Figure 7: Perforance of debiasing strategies on coents data set where the training data set is randoly sapled fro the original training set with different size of L, while the testing data set reains the sae. L is set to 100, 200,..., Again, for each configuration of training data size, we repeat the rando sapling for 10 ties and report the average perforance. The results of synthetic data set are shown in Figure 6. As observed, when training data set is extreely sall (e.g. n L = 10), the perforance of drops in ters of both accuracy and F 1- score (73% and 56% respectively). As the size of training data set increases, the perforance of becoes coparable to our proposed ethod. However, the perforance of our proposed ethod is surprisingly stable, even when there are only 10 ites and 100 batches as training data, which is as 1/500 large as the data set used for testing. The results iply our proposed ethod can obtain very high perforance with a sall cost of labeling ground-truth data for collecting training data. The results for the coents data set are shown in Figure 7. Again, when training data size is extreely sall (e.g. L = 100), the perforance of drops substantially (89% in accuracy and 75% in F 1-score), while its perforance gets ore and ore coparable to our ethod as the training data size increases. In contrast, our proposed ethod aintains a fairly stable perforance for different sizes of the training data set. This verifies again the ability of our proposed ethod to yield high-quality results with a sufficiently sall training data set. 6. EXTENSIONS In this section, we discuss two straightforward extensions of our proposed worker odel and debiasing strategies, with respect to two useful applications other than binary classification: ulti-class classification, and rating estiation. Rating estiation. In rating estiation,each data ite x i is no

9 longer associated with a discrete label fro a finite set of labels, but instead, a real value y i R. Suppose workers are asked to provide ratings on each data ite in batches, and our goal is to identify the true (unknown) rating of each data ite. A naïve way to aggregate their ratings on the sae ite is to take the average value of their ratings as the estiated rating. However, when ultiple data ites are grouped into batches, there can be bias siilar to the one introduced in this paper. Although we do not explicitly foralize our proble for a rating task, with soe straightforward odifications, our techniques can still be applied. Without loss of generality, we can assue 0 < y i <. If the actual rating can be negative, we can always apply a certain sigoid function to noralize the scores to be positive values. For independent judging, we can design a wellregularized distribution with expectation of y i for a worker to draw a rating, e.g. Gaussian distribution centered at y i. For relative judging, we can still assue the worker to generate a ranking fro Plackett-Luce odel with paraeters y i s, and introduce distributions for generating rating of data ites at different positions in the ranking, which are to be learned fro the training data. For exaple, workers ay tend to generate a rating fro Gaussian distribution centered at µ 1 = 5.0 for a top-ranked data ite π(1), but generate a rating fro another Gaussian distribution centered at µ 5 = 1.0 for a data ite ranked as the fifth π(5). If the training data is sparse and the ratings can take on any real values, it is probably preferable to eploy soe paraetric distributions. Once the design of odel is accoplished, it is straightforward to apply the sae technique described in this paper to derive the debiasing strategy by axiizing the likelihood of observed annotations to estiate the underlying ratings for unrated data ites. Multi-class classification. In a ulti-class classification proble, the label set Y ay contain ore than 2 possible labels. Workers are usually requested to assign data ites with different labels. This is a natural extension fro binary classification proble. If the labels in Y have an order, for exaple, judging whether a review is very helpful, helpful or not helpful, the proble reduces to a rating estiation proble, where the possible value of rating are discrete values. We can siply apply the extended strategy described above. If the labels in Y do not have an order, the proble can be reduces to several binary classification probles, which is straightforward to apply our strategy for debiasing workers annotations. 7. RELATED WORK In this section, we first introduce existing studies on annotation bias of crowds, when data ites are presented either independently, or in a sequence or batches; we then introduce rank aggregation techniques as well as their application on crowdsourced ranking or rating. Annotation bias in independent judgents. A nuber of studies have been conducted on verifying and quantifying annotation bias of crowd workers. Snow et al. [29 explore the perforance of annotations by non-expert workers for several NLP tasks. Deeester et al. [8 discuss the disagreeent between different users on assessent of web search results. There are also extensive studies on odeling worker behaviors. Raykar et al. [23, 24, 25 study how to learn a odel with noisy labeling. Specifically, they eploy a logistic regression classifier, and insert hidden variables indicating whether a worker tells the truth. Karger et al. [12 propose an iterative algorith to infer workers reliability and aggregating their answers. Whitehill et al. [35 odel the annotator ability, data ite difficulty, and infer the true label fro the crowds in a unified odel. Most of these work also proposes various generative odel to capture worker behavior. However, they assue judgents on different data ites are independent, which is not necessarily true when data ites are grouped into batches. Venanzi et al. [33 propose a counity-based label aggregation odel to identify different types of workers, and correct their labels correspondingly. Das et al. [7 address the interactions of opinions between people connected by networks. They focus on another aspect of dependencies, which is the dependencies between workers, while in our studies, we are ore concerned about dependencies between data ites and their judgents. Annotation bias in sequential and batch judgents. A few researchers also notice the correlation between judgents on different data ites, but their work are ainly developed in the setting when data ites are reviewed in a sequence. Scholer et al. [26, 27 study the annotation disagreeents in a relevance assessent data set. They discover correlations between annotations of siilar data ites. They also explore threshold priing in annotation, where the annotators tend to ake siilar judgents or apply siilar standard on consecutive data ites they review. However, their work focuses on the scenario when data ites are organized in a long sequence. It confines the dependencies to exist only between consecutive data ites. Also, they focus ore on qualitative conclusions, without a quantitative odel to characterize and easure the discovered factors. Carterette et al. [4 provide several assessor odels for the TREC data set. Mozer et al. [18 study the siilar relativity of judgents phenoenon on sequential tasks instead of sall batches. Again, their focus is ore on data ites presented as a long sequence, while we focus ore on data ites presented in batches siultaneously. Our recent work [36 also considers a siilar setting when data ites are organized in sall batches; we verify the existence of annotation bias caused by batching data ites. Our focus in that paper was to design an active learning algorith to sartly asseble batches, aiing to iprove the perforance of the classifier trained on this annotation batches. Our focus was not on iproving the quality of labels collected, and we still used ajority voting to obtain labels for data ites. In this paper, we focus on debiasing the obtained labels directly, the higher label quality can benefit tasks including training and/or evaluating classifiers, with a broader range of applications. Crowdsourced ranking and rating. In our odel, we eploy the Plackett-Luce odel to capture worker behavior, and aggregate worker annotations on batches as rankings in order to infer true labels. There is a related thread of work on rank aggregation; however, to the best of our knowledge, we are the first to odel crowds annotating behavior on batches by ranking, and propose a debiasing strategy. Studies on aggregating ultiple rankings into a consistent ranking can be dated back to the seinal work of Arrow [2. Negahban et al. [19 study how to aggregate pairwise coparisons into a ranking by utilizing the Bradley-Terry odel [3, which is a siplified version of Plackett-Luce odel utilized in this paper. Hunter et al. [11 propose the inorization-axiization (MM) algorith to infer Plackett-Luce odel fro ultiple partial orderings. Soufiani et al. [30 generalize Negahban et al. s work and proposed a class of generalized ethod-of-oents (GMM) algorith to infer paraeters of Plackett-Luce odel fro ultiple orderings, and copare the perforance against MM-algorith. They then further extend their algorith to be applied to a ore general class of ranking odels called rando utility odels (RUMs) [31. In ad-

Leveraging Relevance Cues for Improved Spoken Document Retrieval

Leveraging Relevance Cues for Improved Spoken Document Retrieval Leveraging Relevance Cues for Iproved Spoken Docuent Retrieval Pei-Ning Chen 1, Kuan-Yu Chen 2 and Berlin Chen 1 National Taiwan Noral University, Taiwan 1 Institute of Inforation Science, Acadeia Sinica,

More information

Identifying Converging Pairs of Nodes on a Budget

Identifying Converging Pairs of Nodes on a Budget Identifying Converging Pairs of Nodes on a Budget Konstantina Lazaridou Departent of Inforatics Aristotle University, Thessaloniki, Greece konlaznik@csd.auth.gr Evaggelia Pitoura Coputer Science and Engineering

More information

Geo-activity Recommendations by using Improved Feature Combination

Geo-activity Recommendations by using Improved Feature Combination Geo-activity Recoendations by using Iproved Feature Cobination Masoud Sattari Middle East Technical University Ankara, Turkey e76326@ceng.etu.edu.tr Murat Manguoglu Middle East Technical University Ankara,

More information

Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches

Efficient Estimation of Inclusion Coefficient using HyperLogLog Sketches Efficient Estiation of Inclusion Coefficient using HyperLogLog Sketches Azade Nazi, Bolin Ding, Vivek Narasayya, Surajit Chaudhuri Microsoft Research {aznazi, bolind, viveknar, surajitc}@icrosoft.co ABSTRACT

More information

The optimization design of microphone array layout for wideband noise sources

The optimization design of microphone array layout for wideband noise sources PROCEEDINGS of the 22 nd International Congress on Acoustics Acoustic Array Systes: Paper ICA2016-903 The optiization design of icrophone array layout for wideband noise sources Pengxiao Teng (a), Jun

More information

Computer Aided Drafting, Design and Manufacturing Volume 26, Number 2, June 2016, Page 13

Computer Aided Drafting, Design and Manufacturing Volume 26, Number 2, June 2016, Page 13 Coputer Aided Drafting, Design and Manufacturing Volue 26, uber 2, June 2016, Page 13 CADDM 3D reconstruction of coplex curved objects fro line drawings Sun Yanling, Dong Lijun Institute of Mechanical

More information

Solving the Damage Localization Problem in Structural Health Monitoring Using Techniques in Pattern Classification

Solving the Damage Localization Problem in Structural Health Monitoring Using Techniques in Pattern Classification Solving the Daage Localization Proble in Structural Health Monitoring Using Techniques in Pattern Classification CS 9 Final Project Due Dec. 4, 007 Hae Young Noh, Allen Cheung, Daxia Ge Introduction Structural

More information

A Novel Fast Constructive Algorithm for Neural Classifier

A Novel Fast Constructive Algorithm for Neural Classifier A Novel Fast Constructive Algorith for Neural Classifier Xudong Jiang Centre for Signal Processing, School of Electrical and Electronic Engineering Nanyang Technological University Nanyang Avenue, Singapore

More information

EE 364B Convex Optimization An ADMM Solution to the Sparse Coding Problem. Sonia Bhaskar, Will Zou Final Project Spring 2011

EE 364B Convex Optimization An ADMM Solution to the Sparse Coding Problem. Sonia Bhaskar, Will Zou Final Project Spring 2011 EE 364B Convex Optiization An ADMM Solution to the Sparse Coding Proble Sonia Bhaskar, Will Zou Final Project Spring 20 I. INTRODUCTION For our project, we apply the ethod of the alternating direction

More information

Fair Energy-Efficient Sensing Task Allocation in Participatory Sensing with Smartphones

Fair Energy-Efficient Sensing Task Allocation in Participatory Sensing with Smartphones Advance Access publication on 24 February 2017 The British Coputer Society 2017. All rights reserved. For Perissions, please eail: journals.perissions@oup.co doi:10.1093/cojnl/bxx015 Fair Energy-Efficient

More information

TensorFlow and Keras-based Convolutional Neural Network in CAT Image Recognition Ang LI 1,*, Yi-xiang LI 2 and Xue-hui LI 3

TensorFlow and Keras-based Convolutional Neural Network in CAT Image Recognition Ang LI 1,*, Yi-xiang LI 2 and Xue-hui LI 3 2017 2nd International Conference on Coputational Modeling, Siulation and Applied Matheatics (CMSAM 2017) ISBN: 978-1-60595-499-8 TensorFlow and Keras-based Convolutional Neural Network in CAT Iage Recognition

More information

Clustering. Cluster Analysis of Microarray Data. Microarray Data for Clustering. Data for Clustering

Clustering. Cluster Analysis of Microarray Data. Microarray Data for Clustering. Data for Clustering Clustering Cluster Analysis of Microarray Data 4/3/009 Copyright 009 Dan Nettleton Group obects that are siilar to one another together in a cluster. Separate obects that are dissiilar fro each other into

More information

Utility-Based Resource Allocation for Mixed Traffic in Wireless Networks

Utility-Based Resource Allocation for Mixed Traffic in Wireless Networks IEEE IFOCO 2 International Workshop on Future edia etworks and IP-based TV Utility-Based Resource Allocation for ixed Traffic in Wireless etworks Li Chen, Bin Wang, Xiaohang Chen, Xin Zhang, and Dacheng

More information

Collection Selection Based on Historical Performance for Efficient Processing

Collection Selection Based on Historical Performance for Efficient Processing Collection Selection Based on Historical Perforance for Efficient Processing Christopher T. Fallen and Gregory B. Newby Arctic Region Supercoputing Center University of Alaska Fairbanks Fairbanks, Alaska

More information

Evaluation of a multi-frame blind deconvolution algorithm using Cramér-Rao bounds

Evaluation of a multi-frame blind deconvolution algorithm using Cramér-Rao bounds Evaluation of a ulti-frae blind deconvolution algorith using Craér-Rao bounds Charles C. Beckner, Jr. Air Force Research Laboratory, 3550 Aberdeen Ave SE, Kirtland AFB, New Mexico, USA 87117-5776 Charles

More information

New Measures for the Evaluation of Interactive Information Retrieval Systems: Normalized Task Completion Time and Normalized User Effectiveness

New Measures for the Evaluation of Interactive Information Retrieval Systems: Normalized Task Completion Time and Normalized User Effectiveness New Measures for the Evaluation of Interactive Inforation Retrieval Systes: Noralized Task Copletion Tie and Noralized User Effectiveness Jing Cheng Graduate School of Library and Inforation Science, University

More information

OPTIMAL COMPLEX SERVICES COMPOSITION IN SOA SYSTEMS

OPTIMAL COMPLEX SERVICES COMPOSITION IN SOA SYSTEMS Key words SOA, optial, coplex service, coposition, Quality of Service Piotr RYGIELSKI*, Paweł ŚWIĄTEK* OPTIMAL COMPLEX SERVICES COMPOSITION IN SOA SYSTEMS One of the ost iportant tasks in service oriented

More information

Energy-Efficient Disk Replacement and File Placement Techniques for Mobile Systems with Hard Disks

Energy-Efficient Disk Replacement and File Placement Techniques for Mobile Systems with Hard Disks Energy-Efficient Disk Replaceent and File Placeent Techniques for Mobile Systes with Hard Disks Young-Jin Ki School of Coputer Science & Engineering Seoul National University Seoul 151-742, KOREA youngjk@davinci.snu.ac.kr

More information

A Measurement-Based Model for Parallel Real-Time Tasks

A Measurement-Based Model for Parallel Real-Time Tasks A Measureent-Based Model for Parallel Real-Tie Tasks Kunal Agrawal 1 Washington University in St. Louis St. Louis, MO, USA kunal@wustl.edu https://orcid.org/0000-0001-5882-6647 Sanjoy Baruah 2 Washington

More information

Boosted Detection of Objects and Attributes

Boosted Detection of Objects and Attributes L M M Boosted Detection of Objects and Attributes Abstract We present a new fraework for detection of object and attributes in iages based on boosted cobination of priitive classifiers. The fraework directly

More information

IN many interactive multimedia streaming applications, such

IN many interactive multimedia streaming applications, such 840 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 5, MAY 06 A Geoetric Approach to Server Selection for Interactive Video Streaing Yaochen Hu, Student Meber, IEEE, DiNiu, Meber, IEEE, and Zongpeng Li, Senior

More information

Brian Noguchi CS 229 (Fall 05) Project Final Writeup A Hierarchical Application of ICA-based Feature Extraction to Image Classification Brian Noguchi

Brian Noguchi CS 229 (Fall 05) Project Final Writeup A Hierarchical Application of ICA-based Feature Extraction to Image Classification Brian Noguchi A Hierarchical Application of ICA-based Feature Etraction to Iage Classification Introduction Iage classification poses one of the greatest challenges in the achine vision and achine learning counities.

More information

NON-RIGID OBJECT TRACKING: A PREDICTIVE VECTORIAL MODEL APPROACH

NON-RIGID OBJECT TRACKING: A PREDICTIVE VECTORIAL MODEL APPROACH NON-RIGID OBJECT TRACKING: A PREDICTIVE VECTORIAL MODEL APPROACH V. Atienza; J.M. Valiente and G. Andreu Departaento de Ingeniería de Sisteas, Coputadores y Autoática Universidad Politécnica de Valencia.

More information

A Spectral Framework for Detecting Inconsistency across Multi-Source Object Relationships

A Spectral Framework for Detecting Inconsistency across Multi-Source Object Relationships A Spectral Fraework for Detecting Inconsistency across Multi-Source Object Relationships Jing Gao, Wei Fan, Deepak Turaga, Srinivasan Parthasarathy, and Jiawei Han University of Illinois at Urbana-Chapaign,

More information

1 Extended Boolean Model

1 Extended Boolean Model 1 EXTENDED BOOLEAN MODEL It has been well-known that the Boolean odel is too inflexible, requiring skilful use of Boolean operators to obtain good results. On the other hand, the vector space odel is flexible

More information

Oblivious Routing for Fat-Tree Based System Area Networks with Uncertain Traffic Demands

Oblivious Routing for Fat-Tree Based System Area Networks with Uncertain Traffic Demands Oblivious Routing for Fat-Tree Based Syste Area Networks with Uncertain Traffic Deands Xin Yuan Wickus Nienaber Zhenhai Duan Departent of Coputer Science Florida State University Tallahassee, FL 3306 {xyuan,nienaber,duan}@cs.fsu.edu

More information

Detection of Outliers and Reduction of their Undesirable Effects for Improving the Accuracy of K-means Clustering Algorithm

Detection of Outliers and Reduction of their Undesirable Effects for Improving the Accuracy of K-means Clustering Algorithm Detection of Outliers and Reduction of their Undesirable Effects for Iproving the Accuracy of K-eans Clustering Algorith Bahan Askari Departent of Coputer Science and Research Branch, Islaic Azad University,

More information

Investigation of The Time-Offset-Based QoS Support with Optical Burst Switching in WDM Networks

Investigation of The Time-Offset-Based QoS Support with Optical Burst Switching in WDM Networks Investigation of The Tie-Offset-Based QoS Support with Optical Burst Switching in WDM Networks Pingyi Fan, Chongxi Feng,Yichao Wang, Ning Ge State Key Laboratory on Microwave and Digital Counications,

More information

THE rapid growth and continuous change of the real

THE rapid growth and continuous change of the real IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015 47 Designing High Perforance Web-Based Coputing Services to Proote Teleedicine Database Manageent Syste Isail Hababeh, Issa

More information

Designing High Performance Web-Based Computing Services to Promote Telemedicine Database Management System

Designing High Performance Web-Based Computing Services to Promote Telemedicine Database Management System Designing High Perforance Web-Based Coputing Services to Proote Teleedicine Database Manageent Syste Isail Hababeh 1, Issa Khalil 2, and Abdallah Khreishah 3 1: Coputer Engineering & Inforation Technology,

More information

Collaborative Web Caching Based on Proxy Affinities

Collaborative Web Caching Based on Proxy Affinities Collaborative Web Caching Based on Proxy Affinities Jiong Yang T J Watson Research Center IBM jiyang@usibco Wei Wang T J Watson Research Center IBM ww1@usibco Richard Muntz Coputer Science Departent UCLA

More information

ELEVATION SURFACE INTERPOLATION OF POINT DATA USING DIFFERENT TECHNIQUES A GIS APPROACH

ELEVATION SURFACE INTERPOLATION OF POINT DATA USING DIFFERENT TECHNIQUES A GIS APPROACH ELEVATION SURFACE INTERPOLATION OF POINT DATA USING DIFFERENT TECHNIQUES A GIS APPROACH Kulapraote Prathuchai Geoinforatics Center, Asian Institute of Technology, 58 Moo9, Klong Luang, Pathuthani, Thailand.

More information

Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues

Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues Mapping Data in Peer-to-Peer Systes: Seantics and Algorithic Issues Anastasios Keentsietsidis Marcelo Arenas Renée J. Miller Departent of Coputer Science University of Toronto {tasos,arenas,iller}@cs.toronto.edu

More information

The Boundary Between Privacy and Utility in Data Publishing

The Boundary Between Privacy and Utility in Data Publishing The Boundary Between Privacy and Utility in Data Publishing Vibhor Rastogi Dan Suciu Sungho Hong ABSTRACT We consider the privacy proble in data publishing: given a database instance containing sensitive

More information

Homework 1. An Introduction to Neural Networks

Homework 1. An Introduction to Neural Networks Hoework An Introduction to Neural Networks -785: Introduction to Deep Learning Spring 09 OUT: January 4, 09 DUE: February 6, 09, :59 PM Start Here Collaboration policy: You are expected to coply with the

More information

Scheduling Parallel Task Graphs on (Almost) Homogeneous Multi-cluster Platforms

Scheduling Parallel Task Graphs on (Almost) Homogeneous Multi-cluster Platforms 1 Scheduling Parallel Task Graphs on (Alost) Hoogeneous Multi-cluster Platfors Pierre-François Dutot, Tchiou N Takpé, Frédéric Suter and Henri Casanova Abstract Applications structured as parallel task

More information

HKBU Institutional Repository

HKBU Institutional Repository Hong Kong Baptist University HKBU Institutional Repository HKBU Staff Publication 18 Towards why-not spatial keyword top-k ueries: A direction-aware approach Lei Chen Hong Kong Baptist University, psuanle@gail.co

More information

I ve Seen Enough : Incrementally Improving Visualizations to Support Rapid Decision Making

I ve Seen Enough : Incrementally Improving Visualizations to Support Rapid Decision Making I ve Seen Enough : Increentally Iproving Visualizations to Support Rapid Decision Making Sajjadur Rahan Marya Aliakbarpour 2 Ha Kyung Kong Eric Blais 3 Karrie Karahalios,4 Aditya Paraeswaran Ronitt Rubinfield

More information

A Novel 2D Texture Classifier For Gray Level Images

A Novel 2D Texture Classifier For Gray Level Images 2012, TextRoad Publication ISSN 2090-4304 Journal of Basic and Applied Scientific Research www.textroad.co A Novel 2D Texture Classifier For Gray Level Iages B.S. Mousavi 1 Young Researchers Club, Zahedan

More information

A Trajectory Splitting Model for Efficient Spatio-Temporal Indexing

A Trajectory Splitting Model for Efficient Spatio-Temporal Indexing A Trajectory Splitting Model for Efficient Spatio-Teporal Indexing Slobodan Rasetic Jörg Sander Jaes Elding Mario A. Nasciento Departent of Coputing Science University of Alberta Edonton, Alberta, Canada

More information

The Internal Conflict of a Belief Function

The Internal Conflict of a Belief Function The Internal Conflict of a Belief Function Johan Schubert Abstract In this paper we define and derive an internal conflict of a belief function We decopose the belief function in question into a set of

More information

News Events Clustering Method Based on Staging Incremental Single-Pass Technique

News Events Clustering Method Based on Staging Incremental Single-Pass Technique News Events Clustering Method Based on Staging Increental Single-Pass Technique LI Yongyi 1,a *, Gao Yin 2 1 School of Electronics and Inforation Engineering QinZhou University 535099 Guangxi, China 2

More information

Modeling Parallel Applications Performance on Heterogeneous Systems

Modeling Parallel Applications Performance on Heterogeneous Systems Modeling Parallel Applications Perforance on Heterogeneous Systes Jaeela Al-Jaroodi, Nader Mohaed, Hong Jiang and David Swanson Departent of Coputer Science and Engineering University of Nebraska Lincoln

More information

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression Sha M. Kakade Microsoft Research and Wharton, U Penn skakade@icrosoft.co Varun Kanade SEAS, Harvard University

More information

A Periodic Dynamic Load Balancing Method

A Periodic Dynamic Load Balancing Method 2016 3 rd International Conference on Engineering Technology and Application (ICETA 2016) ISBN: 978-1-60595-383-0 A Periodic Dynaic Load Balancing Method Taotao Fan* State Key Laboratory of Inforation

More information

Different criteria of dynamic routing

Different criteria of dynamic routing Procedia Coputer Science Volue 66, 2015, Pages 166 173 YSC 2015. 4th International Young Scientists Conference on Coputational Science Different criteria of dynaic routing Kurochkin 1*, Grinberg 1 1 Kharkevich

More information

Theoretical Analysis of Local Search and Simple Evolutionary Algorithms for the Generalized Travelling Salesperson Problem

Theoretical Analysis of Local Search and Simple Evolutionary Algorithms for the Generalized Travelling Salesperson Problem Theoretical Analysis of Local Search and Siple Evolutionary Algoriths for the Generalized Travelling Salesperson Proble Mojgan Pourhassan ojgan.pourhassan@adelaide.edu.au Optiisation and Logistics, The

More information

A Hybrid Network Architecture for File Transfers

A Hybrid Network Architecture for File Transfers JOURNAL OF IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 6, NO., JANUARY 9 A Hybrid Network Architecture for File Transfers Xiuduan Fang, Meber, IEEE, Malathi Veeraraghavan, Senior Meber,

More information

AN INTEGRATED APPROACH TO MUSIC BOUNDARY DETECTION

AN INTEGRATED APPROACH TO MUSIC BOUNDARY DETECTION 10th International Society for Music Inforation Retrieval Conference (ISMIR 2009) AN INTEGRATED APPROACH TO MUSIC BOUNDARY DETECTION Min-Yian Su, Yi-Hsuan Yang, Yu-Ching Lin, Hoer Chen National Taiwan

More information

A Low-Cost Multi-Failure Resilient Replication Scheme for High Data Availability in Cloud Storage

A Low-Cost Multi-Failure Resilient Replication Scheme for High Data Availability in Cloud Storage 216 IEEE 23rd International Conference on High Perforance Coputing A Low-Cost Multi-Failure Resilient Replication Schee for High Data Availability in Cloud Storage Jinwei Liu* and Haiying Shen *Departent

More information

Improve Peer Cooperation using Social Networks

Improve Peer Cooperation using Social Networks Iprove Peer Cooperation using Social Networks Victor Ponce, Jie Wu, and Xiuqi Li Departent of Coputer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 Noveber 5, 2007 Corresponding

More information

Reconstruction of Time Series using Optimal Ordering of ICA Components

Reconstruction of Time Series using Optimal Ordering of ICA Components Reconstruction of Tie Series using Optial Ordering of ICA Coponents Ar Goneid and Abear Kael Departent of Coputer Science & Engineering, The Aerican University in Cairo, Cairo, Egypt e-ail: goneid@aucegypt.edu

More information

Supplementary Section. A. Algorithm. B. Datasets. C. More on Labeling Functions

Supplementary Section. A. Algorithm. B. Datasets. C. More on Labeling Functions Suppleentary Section In this section, we include ore details about our algorith, labeling functions, datasets as well as additional results and coparisons, which could not be included in the ain paper

More information

Image Filter Using with Gaussian Curvature and Total Variation Model

Image Filter Using with Gaussian Curvature and Total Variation Model IJECT Vo l. 7, Is s u e 3, Ju l y - Se p t 016 ISSN : 30-7109 (Online) ISSN : 30-9543 (Print) Iage Using with Gaussian Curvature and Total Variation Model 1 Deepak Kuar Gour, Sanjay Kuar Shara 1, Dept.

More information

Ranking Spatial Data by Quality Preferences

Ranking Spatial Data by Quality Preferences 1 Ranking Spatial Data by Quality Preferences Man Lung Yiu, Hua Lu, Nikos Maoulis, and Michail Vaitis Abstract A spatial preference query ranks objects based on the qualities of features in their spatial

More information

Adaptive Holistic Scheduling for In-Network Sensor Query Processing

Adaptive Holistic Scheduling for In-Network Sensor Query Processing Adaptive Holistic Scheduling for In-Network Sensor Query Processing Hejun Wu and Qiong Luo Departent of Coputer Science and Engineering Hong Kong University of Science & Technology Clear Water Bay, Kowloon,

More information

Multi Packet Reception and Network Coding

Multi Packet Reception and Network Coding The 2010 Military Counications Conference - Unclassified Progra - etworking Protocols and Perforance Track Multi Packet Reception and etwork Coding Aran Rezaee Research Laboratory of Electronics Massachusetts

More information

Design Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems

Design Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems Design Optiization of Mixed Tie/Event-Triggered Distributed Ebedded Systes Traian Pop, Petru Eles, Zebo Peng Dept. of Coputer and Inforation Science, Linköping University {trapo, petel, zebpe}@ida.liu.se

More information

Scalable search-based image annotation

Scalable search-based image annotation Multiedia Systes DOI 17/s53-8-128-y REGULAR PAPER Scalable search-based iage annotation Changhu Wang Feng Jing Lei Zhang Hong-Jiang Zhang Received: 18 October 27 / Accepted: 15 May 28 Springer-Verlag 28

More information

Shape Optimization of Quad Mesh Elements

Shape Optimization of Quad Mesh Elements Shape Optiization of Quad Mesh Eleents Yufei Li 1, Wenping Wang 1, Ruotian Ling 1, and Changhe Tu 2 1 The University of Hong Kong 2 Shandong University Abstract We study the proble of optiizing the face

More information

POSITION-PATCH BASED FACE HALLUCINATION VIA LOCALITY-CONSTRAINED REPRESENTATION. Junjun Jiang, Ruimin Hu, Zhen Han, Tao Lu, and Kebin Huang

POSITION-PATCH BASED FACE HALLUCINATION VIA LOCALITY-CONSTRAINED REPRESENTATION. Junjun Jiang, Ruimin Hu, Zhen Han, Tao Lu, and Kebin Huang IEEE International Conference on ultiedia and Expo POSITION-PATCH BASED FACE HALLUCINATION VIA LOCALITY-CONSTRAINED REPRESENTATION Junjun Jiang, Ruiin Hu, Zhen Han, Tao Lu, and Kebin Huang National Engineering

More information

Data & Knowledge Engineering

Data & Knowledge Engineering Data & Knowledge Engineering 7 (211) 17 187 Contents lists available at ScienceDirect Data & Knowledge Engineering journal hoepage: www.elsevier.co/locate/datak An approxiate duplicate eliination in RFID

More information

Planet Scale Software Updates

Planet Scale Software Updates Planet Scale Software Updates Christos Gkantsidis 1, Thoas Karagiannis 2, Pablo Rodriguez 1, and Milan Vojnović 1 1 Microsoft Research 2 UC Riverside Cabridge, UK Riverside, CA, USA {chrisgk,pablo,ilanv}@icrosoft.co

More information

A Fast Multi-Objective Genetic Algorithm for Hardware-Software Partitioning In Embedded System Design

A Fast Multi-Objective Genetic Algorithm for Hardware-Software Partitioning In Embedded System Design A Fast Multi-Obective Genetic Algorith for Hardware-Software Partitioning In Ebedded Syste Design 1 M.Jagadeeswari, 2 M.C.Bhuvaneswari 1 Research Scholar, P.S.G College of Technology, Coibatore, India

More information

Approaching the Skyline in Z Order

Approaching the Skyline in Z Order Approaching the Skyline in Z Order KenC.K.Lee Baihua Zheng Huajing Li Wang-Chien Lee cklee@cse.psu.edu bhzheng@su.edu.sg huali@cse.psu.edu wlee@cse.psu.edu Pennsylvania State University, University Park,

More information

Real-Time Detection of Invisible Spreaders

Real-Time Detection of Invisible Spreaders Real-Tie Detection of Invisible Spreaders MyungKeun Yoon Shigang Chen Departent of Coputer & Inforation Science & Engineering University of Florida, Gainesville, FL 3, USA {yoon, sgchen}@cise.ufl.edu Abstract

More information

Relocation of Gateway for Enhanced Timeliness in Wireless Sensor Networks Abstract 1. Introduction

Relocation of Gateway for Enhanced Timeliness in Wireless Sensor Networks Abstract 1. Introduction Relocation of ateway for Enhanced Tieliness in Wireless Sensor Networks Keal Akkaya and Mohaed Younis Departent of oputer Science and Electrical Engineering University of Maryland, altiore ounty altiore,

More information

Massive amounts of high-dimensional data are pervasive in multiple domains,

Massive amounts of high-dimensional data are pervasive in multiple domains, Challenges of Feature Selection for Big Data Analytics Jundong Li and Huan Liu, Arizona State University Massive aounts of high-diensional data are pervasive in ultiple doains, ranging fro social edia,

More information

MiPPS: A Generative Model for Multi-Manifold Clustering

MiPPS: A Generative Model for Multi-Manifold Clustering Manifold Learning and its Applications: Papers fro the AAAI Fall Syposiu (FS-9-) MiPPS: A Generative Model for Multi-Manifold Clustering Oluwasani Koyejo and Joydeep Ghosh Electrical and Coputer Engineering

More information

Smarter Balanced Assessment Consortium Claims, Targets, and Standard Alignment for Math

Smarter Balanced Assessment Consortium Claims, Targets, and Standard Alignment for Math Sarter Balanced Assessent Consortiu s, s, Stard Alignent for Math The Sarter Balanced Assessent Consortiu (SBAC) has created a hierarchy coprised of clais targets that together can be used to ake stateents

More information

INTRINSIC DECOMPOSITION FOR STEREOSCOPIC IMAGES

INTRINSIC DECOMPOSITION FOR STEREOSCOPIC IMAGES INTRINSIC DECOMPOSITION FOR STEREOSCOPIC IMAGES Dehua Xie 1, Shuaicheng Liu 1, Kaio Lin 2, Shuyuan Zhu 1, and Bing Zeng 1 1 School of Electronic Engineering, University of Electronic Science and Technology

More information

An Efficient Approach for Content Delivery in Overlay Networks

An Efficient Approach for Content Delivery in Overlay Networks An Efficient Approach for Content Delivery in Overlay Networks Mohaad Malli, Chadi Barakat, Walid Dabbous Projet Planète, INRIA-Sophia Antipolis, France E-ail:{alli, cbarakat, dabbous}@sophia.inria.fr

More information

Detecting Anomalous Structures by Convolutional Sparse Models

Detecting Anomalous Structures by Convolutional Sparse Models Detecting Anoalous Structures by Convolutional Sparse Models Diego Carrera, Giacoo Boracchi Dipartiento di Elettronica, Inforazione e Bioingegeria Politecnico di Milano, Italy {giacoo.boracchi, diego.carrera}@polii.it

More information

Using Imperialist Competitive Algorithm in Optimization of Nonlinear Multiple Responses

Using Imperialist Competitive Algorithm in Optimization of Nonlinear Multiple Responses International Journal of Industrial Engineering & Production Research Septeber 03, Volue 4, Nuber 3 pp. 9-35 ISSN: 008-4889 http://ijiepr.iust.ac.ir/ Using Iperialist Copetitive Algorith in Optiization

More information

HUMAN pose estimation is an important problem in computer

HUMAN pose estimation is an important problem in computer JOURNAL OF L A T E X CLASS FILES, VOL. 6, NO., JANUARY 7 Robust 3D Huan Pose Estiation fro Single Iages or Video Sequences Chunyu Wang, Student Meber, IEEE, Yizhou Wang, Meber, IEEE, Zhouchen Lin, Meber,

More information

A Comparative Study of Two-phase Heuristic Approaches to General Job Shop Scheduling Problem

A Comparative Study of Two-phase Heuristic Approaches to General Job Shop Scheduling Problem IE Vol. 7, No. 2, pp. 84-92, Septeber 2008. A Coparative Study of Two-phase Heuristic Approaches to General Job Shop Scheduling Proble Ji Ung Sun School of Industrial and Managent Engineering Hankuk University

More information

Relief shape inheritance and graphical editor for the landscape design

Relief shape inheritance and graphical editor for the landscape design Relief shape inheritance and graphical editor for the landscape design Egor A. Yusov Vadi E. Turlapov Nizhny Novgorod State University after N. I. Lobachevsky Nizhny Novgorod Russia yusov_egor@ail.ru vadi.turlapov@cs.vk.unn.ru

More information

An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method

An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method An Autoatic Method for Suary Evaluation Using Multiple Evaluation Results by a Manual Method Hidetsugu Nanba Faculty of Inforation Sciences, Hiroshia City University 3-4-1 Ozuka, Hiroshia, 731-3194 Japan

More information

Carving Differential Unit Test Cases from System Test Cases

Carving Differential Unit Test Cases from System Test Cases Carving Differential Unit Test Cases fro Syste Test Cases Sebastian Elbau, Hui Nee Chin, Matthew B. Dwyer, Jonathan Dokulil Departent of Coputer Science and Engineering University of Nebraska - Lincoln

More information

COLOR HISTOGRAM AND DISCRETE COSINE TRANSFORM FOR COLOR IMAGE RETRIEVAL

COLOR HISTOGRAM AND DISCRETE COSINE TRANSFORM FOR COLOR IMAGE RETRIEVAL COLOR HISTOGRAM AND DISCRETE COSINE TRANSFORM FOR COLOR IMAGE RETRIEVAL 1 Te-Wei Chiang ( 蔣德威 ), 2 Tienwei Tsai ( 蔡殿偉 ), 3 Jeng-Ping Lin ( 林正平 ) 1 Dept. of Accounting Inforation Systes, Chilee Institute

More information

EFFICIENT VIDEO SEARCH USING IMAGE QUERIES A. Araujo1, M. Makar2, V. Chandrasekhar3, D. Chen1, S. Tsai1, H. Chen1, R. Angst1 and B.

EFFICIENT VIDEO SEARCH USING IMAGE QUERIES A. Araujo1, M. Makar2, V. Chandrasekhar3, D. Chen1, S. Tsai1, H. Chen1, R. Angst1 and B. EFFICIENT VIDEO SEARCH USING IMAGE QUERIES A. Araujo1, M. Makar2, V. Chandrasekhar3, D. Chen1, S. Tsai1, H. Chen1, R. Angst1 and B. Girod1 1 Stanford University, USA 2 Qualco Inc., USA ABSTRACT We study

More information

Short Papers. Location- and Density-Based Hierarchical Clustering Using Similarity Analysis 1 INTRODUCTION

Short Papers. Location- and Density-Based Hierarchical Clustering Using Similarity Analysis 1 INTRODUCTION IEEE TRASACTIOS O PATTER AALYSIS AD MACHIE ITELLIGECE, VOL. 20, O. 9, SEPTEMBER 1998 1011 Short Papers Location- and Density-Based Hierarchical Clustering Using Siilarity Analysis Peter Bacsy and arendra

More information

Data-driven Hybrid Caching in Hierarchical Edge Cache Networks

Data-driven Hybrid Caching in Hierarchical Edge Cache Networks Data-driven Hybrid Caching in Hierarchical Edge Cache Networks Abstract Hierarchical cache networks are increasingly deployed to facilitate high-throughput and low-latency content delivery to end users.

More information

Querying Uncertain Data with Aggregate Constraints

Querying Uncertain Data with Aggregate Constraints Querying Uncertain Data with Aggregate Constraints Mohan Yang 1,2, Haixun Wang 1, Haiquan Chen 3, Wei-Shinn Ku 3 1 Microsoft Research Asia, Beijing, China 2 Shanghai Jiao Tong University, Shanghai, China

More information

Minimax Sensor Location to Monitor a Piecewise Linear Curve

Minimax Sensor Location to Monitor a Piecewise Linear Curve NSF GRANT #040040 NSF PROGRAM NAME: Operations Research Miniax Sensor Location to Monitor a Piecewise Linear Curve To M. Cavalier The Pennsylvania State University University Par, PA 680 Whitney A. Conner

More information

Derivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router

Derivation of an Analytical Model for Evaluating the Performance of a Multi- Queue Nodes Network Router Derivation of an Analytical Model for Evaluating the Perforance of a Multi- Queue Nodes Network Router 1 Hussein Al-Bahadili, 1 Jafar Ababneh, and 2 Fadi Thabtah 1 Coputer Inforation Systes Faculty of

More information

Data Caching for Enhancing Anonymity

Data Caching for Enhancing Anonymity Data Caching for Enhancing Anonyity Rajiv Bagai and Bin Tang Departent of Electrical Engineering and Coputer Science Wichita State University Wichita, Kansas 67260 0083, USA Eail: {rajiv.bagai, bin.tang}@wichita.edu

More information

A CRYPTANALYTIC ATTACK ON RC4 STREAM CIPHER

A CRYPTANALYTIC ATTACK ON RC4 STREAM CIPHER A CRYPTANALYTIC ATTACK ON RC4 STREAM CIPHER VIOLETA TOMAŠEVIĆ, SLOBODAN BOJANIĆ 2 and OCTAVIO NIETO-TALADRIZ 2 The Mihajlo Pupin Institute, Volgina 5, 000 Belgrade, SERBIA AND MONTENEGRO 2 Technical University

More information

Defining and Surveying Wireless Link Virtualization and Wireless Network Virtualization

Defining and Surveying Wireless Link Virtualization and Wireless Network Virtualization 1 Defining and Surveying Wireless Link Virtualization and Wireless Network Virtualization Jonathan van de Belt, Haed Ahadi, and Linda E. Doyle The Centre for Future Networks and Counications - CONNECT,

More information

Scheduling Parallel Real-Time Recurrent Tasks on Multicore Platforms

Scheduling Parallel Real-Time Recurrent Tasks on Multicore Platforms IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL., NO., NOV 27 Scheduling Parallel Real-Tie Recurrent Tasks on Multicore Platfors Risat Pathan, Petros Voudouris, and Per Stenströ Abstract We

More information

COMPUTER GENERATED HOLOGRAMS Optical Sciences 627 W.J. Dallas (Monday, August 23, 2004, 12:38 PM) PART III: CHAPTER ONE DIFFUSERS FOR CGH S

COMPUTER GENERATED HOLOGRAMS Optical Sciences 627 W.J. Dallas (Monday, August 23, 2004, 12:38 PM) PART III: CHAPTER ONE DIFFUSERS FOR CGH S COPUTER GEERATED HOLOGRAS Optical Sciences 67 W.J. Dallas (onday, August 3, 004, 1:38 P) PART III: CHAPTER OE DIFFUSERS FOR CGH S Part III: Chapter One Page 1 of 8 Introduction Hologras for display purposes

More information

A Model Free Automatic Tuning Method for a Restricted Structured Controller. by using Simultaneous Perturbation Stochastic Approximation

A Model Free Automatic Tuning Method for a Restricted Structured Controller. by using Simultaneous Perturbation Stochastic Approximation 28 Aerican Control Conference Westin Seattle Hotel, Seattle, Washington, USA June -3, 28 WeC9.5 A Model Free Autoatic Tuning Method for a Restricted Structured Controller by Using Siultaneous Perturbation

More information

An Integrated Processing Method for Multiple Large-scale Point-Clouds Captured from Different Viewpoints

An Integrated Processing Method for Multiple Large-scale Point-Clouds Captured from Different Viewpoints 519 An Integrated Processing Method for Multiple Large-scale Point-Clouds Captured fro Different Viewpoints Yousuke Kawauchi 1, Shin Usuki, Kenjiro T. Miura 3, Hiroshi Masuda 4 and Ichiro Tanaka 5 1 Shizuoka

More information

Optimized stereo reconstruction of free-form space curves based on a nonuniform rational B-spline model

Optimized stereo reconstruction of free-form space curves based on a nonuniform rational B-spline model 1746 J. Opt. Soc. A. A/ Vol. 22, No. 9/ Septeber 2005 Y. J. Xiao and Y. F. Li Optiized stereo reconstruction of free-for space curves based on a nonunifor rational B-spline odel Yi Jun Xiao Departent of

More information

2013 IEEE Conference on Computer Vision and Pattern Recognition. Compressed Hashing

2013 IEEE Conference on Computer Vision and Pattern Recognition. Compressed Hashing 203 IEEE Conference on Coputer Vision and Pattern Recognition Copressed Hashing Yue Lin Rong Jin Deng Cai Shuicheng Yan Xuelong Li State Key Lab of CAD&CG, College of Coputer Science, Zhejiang University,

More information

Fair Resource Allocation for Heterogeneous Tasks

Fair Resource Allocation for Heterogeneous Tasks Fair Resource Allocation for Heterogeneous Tasks Koyel Mukherjee, Partha utta, Gurulingesh Raravi, Thangaraj Balasubraania, Koustuv asgupta, Atul Singh Xerox Research Center India, Bangalore, India 560105

More information

Galois Homomorphic Fractal Approach for the Recognition of Emotion

Galois Homomorphic Fractal Approach for the Recognition of Emotion Galois Hooorphic Fractal Approach for the Recognition of Eotion T. G. Grace Elizabeth Rani 1, G. Jayalalitha 1 Research Scholar, Bharathiar University, India, Associate Professor, Departent of Matheatics,

More information

BP-MAC: A High Reliable Backoff Preamble MAC Protocol for Wireless Sensor Networks

BP-MAC: A High Reliable Backoff Preamble MAC Protocol for Wireless Sensor Networks BP-MAC: A High Reliable Backoff Preable MAC Protocol for Wireless Sensor Networks Alexander Klein* Jirka Klaue Josef Schalk EADS Innovation Works, Gerany *Eail: klein@inforatik.uni-wuerzburg.de ABSTRACT:

More information

A Network-based Seamless Handover Scheme for Multi-homed Devices

A Network-based Seamless Handover Scheme for Multi-homed Devices A Network-based Sealess Handover Schee for Multi-hoed Devices Md. Shohrab Hossain and Mohaed Atiquzzaan School of Coputer Science, University of Oklahoa, Noran, OK 7319 Eail: {shohrab, atiq}@ou.edu Abstract

More information

Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points

Joint Measurement- and Traffic Descriptor-based Admission Control at Real-Time Traffic Aggregation Points Joint Measureent- and Traffic Descriptor-based Adission Control at Real-Tie Traffic Aggregation Points Stylianos Georgoulas, Panos Triintzios and George Pavlou Centre for Counication Systes Research, University

More information