Developer Recommendation for Crowdsourced Software Development Tasks

1 Developer Recommendation for Crowdsourced Software Development Tasks. CREST Centre, University College London. March 30, 2015, San Francisco, USA

2 OUTLINE INTRODUCTION: Motivation, Problem. BACKGROUND: TopCoder, Recommendation. METHODOLOGY: Framework, Features. EVALUATION: RQs, Setting, Results.

3 MOTIVATION Developer's Perspective: Information overload. Figure 1: Available tasks listed on TopCoder on Dec. 12, 2014.

4 MOTIVATION Platform's Perspective: The limitation of the pull-based model.

5 PROBLEM Historical tasks come with a register history (participants) and a win history (winners). For a new task, the participants and the winner are unknown, which raises two questions: 1. Which developers are suitable for participation? 2. Which developers are reliable for delivering qualified assets? The register history is used to recommend developers for participation; the win history is used to recommend developers for delivering qualified assets. Figure 2: Developer recommendation utilising historical data.

6 TOPCODER Established in 2001. Largest community for CSD (crowdsourced software development). Over 770,000 developers.

7 TOPCODER Figure 3: TopCoder case studies

8 TOPCODER PROCESS Figure 4: Crowdsourced software development process and its task phases (derived from TopCoder.com and Mao et al.

9 RECOMMENDER SYSTEM Content-based filtering; Collaborative filtering; Hybrid

10 CROWDREX: FRAMEWORK Steps: 1. Data Filtering; 2. Feature Extraction; 3. Learner Training; 4. Learner Application. Inputs: Win History, Register History and the New Task. Feature Extraction is applied to the historical data and to the new task. The Winners Learner (WL) produces the WL Developer Distribution, used to rank developers for delivering qualified assets; the Participants Learner (PL) produces the PL Developer Distribution, used to rank developers for participation. Each ranking yields the Recommended Developers. Other figure component: Data Transform. Figure 5: The framework of CrowdRex

11 CROWDREX: DATA FILTERING Step 1 of the framework. Figure 6: The framework of CrowdRex

12 CROWDREX: FEATURE EXTRACTION Step 2 of the framework. Figure 7: The framework of CrowdRex

13 FEATURES

20 FEATURES Table 1: The content features for crowdsourced software tasks
Feature      Format   Description
Date         Numeric  Post date of the task.
PL           Text     Programming language used.
Title        Text     Title of the posted task.
Tech         Text     Techniques used.
Description  Text     Overview of the task description.
Duration     Numeric  Time allocated to the task.
Payment      Numeric  Amount in US dollars awarded to the winner.

21 FEATURES Table 2: The adopted feature distance measures
Feature      Distance Measure
Date         $(Date_i - Date_j) / Date_{MaxDiff}$
PL           $PL_i == PL_j \;?\; 1 : 0$
Title        $\frac{Tit_x \cdot Tit_y}{\|Tit_x\| \, \|Tit_y\|}$
Tech         $Match(Tech_i, Tech_j) / NumberOfTechs_{Max}$
Description  $\frac{Des_x \cdot Des_y}{\|Des_x\| \, \|Des_y\|}$
Duration     $(Duration_i - Duration_j) / Duration_{Max}$
Payment      $(Payment_i - Payment_j) / Payment_{Max}$
NLP: 1. Tokenization; 2. Stop words removal; 3. Term vector model (in TF-IDF); 4. Cosine similarity.
TF-IDF weight:
\[ w_{i,j} = \left(1 + \log(tf_{i,j})\right) \cdot \log\frac{T}{df_i} \tag{1} \]
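
The measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the tiny stop-word list and the sample titles are hypothetical, T is assumed to be the number of task documents, df the number of documents containing a term, and the numeric measure takes an absolute value (an assumption made here so that it is symmetric).

```python
import math
import re
from collections import Counter

# Illustrative stop-word list only; a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def tokenize(text):
    """Steps 1-2: lower-case, split on non-alphanumerics, drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Step 3: weight each term as (1 + log tf) * log(T / df), as in Eq. (1)."""
    tokenized = [tokenize(d) for d in docs]
    total = len(tokenized)  # T, assumed to be the number of documents
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: (1 + math.log(c)) * math.log(total / df[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(u, v):
    """Step 4: cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def scaled_difference(a, b, max_diff):
    """Date, Duration, Payment: difference scaled by the maximum (abs is an assumption)."""
    return abs(a - b) / max_diff if max_diff else 0.0

def same_language(pl_a, pl_b):
    """PL: 1 if both tasks use the same programming language, else 0."""
    return 1 if pl_a == pl_b else 0

def tech_match(techs_a, techs_b, max_techs):
    """Tech: number of shared techniques over the maximum number of techniques."""
    return len(set(techs_a) & set(techs_b)) / max_techs

# Hypothetical task titles, for illustration only.
titles = ["Java REST API assembly", "REST API for payment service", "Android UI prototype"]
vectors = tfidf_vectors(titles)
sim_related = cosine(vectors[0], vectors[1])    # shares "rest" and "api"
sim_unrelated = cosine(vectors[0], vectors[2])  # shares no terms
```

Note that a term occurring in every document gets weight log(T/T) = 0, so it contributes nothing to the similarity.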

22 CROWDREX: LEARNER TRAINING Step 3 of the framework. Figure 8: The framework of CrowdRex

23 CROWDREX: LEARNER APPLICATION Step 4 of the framework. Figure 9: The framework of CrowdRex
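
One way to picture the learner application step is a nearest-neighbour learner that turns the most similar historical tasks into a distribution over developers and ranks developers by it. The sketch below is an assumption about how such a step could look, not the CrowdRex code: `rank_developers`, `jaccard` and the toy history are hypothetical, and the similarity function stands in for the feature measures of the framework.

```python
from collections import Counter

def rank_developers(new_task, history, similarity, k=5, top_n=5):
    """Rank developers by how often they appear among the k most similar past tasks.

    history is a list of (task, developers) pairs; similarity(a, b) returns a
    number that is larger for more similar tasks.
    """
    neighbours = sorted(history, key=lambda item: similarity(new_task, item[0]), reverse=True)[:k]
    # The vote counts play the role of a developer distribution.
    votes = Counter(dev for _, devs in neighbours for dev in devs)
    return [dev for dev, _ in votes.most_common(top_n)]

def jaccard(a, b):
    """Toy similarity on sets of technique tags (illustrative stand-in)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical win history: technique tags of a past task and its winners.
history = [
    ({"java", "spring"}, ["alice"]),
    ({"java", "sql"}, ["alice", "bob"]),
    ({"android", "ui"}, ["carol"]),
]
ranked = rank_developers({"java", "spring", "sql"}, history, jaccard, k=2, top_n=2)
```

With k=2 the Android task is ignored, so only developers from the two Java tasks receive votes.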

24 RESEARCH QUESTIONS RQ1. Baseline Comparison: Active baseline (Top N based on statistical performance / activeness). RQ2. Performance Assessment: best learner; Accuracy and Diversity. RQ3. Insights.
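
A baseline of this kind can be sketched as follows, assuming that activeness is measured by how often a developer appears in the historical records (registrations or wins). The function name and the sample data are hypothetical.

```python
from collections import Counter

def active_baseline(history, top_n):
    """Recommend the top_n developers who appear most often in past tasks.

    history is a list of developer lists, one per historical task. The same
    list is recommended for every new task, whatever its content.
    """
    counts = Counter(dev for devs in history for dev in devs)
    return [dev for dev, _ in counts.most_common(top_n)]

# Hypothetical register history.
past = [["alice", "bob"], ["alice"], ["carol", "alice", "bob"]]
top_two = active_baseline(past, top_n=2)
```

Because the recommendation does not depend on the new task, such a baseline can only ever cover a small set of developers.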

27 DATASET 2 datasets from TopCoder.com, Oct to Mar. 3,094 historical tasks. Table 3: Statistics of the evaluation datasets
Dataset  # Tasks  # Reg.  # Win.  Duration
Development 1093/ / /
Assembly 1505/ / /

28 EXPERIMENTAL SETTING Split each dataset into 10 folds. Recommend 5, 10, 20 developers. Train C4.5, NaiveBayes, KNN 1, KNN 5 learners. Figure 10: The relationship between the number of registrants and submissions (series: Development, Assembly; axes: number of registrants, number of submissions).
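
The 10-fold split can be illustrated with a small helper. This is a sketch under assumptions: the slide does not say whether tasks were shuffled or kept in time order, so the seeded shuffle and the name `k_fold_splits` are choices made here for illustration.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; every item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# 25 placeholder task ids split into 10 folds.
splits = list(k_fold_splits(range(25), k=10))
```

Each learner would then be trained on `train` and asked to recommend 5, 10 or 20 developers for every task in `test`.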

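29 EVALUATION METRICS
Accuracy:
\[ Acc_i = \frac{1}{|T|} \sum_{t \in T} \left( \frac{|correct \cap R_i(t)|}{|R_i(t)|} \right) \tag{2} \]
Diversity:
\[ Div_i = \frac{\left| \bigcup_{t \in T} R_i(t) \right|}{\left| \bigcup_{t \in T} Actual(t) \right|} \tag{3} \]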
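
The two metrics can be sketched in Python under one possible reading of the slide, which is an assumption: accuracy as the mean share of each task's recommendation list that is correct, and diversity as the number of distinct recommended developers divided by the number of distinct actual developers. The function names and the sample data are hypothetical.

```python
def accuracy(recommended, actual):
    """Mean, over tasks, of |correct developers in the list| / |list|.

    recommended and actual map a task id to a list of developers.
    """
    scores = [
        len(set(recommended[t]) & set(actual[t])) / len(recommended[t])
        for t in recommended
        if recommended[t]
    ]
    return sum(scores) / len(scores) if scores else 0.0

def diversity(recommended, actual):
    """Distinct developers recommended over distinct developers actually involved."""
    rec_devs = set().union(*recommended.values())
    actual_devs = set().union(*actual.values())
    return len(rec_devs) / len(actual_devs) if actual_devs else 0.0

# Hypothetical recommendations and ground truth for two tasks.
recommended = {"t1": ["alice", "bob"], "t2": ["alice", "carol"]}
actual = {"t1": ["alice"], "t2": ["dave", "carol", "erin"]}
acc = accuracy(recommended, actual)
div = diversity(recommended, actual)
```

Under this reading, a method that recommends the same few developers for every task can score well on accuracy while scoring poorly on diversity.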

30 RESULTS
Table 4: Accuracy and diversity of developer recommendation for delivering qualified assets
DS   Rec.  C4.5       NaïveBayes  KNN 1      KNN 5      Active
           Acc  Div   Acc  Div    Acc  Div   Acc  Div   Acc  Div
DEV  5     50%  40%   44%  21%    34%  27%   32%  35%   37%  5%
DEV  10    60%  45%   54%  26%    44%  30%   49%  38%   46%  11%
DEV  20    71%  52%   67%  18%    56%  40%   61%  42%   57%  22%
ASM  5     37%  72%   33%  33%    38%  47%   43%  73%   15%  6%
ASM  10    51%  74%   44%  18%    49%  49%   58%  76%   26%  12%
ASM  20    63%  77%   54%  48%    57%  53%   67%  78%   37%  23%
Table 5: Accuracy and diversity of developer recommendation for participation
DS   Rec.  C4.5       NaïveBayes  KNN 1      KNN 5      Active
           Acc  Div   Acc  Div    Acc  Div   Acc  Div   Acc  Div
DEV  5     30%  3%    4%   15%    3%   8%    3%   7%    24%  1%
DEV  10    23%  6%    2%   17%    5%   14%   6%   13%   18%  1%
DEV  20    21%  11%   1%   18%    6%   20%   6%   19%   12%  2%
ASM  5     69%  1%    12%  10%    10%  35%   10%  34%   65%  1%
ASM  10    48%  2%    6%   11%    16%  47%   17%  46%   44%  2%
ASM  20    30%  4%    3%   12%    17%  53%   18%  53%   25%  4%

31 RESULTS Answer to RQ1: Better than the baseline method in most cases. Tables 6 and 7 repeat Table 4 (developer recommendation for delivering qualified assets) and Table 5 (developer recommendation for participation).

32 RESULTS Answer to RQ2: Best accuracy with C4.5 (9 of 12 cases); best diversity with KNN 1 (4 of 12 cases). Tables 8 and 9 repeat Table 4 (developer recommendation for delivering qualified assets) and Table 5 (developer recommendation for participation).
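
The tallies in the answer to RQ2 can be re-derived from the accuracy and diversity tables. The sketch below copies the percentages for the four learners from those tables and counts, for each of the 12 rows, which learner reaches the row maximum. The helper name `count_best` is hypothetical, and counting a tie for every learner that reaches the maximum is a choice made here.

```python
from collections import Counter

LEARNERS = ["C4.5", "NaiveBayes", "KNN 1", "KNN 5"]

# (Acc, Div) in percent per learner; rows are DEV 5/10/20, then ASM 5/10/20.
ASSETS = [
    [(50, 40), (44, 21), (34, 27), (32, 35)],
    [(60, 45), (54, 26), (44, 30), (49, 38)],
    [(71, 52), (67, 18), (56, 40), (61, 42)],
    [(37, 72), (33, 33), (38, 47), (43, 73)],
    [(51, 74), (44, 18), (49, 49), (58, 76)],
    [(63, 77), (54, 48), (57, 53), (67, 78)],
]
PARTICIPATION = [
    [(30, 3), (4, 15), (3, 8), (3, 7)],
    [(23, 6), (2, 17), (5, 14), (6, 13)],
    [(21, 11), (1, 18), (6, 20), (6, 19)],
    [(69, 1), (12, 10), (10, 35), (10, 34)],
    [(48, 2), (6, 11), (16, 47), (17, 46)],
    [(30, 4), (3, 12), (17, 53), (18, 53)],
]

def count_best(rows, metric):
    """Count, per learner, the rows where it reaches the best value (0 = Acc, 1 = Div)."""
    wins = Counter()
    for row in rows:
        best = max(cell[metric] for cell in row)
        for name, cell in zip(LEARNERS, row):
            if cell[metric] == best:
                wins[name] += 1
    return wins

acc_wins = count_best(ASSETS + PARTICIPATION, metric=0)
div_wins = count_best(ASSETS + PARTICIPATION, metric=1)
```

With ties counted this way, KNN 5 also reaches 4 diversity wins, because it ties with KNN 1 in the last participation row (53% each).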

33 RESULTS: COMPARISON Figure 11: Performance comparison of developer recommendation when recommending 5, 10 and 20 developers. Panels: For Delivering Qualified Assets and For Participation, each for Development and Assembly. The x-axis shows the machine learners (C4.5, NaiveBayes, KNN_1, KNN_5) and the baseline method Active. The y-axis shows the value for the Accuracy (Acc) and Diversity (Div) measures. Series: Top5, Top10 and Top20, for Acc and Div. The scatter points are linked to improve readability.

34 RESULTS: COMPARISON Answer to RQ3: 1. Selection: no-free-lunch. 2. Trade-off: Accuracy-Diversity dilemma. 3. Action: Active with low coverage. Figure 12: Performance comparison (panels: For Delivering Qualified Assets and For Participation, each for Development and Assembly; x-axis: C4.5, NaiveBayes, KNN_1, KNN_5, Active; series: Top5, Top10 and Top20, for Acc and Div).

37 SUMMARY Motivation: A dauntingly large set of task options; inappropriate developer-task matching may harm the quality. Aim: Automatically match tasks and developers. Method: CrowdRex, content-based recommendation. Results: Evaluated 4 machine learners on 3,094 historical tasks; accuracy 50%-71% and diversity 40%-52%.
