Christoph Treude. Bimodal Software Documentation


1 Christoph Treude Bimodal Software Documentation

2 Software Documentation [1985]

9 Software Documentation is everywhere: 100%, 59%, 74%, 44%, 37%; 162 different domains in the top 10 for 99 queries [C. Parnin and C. Treude. Measuring API Documentation on the Web. Web2SE '11: 2nd Int'l Workshop on Web 2.0 for Software Engineering, pp. 25-30]

11 Software Documentation is everywhere. TensorFlow Python API: 309 different domains in the top 10 for 2,192 queries; 100%, 59%, 36%. jQuery Event API: 75 different domains in the top 10 for 57 queries; 100%, 100%, 98% [C. Parnin and C. Treude. Measuring API Documentation on the Web. Web2SE '11: 2nd Int'l Workshop on Web 2.0 for Software Engineering, pp. 25-30]
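The "different domains in the top 10" figures can be illustrated with a minimal Python sketch. It assumes the search results are already available as a mapping from each query to its list of result URLs. The sample data, the example.org-style hosts, and the `query_share` metric (the fraction of queries for which a domain shows up) are assumptions for illustration only; the slides do not say how the percentages were computed.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host name of a URL."""
    return urlparse(url).netloc.lower()


def distinct_domains(results):
    """Collect every domain that appears in any query's result list."""
    return {domain_of(url) for urls in results.values() for url in urls}


def query_share(results, domain):
    """Fraction of queries whose result list contains the given domain.

    Assumed metric for illustration; not taken from the slides.
    """
    hits = sum(
        1 for urls in results.values()
        if any(domain_of(url) == domain for url in urls)
    )
    return hits / len(results)


# Hypothetical data: query -> result URLs (a real run would hold the top 10).
results = {
    "jquery click": [
        "https://api.example.org/click",
        "https://qa.example.com/q/1",
    ],
    "jquery hover": [
        "https://api.example.org/hover",
        "https://blog.example.net/hover",
    ],
}

print(len(distinct_domains(results)))
print(query_share(results, "api.example.org"))
```

Counting by host name treats subdomains as separate domains; grouping them under a registered domain would need an extra normalization step.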

13 Navigating documentation is not trivial [Figure: a "Common Tasks" list of eight links]

14 Extracting tasks from documentation: verb, noun, adjective [C. Treude, M. P. Robillard, and B. Dagenais. Extracting Development Tasks to Navigate Software Documentation. IEEE Trans. on Software Engineering, 41(6), p. ]

15 Grammatical dependencies. Direct object: generate confirmation; direct object: generate receipt [C. Treude, M. P. Robillard, and B. Dagenais. Extracting Development Tasks to Navigate Software Documentation. IEEE Trans. on Software Engineering, 41(6), p. ]

18 Grammatical dependencies. Passive nominal subject: set size; adjective modifier: set thumbnail size; preposition: set thumbnail size in templates [C. Treude, M. P. Robillard, and B. Dagenais. Extracting Development Tasks to Navigate Software Documentation. IEEE Trans. on Software Engineering, 41(6), p. ]
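The way these dependencies combine into a task phrase can be sketched in Python. This is a minimal illustration, not the implementation from the cited paper. It assumes a dependency parser has already produced (relation, head, dependent) triples; the `Dep` class, the relation labels (`dobj`, `nsubjpass`, `amod`, `prep_in`, in the style of collapsed Stanford dependencies), and the hand-written triples are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    relation: str   # e.g. "dobj", "nsubjpass", "amod", "prep_in"
    head: str
    dependent: str


def extract_tasks(deps):
    """Build task phrases: verb + (modified) object + prepositional phrases."""
    modifiers = {}   # noun -> adjective modifiers
    preps = {}       # verb -> prepositional phrases
    for d in deps:
        if d.relation == "amod":
            modifiers.setdefault(d.head, []).append(d.dependent)
        elif d.relation.startswith("prep_"):
            prep = d.relation[len("prep_"):]
            preps.setdefault(d.head, []).append(f"{prep} {d.dependent}")

    tasks = []
    for d in deps:
        # A direct object or a passive subject both name what the verb acts on.
        if d.relation in ("dobj", "nsubjpass"):
            noun = " ".join(modifiers.get(d.dependent, []) + [d.dependent])
            tasks.append(" ".join([d.head, noun] + preps.get(d.head, [])))
    return tasks


# Hand-written triples mirroring the slide examples (assumed parser output).
active = [
    Dep("dobj", "generate", "confirmation"),
    Dep("dobj", "generate", "receipt"),
]
passive = [
    Dep("nsubjpass", "set", "size"),
    Dep("amod", "size", "thumbnail"),
    Dep("prep_in", "set", "templates"),
]

print(extract_tasks(active))
print(extract_tasks(passive))
```

With these inputs, the two direct objects yield "generate confirmation" and "generate receipt", and the passive example yields "set thumbnail size in templates", matching the phrases shown on the slides.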

19 [C. Treude, M. Sicard, M. Klocke, and M. P. Robillard. TaskNav: Task-based Navigation of Software Documentation. ICSE '15: 37th Int'l Conf. on Software Engineering, p. ]

20 Software Documentation is everywhere: 100%, 59%, 74%, 44%, 37% [C. Parnin and C. Treude. Measuring API Documentation on the Web. Web2SE '11: 2nd Int'l Workshop on Web 2.0 for Software Engineering, pp. 25-30]

21 [C. Treude and M. P. Robillard. Augmenting API Documentation with Insights from Stack Overflow. ICSE '16: 38th Int'l Conference on Software Engineering, p. ]

22 Insight sentence: a sentence from Stack Overflow that is related to a particular API type and that provides insight not contained in the API documentation of that type

23 Supervised Insight Sentence Extractor: augment API documentation with insights from Stack Overflow

24 Bimodal software documentation [B. A. Campbell and C. Treude. NLP2Code: Code Snippet Content Assist via Natural Language Tasks. ICSME '17: 33rd Int'l Conf. on Software Maintenance and Evolution, to appear]

25 Challenges in Analyzing Documentation. Software documentation is technical and often contains references to code elements. Natural language text written by software developers may not obey all grammatical rules, e.g., sentences that are grammatically incomplete, or content that has not been authored by a native speaker [F. N. A. Al Omran and C. Treude. Choosing an NLP Library for Analyzing Software Documentation: A Systematic Literature Review and a Series of Experiments. MSR '17: 14th Int'l Conf. on Mining Software Repositories, p. ]

32 Comparing NLP libraries: CoreNLP, SyntaxNet, spaCy, NLTK. [Figure: part-of-speech tag sequences (tags such as NNS, DT, NN, JJ, CC, VBZ, NNP) produced by the four libraries for an example containing "C++"] Differences: 1. different tokenization; 2. general part of speech; 3. specific part of speech. Only between 60% and 71% of tokens from Stack Overflow, GitHub, and Java API documentation were assigned the same part-of-speech tag by all four libraries [F. N. A. Al Omran and C. Treude. Choosing an NLP Library for Analyzing Software Documentation: A Systematic Literature Review and a Series of Experiments. MSR '17: 14th Int'l Conf. on Mining Software Repositories, p. ]
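One way to measure this kind of agreement is sketched below in Python. It is a minimal illustration under stated assumptions, not the procedure from the cited study. Tokens are aligned by character span so that differing tokenizations of "C++" can be compared. The library names `lib_a` and `lib_b`, the example sentence, and the tags are invented rather than real library output, and using the first library's tokens as the denominator is an arbitrary choice.

```python
def token_spans(text, tokens):
    """Map each token to its (start, end) character span in text."""
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def agreement(text, outputs):
    """Share of the first library's tokens that all libraries agree on.

    outputs maps a library name to its list of (token, tag) pairs.
    A token counts as agreed only if every library produced a token
    with the same character span and the same tag.
    """
    tagged = {}
    for lib, pairs in outputs.items():
        spans = token_spans(text, [tok for tok, _ in pairs])
        tagged[lib] = {span: tag for span, (_, tag) in zip(spans, pairs)}

    libs = list(tagged)
    reference = tagged[libs[0]]
    agreed = sum(
        1 for span, tag in reference.items()
        if all(tagged[lib].get(span) == tag for lib in libs[1:])
    )
    return agreed / len(reference)


# Invented example: one library keeps "C++" whole, the other splits it.
text = "C++ is fast"
outputs = {
    "lib_a": [("C++", "NNP"), ("is", "VBZ"), ("fast", "JJ")],
    "lib_b": [("C", "NNP"), ("+", "SYM"), ("+", "SYM"),
              ("is", "VBZ"), ("fast", "JJ")],
}

print(agreement(text, outputs))
```

In this example only "is" and "fast" are tokenized and tagged identically, so two of the three reference tokens agree.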

34 Bimodal software documentation: tasks, code [B. A. Campbell and C. Treude. NLP2Code: Code Snippet Content Assist via Natural Language Tasks. ICSME '17: 33rd Int'l Conf. on Software Maintenance and Evolution, to appear]

35 Code Snippet Content Assist: tasks, code [B. A. Campbell and C. Treude. NLP2Code: Code Snippet Content Assist via Natural Language Tasks. ICSME '17: 33rd Int'l Conf. on Software Maintenance and Evolution, to appear]

39 The integration of natural language and code in documentation creates challenges & opportunities for software engineering tools. Thank you! christoph.treude@adelaide.edu.au
