Natural Language Processing Is No Free Lunch

Stefan Wagner, University of Stuttgart, Stuttgart, Germany

Introduction
o Impressive progress in NLP: operating systems ship with personal assistants such as Siri or Cortana.
o Brief check on how and how not to apply NLP in software analytics.
o Case study: NLP applied to the documentation of software systems.
o Most documentation, although structured and versatile (JavaDoc and Doxygen), focuses on the level of functions/methods and classes, while the component level is often missing. Why not use the former to generate the latter?

NL Data in Software Projects
o There is a lot of NL in a software project:
  o Textual documentation for the user or of the software architecture.
  o Commit messages and issue descriptions.
  o Comments in the source code.

How to apply NLP: techniques
o Nowadays there is a wide range of algorithms that allow processing large NL datasets.
  o Part-of-speech tagging: returns the grammatical role of each word (e.g., verb, noun, or determiner).
  o Topic modeling: extracts the most probable topics in an NL dataset.
  o Stemming: the process of removing morphological and inflectional endings from words. "Read" and "reading" → both correspond to the same word.
  o Lemmatization: employs a dictionary and morphological analysis to return the base form of a word (called its lemma). "Better" has the lemma "good" (stemming would miss this).
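
To make the difference between stemming and lemmatization concrete, here is a minimal Python sketch. The suffix list and the tiny lemma dictionary are hypothetical stand-ins; real tools (e.g., the Porter stemmer or a dictionary-based lemmatizer) are far more complete. The sketch only illustrates that a rule-based stemmer works on word endings alone, while a lemmatizer needs lexical knowledge.

```python
# Hypothetical toy lexicon; real lemmatizers use a full dictionary plus morphological rules.
LEMMAS = {"better": "good", "best": "good", "was": "be", "mice": "mouse"}


def suffix_stem(word):
    """Crude rule-based stemmer: strip a common ending, no dictionary involved."""
    for suffix in ("ing", "er", "ed", "s"):
        # Keep at least three characters so that short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Dictionary lookup first; fall back to the stemmer for words not in the lexicon."""
    return LEMMAS.get(word, suffix_stem(word))


for w in ("read", "reading", "better"):
    print(w, "->", suffix_stem(w), "/", lemmatize(w))
```

Both functions map "reading" to "read", but only the dictionary lookup maps "better" to "good"; the stemmer produces the meaningless stem "bett".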

NLP is no magic
o You have to try alternative algorithms and tune them during the analysis.
o The results depend strongly on the quality of the analyzed NL data:
  o The analysis of an official Java library, which is well documented, will work well, while
  o the analysis of sparsely commented open source code will provide less useful results.
o Don't discard manual analysis:
  o Humans can make connections based on their own knowledge and experience, and they can formulate results that are easily accessible to other humans.
  o You can combine manual feedback with NLP techniques.
  o Systematic manual analysis requires a simple coding of the textual data, where a tag or code is attached to each analyzed piece of NL (a word, a sentence, or a whole paragraph).

Can Clone Detection Support Quality Assessments of Requirements Specifications?
Elmar Juergens, Florian Deissenboeck, Martin Feilkas, Benjamin Hummel, Bernhard Schaetz, Stefan Wagner
Technische Universität München, Garching b. München, Germany
Christoph Domann, Jonathan Streit
itestra GmbH, Garching b. München, Germany

Introduction
o Software requirements specifications (SRS) are the keystone of most software projects.
  o They influence the product's quality and the effort spent on development.
  o They are the key (and often only) communication artifact between customer and contractor.
o SRS are mostly written in NL → few techniques exist for automated quality assessment.
o It is possible to use clone detection to tackle redundancy.
o Clone detection is commonly applied to find duplication in source code (cloning), which can:
  o increase a project's size and the effort required for size-related activities;
  o lead to errors caused by inconsistent changes.

Research Problem
o 4 objectives:
  o Do real-world requirements specifications contain duplicated information?
  o What kind of information is duplicated?
  o Which consequences does the duplication of information have on the different software development activities?
  o Can existing clone detection approaches be applied in practice to identify duplication in SRS automatically?
o 28 SRS analyzed → a total of 8,667 pages.

Terminology
o Requirements specification (RS): a specification for a particular software product, program, or set of programs that performs certain functions in a specific environment.
o An RS can be interpreted as a single sequence of words.
o A normalized RS is obtained by transforming its words so that similar words are grouped into the same normalized form.
o A specification clone is a substring of the normalized specification with a certain minimal length that appears at least twice.
o A clone group contains all clones of a specification that have the same content.
o Clones of a relevant clone group must convey similar information, and this information must refer to the system described → e.g., system interaction steps.
o Clone coverage denotes the part of a specification that is covered by cloning. It represents the probability that an arbitrarily chosen specification sentence is cloned at least once.
o Number of clone groups and clones denotes how many different logical specification fragments have been copied and how often they occur.
o Blow-up describes how much larger the specification is compared to a hypothetical specification version that contains no clones.

Methodology: Study Definition
o 4 research questions:
  o RQ1. How much cloning do real-world requirements specifications contain?
  o RQ2. What kind of information is cloned in requirements specifications?
  o RQ3. What consequences does cloning in requirements specifications have?
  o RQ4. Can cloning in requirements specifications be detected accurately using existing clone detectors?
o Content analysis of the study objects (specifications from industrial projects), performed both with clone detection and manually.

Methodology: Study Design
1. RS are assigned randomly to pairs of researchers for further analysis.
2. Clone detection is performed on all documents of a specification.
3. The researcher pairs perform clone detection tailoring for each specification.
4. They manually inspect detected clones for false positives and add filters that remove these false positives.
o The sequence 2, 3, 4 is repeated until no false positives are found in a random sample of the detected clone groups.

Methodology: Study Design
o For each specification, a random sample of clone groups is analyzed based on the kind of information they contain. Each sampled clone group is assigned to all suitable categories.
o On selected specifications, content analysis of the source code is performed. The code corresponding to specification clones is studied in order to classify whether the specification cloning resulted in code cloning, duplicated functionality without cloning, or was resolved through the creation of a shared abstraction.
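
The clone measures from the terminology slide can be computed from a list of clone groups. The following Python sketch assumes that each group is given as its clone length in words plus the start positions of its clones, that clones do not overlap, and that coverage is measured over word positions rather than sentences; the function name `clone_metrics` is hypothetical. The redundancy-free size keeps one copy of every cloned fragment, so each additional occurrence counts toward the blow-up.

```python
def clone_metrics(total_words, clone_groups):
    """Compute clone measures for one specification (a sketch, not ConQAT's code).

    clone_groups: list of (clone_length_in_words, [start positions of its clones]).
    Assumes that clones do not overlap each other.
    """
    covered = set()  # word positions covered by at least one clone
    redundant = 0    # words that a redundancy-free version would not contain
    for length, starts in clone_groups:
        for start in starts:
            covered.update(range(start, start + length))
        # One copy of each fragment is kept; every further occurrence is redundant.
        redundant += (len(starts) - 1) * length
    redundancy_free = total_words - redundant
    return {
        "clone_coverage": len(covered) / total_words,
        "clone_groups": len(clone_groups),
        "clones": sum(len(starts) for _, starts in clone_groups),
        "relative_blow_up": total_words / redundancy_free,
        "absolute_blow_up": total_words - redundancy_free,
    }


# A 1,000-word specification with a 20-word fragment cloned three times
# and a 50-word fragment cloned twice.
metrics = clone_metrics(1000, [(20, [0, 100, 200]), (50, [500, 700])])
print(metrics)
```

In this example, 160 of the 1,000 word positions are covered (a clone coverage of 16%), and the absolute blow-up is 90 words: two extra copies of the 20-word fragment plus one extra copy of the 50-word fragment.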

Methodology: Study Objects
o The 28 RS are from various domains: administration, automotive, convenience, finance, telecommunication, and transportation.
o The RS were obtained from different organizations, including:
  o Munich Re Group: one of the largest reinsurance companies in the world, employing more than 47,000 people in over 50 locations.
  o Siemens AG: the largest engineering company in Europe.
  o MOST Cooperation: a partnership of car manufacturers (including Audi, BMW, and Daimler) and component suppliers.

Methodology: Study Implementation and Execution
o RQ1: The tool ConQAT is used to perform clone detection and to compute the clone measures. Detection is performed with a minimal clone length of 20 words.
o RQ2: If more than 20 clone groups are found for a specification, the manual classification is performed on a random sample of 20 clone groups. Otherwise, all clone groups of the specification are inspected. During inspection, 8 categories were added and 1 was changed.
o RQ3: Relative blow-up is computed as the ratio of the total number of words to the number of redundancy-free words. Absolute blow-up is computed as the difference between the total and the redundancy-free number of words.

Methodology: Study Implementation and Execution
o An average reading speed of 220 words per minute was used to calculate the additional effort for reading, while an inspection speed of 600 words per hour was used for the inspection task.
o RQ4: Precision is determined as the percentage of relevant clones in the inspected sample. Clone detection tailoring is performed by creating regular expressions that match the false positives. A maximum of 20 randomly chosen clone groups is inspected in each tailoring step.

The Clone Detection Tool
Input and Pre-Processing
The input phase reads the documents and produces a normalized word stream (using the Porter stemmer algorithm). It requires all input data to be plain text.
After reading the text contents of a specification, certain sections of the documents are excluded. The resulting text is split into single words; whitespace and punctuation are discarded.
Detection
This phase extracts all substrings in the word stream that are sufficiently long and occur at least twice. The algorithm works by constructing a suffix tree from the token (word) stream. Each branch of the tree that reaches at least two leaves corresponds to a clone.
Post-processing and Output
During post-processing, all clone groups that contain overlapping clones are removed. The output phase calculates several metrics on the clones.
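
The three phases can be sketched in a few dozen lines of Python. This is not ConQAT's implementation: to stay dependency-free, it substitutes a crude suffix stripper for the Porter stemmer and a naive suffix array (quadratic, adequate only for small inputs) for the suffix tree. Both data structures expose repeated word sequences as shared prefixes of suffixes. The exclusion patterns stand in for the section filters, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical crude suffix stripper standing in for the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, exclude_patterns=()):
    """Input phase: drop excluded lines, then emit lowercased, stemmed words."""
    kept = [line for line in text.splitlines()
            if not any(re.search(p, line) for p in exclude_patterns)]
    # \w+ keeps letters and digits; whitespace and punctuation are discarded.
    return [crude_stem(w) for w in re.findall(r"\w+", " ".join(kept).lower())]


def find_clone_groups(words, min_len=20):
    """Detection phase: map each repeated word sequence to its sorted start positions."""
    n = len(words)
    # Naive suffix array: suffix start positions in lexicographic order.
    suffixes = sorted(range(n), key=lambda i: words[i:])
    groups = defaultdict(set)
    for a, b in zip(suffixes, suffixes[1:]):
        lcp = 0  # longest common prefix of two neighbouring suffixes
        while a + lcp < n and b + lcp < n and words[a + lcp] == words[b + lcp]:
            lcp += 1
        # Skip short repeats and repeats that extend further left (reported via another pair).
        if lcp < min_len or (a > 0 and b > 0 and words[a - 1] == words[b - 1]):
            continue
        groups[tuple(words[a:a + lcp])].update((a, b))
    return {content: sorted(starts) for content, starts in groups.items()}


def drop_overlapping(groups):
    """Post-processing: remove clone groups whose clones overlap each other."""
    return {content: starts for content, starts in groups.items()
            if all(b - a >= len(content) for a, b in zip(starts, starts[1:]))}


text = ("The user enters the PIN. The system validates the PIN.\n"
        "Page 3 of 10\n"
        "The user enters the PIN. The system validates the PIN again.")
words = normalize(text, exclude_patterns=[r"^Page \d+ of \d+$"])
groups = drop_overlapping(find_clone_groups(words, min_len=5))
for content, starts in groups.items():
    print(len(content), "words at positions", starts)  # 10 words at positions [0, 10]
```

The exclusion pattern removes the page footer before tokenization, and the minimal length is lowered to 5 words only to keep the example small; the study itself used 20 words.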

Results: RQ1 Amount of Cloning

Results: RQ2 Cloned Information
o Clone group cardinality: the number of times a specification fragment has been cloned.
o Categories of cloned information:
1. Detailed Use Case Steps
2. Reference
3. UI
4. Domain Knowledge
5. Interface Description
6. Pre-Condition
7. Side-Condition
8. Configuration
9. Feature
10. Technical Domain Knowledge
11. Post-Condition
12. Rationale

Results: RQ3 Consequences of Cloning
Specification Reading
o The average blow-up of the analyzed SRS is 3,578 words, which, at a typical reading speed of 220 words per minute, translates to about 16 additional minutes.
o For the inspection task, this amount increases to about 6 hours.
Specification Modification
o The comments documented during the inspection of the sampled clones were analyzed (for each specification set). They refer to duplicated specification fragments that are considerably longer than the clones detected by the tool.
Specification Implementation
o For the 20 inspected specification clone groups and their source code, 3 different effects were found:
1. The redundancy in the requirements is not reflected in the code, which contains shared abstractions that avoid duplication.
2. The code that implements a cloned piece of an SRS is also cloned. In this case, future changes to the cloned code cause additional effort, as modifications must be reflected in all clones. Furthermore, changes to cloned code are error-prone, as inconsistencies may be introduced accidentally.
3. Code of the same functionality has been implemented multiple times. This kind of redundancy is harder to detect, as existing clone detection approaches cannot find code that is functionally similar but not the result of copy & paste.
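
The reading and inspection figures follow from dividing the average absolute blow-up by the speeds assumed in the study (220 words per minute for reading, 600 words per hour for inspection). A quick Python check, with a hypothetical helper name:

```python
def extra_effort(blow_up_words, reading_wpm=220, inspection_wph=600):
    """Return the additional (reading minutes, inspection hours) caused by cloned words."""
    return blow_up_words / reading_wpm, blow_up_words / inspection_wph


reading_min, inspection_h = extra_effort(3578)
print(f"{reading_min:.1f} min reading, {inspection_h:.1f} h inspection")  # 16.3 min reading, 6.0 h inspection
```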

Results: RQ4 Detection Tailoring and Accuracy
o The false positives contain information from the following categories:
  o Document metadata comprises information about the creation process of the document.
  o Indexes do not add new information and are typically generated automatically by text processors.
  o Page decorations are typically inserted automatically by text processors.
  o Open issues document gaps in the specification, i.e., TODO statements.
  o Specification template information contains section names common to all individual documents that are part of a specification.
o By using clone detection tailoring, precision values were above 85%, with an average of 99%.
o The time required for tailoring varies across specifications (up to 33 minutes), with an average of 10 minutes.

Conclusions and Future Work
o The amount of cloning encountered is significant. However, as the broad spectrum of findings shows, cloning in SRS can be successfully avoided.
o Cloning is not confined to a specific kind of information.
o The most obvious effect of duplication is the increased size, which could be avoided by cross-references or a different organization of the specifications. Another consequence is the increase in the time spent reading the RS.
o Redundancy may lead to inconsistent changes of the clones, which may induce errors in the RS and thus in the final system.
o Specification cloning can lead to cloned or re-implemented parts of code.

Conclusions and Future Work
o Existing clone detection approaches can be applied to identify cloned information in SRS. However, a certain amount of analysis tailoring is required to increase detection precision.
o Without any previous knowledge, one must assume that the probability that an arbitrary sentence in the specification is duplicated is greater than 10%.
o One should make SRS authors and reviewers aware of the problems caused by SRS cloning and avoid redundancy from the beginning.
o Sometimes duplication is employed intentionally in order to make a part of an SRS self-contained. In this case, you have to make sure that the duplicated part is maintained only once and that readers can recognize the duplication.

Threats to Validity: Internal Validity
o Results may be influenced by mistakes or individual preferences of the researchers during the tailoring phase → clone tailoring in pairs.
o Subjectivity during the categorization of the cloned information → researchers in pairs and inter-rater agreement.
o Precision was determined on random samples.
o Inaccurate calculation of additional effort due to blow-up:
  o Cloned and non-cloned text are treated uniformly with respect to reading effort.
o Little interest in detection recall:
  o No investigation of false negatives, i.e., duplication contained in a specification but not identified by the automated detector.
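
The RQ4 procedure (inspect at most 20 randomly chosen clone groups, write regular expressions for the false positives found, and report precision on the sample) can be sketched as below. Representing clone groups as plain strings and passing the reviewer's judgment as a function are simplifying assumptions; the function names are hypothetical, and the regular expressions are only illustrative examples of the false-positive categories listed above.

```python
import random
import re


def inspection_sample(clone_groups, max_size=20, seed=None):
    """Return all groups if there are at most max_size, else a random sample of max_size."""
    if len(clone_groups) <= max_size:
        return list(clone_groups)
    return random.Random(seed).sample(clone_groups, max_size)


def precision(sample, is_relevant):
    """Share of inspected clone groups that the reviewer judges relevant."""
    if not sample:
        raise ValueError("precision is undefined for an empty sample")
    return sum(1 for group in sample if is_relevant(group)) / len(sample)


def tailor(clone_groups, false_positive_patterns):
    """Drop clone groups matching any regular expression written for a false positive."""
    return [group for group in clone_groups
            if not any(re.search(p, group) for p in false_positive_patterns)]


def relevant(group):
    """Stand-in for the manual decision made by the researcher pair."""
    return group.startswith("The")


groups = [
    "Page 3 of 10",                         # page decoration
    "TODO: clarify the timeout behavior",   # open issue
    "The system shall log each failed login attempt",
    "The operator confirms the order on the terminal",
]
print(precision(inspection_sample(groups), relevant))  # 0.5
tailored = tailor(groups, [r"^Page \d+ of \d+$", r"^TODO\b"])
print(precision(inspection_sample(tailored), relevant))  # 1.0
```

In the study, this inspect-and-filter step is repeated until a fresh random sample contains no false positives.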

Threats to Validity: External Validity
o The practice of requirements engineering differs strongly between different domains, companies, and even projects, which limits the generalization of the results.

Questions and Discussion
o How could we detect document fragments that convey similar information but are different on the word level?
o What qualitative effects could be considered during the content analysis of the source code?
o Can the results be generalized to any domain, any company, and any kind of software system?

Thanks!

Software Quality Management
Dr. Stefan Wagner
Technische Universität München
Garching, 28 May 2010

Some of these slides were adapted from the tutorial "Clone Detection in