Re-engineering Software Variants into Software Product Line
Presentation extracted from the thesis defense of Mr. Ra'Fat AL-Msie'Deen, University of Montpellier
1. Software product variants
o Are similar software:
  o Share mandatory features
  o Differ in optional features
  o Developed via a clone-and-own approach
o Examples:
  o Linux kernel «https://www.kernel.org/»
  o Mobile media «http://www.ic.unicamp.br/~tizzei/mobilemedia/»
  o ArgoUML «http://argouml-spl.tigris.org/»
2. Software product line
o Software-intensive systems come in many variants
o Motivations:
  o Reduce cost and time of software development
  o Reuse, etc.
[Figure: Software Product Line Engineering, showing a feature model (FM with features F0, F1, F2) and a product configuration (e.g., source code) across domain engineering and application engineering]
Software product line
[Figure: two product instances, Instance-1 and Instance-2]
Problem
o Software product variants raise difficulties for:
  o Reuse: Which features are mandatory or optional? What are the feature names and descriptions? What are the feature dependencies (feature model)?
  o Maintenance
  o Program understanding (comprehension)
o Software product line: design from scratch is a hard task
Goal
o Reverse engineer a feature model (FM) from the source code of software product variants
Strategy
Feature model mining (reverse engineering step):
o Mining functional features
o Documenting mined feature implementations
o Mining feature dependencies (require, exclude, groups of features: xor, or, and)
Process
1. Extracting features: from the source code of the software variants to feature implementations
2. Documenting features: feature names and product-by-feature matrix
3. Building the feature model: identifying cross-tree constraints and groups of features (and, or and xor)
Proposal
[Figure: REVPLINE takes software variants (Software-1 to Software-N) as input and outputs features (Feature-1 to Feature-N). Example: Software_A (I1, I2) and Software_B (I1, I3) in the implementation space correspond to (F1, F2) and (F1, F3) in the feature space.]
Contribution
o We exploit commonality and variability across the source code of software variants to apply information retrieval (IR) methods efficiently
o We rely on lexical and structural similarity to mine feature implementations
o Variability is considered at different levels of source code elements
o The REVPLINE feature location approach uses two techniques: FCA and LSI
Used technique: Formal Concept Analysis (FCA)
o «objects + attributes → classified concepts»
[Figure: the formal context and AOC-poset for text editor software variants]
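The idea behind FCA can be sketched in a few lines of Python. This is a minimal illustration, not the REVPLINE implementation: the three editor variants and their features in `CONTEXT` are hypothetical stand-ins for the text editor example, and all helper names are assumptions. It derives the concepts of the AOC-poset, that is, the concepts introduced by at least one object or one attribute.

```python
# Hypothetical formal context: objects (variants) mapped to their attributes.
CONTEXT = {
    "Editor_1": {"open", "save"},
    "Editor_2": {"open", "save", "copy_paste"},
    "Editor_3": {"open", "save", "copy_paste", "undo"},
}


def common_attributes(context, objects):
    """Intent: the attributes shared by every object in `objects`."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return frozenset(attrs)


def objects_having(context, attributes):
    """Extent: the objects that own every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if set(attributes) <= attrs)


def aoc_poset_concepts(context):
    """Collect the object concepts and attribute concepts as (extent, intent) pairs."""
    concepts = set()
    for obj in context:  # concept introduced by each object
        intent = common_attributes(context, [obj])
        concepts.add((objects_having(context, intent), intent))
    for attr in set().union(*context.values()):  # concept introduced by each attribute
        extent = objects_having(context, {attr})
        concepts.add((extent, common_attributes(context, extent)))
    return concepts


concepts = aoc_poset_concepts(CONTEXT)
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), "share", sorted(intent))
```

The concept whose extent contains every variant carries the attributes common to all of them; concepts with smaller extents carry the variable attributes.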
Used technique: Latent Semantic Indexing (LSI)
o IR technique
o Computes textual similarity among different documents
o If two documents share a large number of terms, those documents are considered similar
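The intuition that shared terms imply similarity can be illustrated with plain term-frequency cosine similarity. Note that this sketch deliberately omits the singular value decomposition that LSI applies to the term-document matrix before comparing documents; it shows only the cosine measure used for the comparison. The function name and the sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    terms_a = Counter(doc_a.lower().split())
    terms_b = Counter(doc_b.lower().split())
    dot = sum(terms_a[t] * terms_b[t] for t in terms_a.keys() & terms_b.keys())
    norm_a = math.sqrt(sum(c * c for c in terms_a.values()))
    norm_b = math.sqrt(sum(c * c for c in terms_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical documents built from identifier tokens.
print(cosine_similarity("draw line", "draw circle"))
print(cosine_similarity("draw line", "save file"))
```

Documents with no term in common score 0, and identical documents score 1.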
Key ideas
o A feature has the same implementation in all product variants where it is present
o Features are implemented as object-oriented building elements (OBEs): package, class, attribute, method, etc.
o Junction = overlap of feature implementations
[Figure: search space of all OBEs across the software variants, split into OBEs-1, OBEs-3 and OBEs-4, with structural and lexical links leading to feature implementations Fi-1 and Fi-2]
[Figure: OBE-to-feature mapping model]
Lexical versus structural similarity between OBEs
o Lexical similarity: LSI method [Figure: query and document]
o Structural similarity: we consider five dependencies (coupling) between OBEs: inheritance, method invocation, composition, attribute access and combined coupling
Process
[Figure: feature location process in five numbered steps, from the implementation space to the feature space. Software variants (P1, P2, ..., Pn) undergo static analysis to extract OBEs; commonalities and variabilities computation yields a common block and blocks of variation; lexical similarity computation (similarity matrix) and lexical and structural similarity computation (combined matrix) feed clustering into atomic blocks, which give the feature implementations. Legend: process, output.]
Example: drawing shapes software variants
[Table: features per software variant]
[Table: a formal context describing the drawing shapes software variants by their OBEs]
[Figure: the AOC-poset, showing the common block and the blocks of variation]
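The split into a common block and blocks of variation can be approximated with set operations. In this sketch, OBEs are grouped by the exact set of variants that contain them, which is where FCA would introduce them in the AOC-poset. The variant names, the OBE names and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical variants, each described by the OBEs found in its source code.
VARIANTS = {
    "Variant_1": {"Shape", "Line"},
    "Variant_2": {"Shape", "Line", "Rectangle"},
    "Variant_3": {"Shape", "Line", "Rectangle", "Oval"},
}


def common_and_variation_blocks(variants):
    """Return the common block and the blocks of variation keyed by owning variants."""
    obe_sets = list(variants.values())
    common_block = set.intersection(*obe_sets)  # OBEs present in every variant
    blocks = defaultdict(set)
    for obe in set.union(*obe_sets) - common_block:
        owners = frozenset(name for name, obes in variants.items() if obe in obes)
        blocks[owners].add(obe)  # OBEs with the same owners form one block
    return common_block, dict(blocks)


common_block, variation_blocks = common_and_variation_blocks(VARIANTS)
print("Common block:", sorted(common_block))
for owners, obes in variation_blocks.items():
    print("Block of variation", sorted(obes), "in", sorted(owners))
```

Each block narrows the search space: a feature implementation is then looked for inside one block rather than among all OBEs.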
Measuring OBE similarity based on lexical similarity
[Figure: five numbered steps]
Measuring OBE similarity based on lexical and structural similarity
[Figure: five numbered steps]
Documenting the Mined Feature Implementation
Proposal
[Figure: REVPLINE feature documentation. Inputs: use-case diagrams (use-case names and descriptions) and the mined feature implementations (1 to N). Output: a feature name and a feature description for each mined feature implementation.]
State of the art
1. Single software system = labels / names / topics / code summarization
2. Software variants = feature names are assigned manually to mined feature implementations
o Feature documentation = giving a name / description to the mined feature implementation
o The mined feature implementation must be documented
  o For the purpose of constructing an FM
Contribution
o We exploit commonality and variability across software variants, at the feature implementation and use-case levels, to apply IR methods efficiently
o Our approach gives each feature implementation a name and description based on the use-case name and description
o When use-cases are missing, feature documentation = names of the OBEs
o The REVPLINE documentation approach uses three techniques: FCA, LSI and RCA
Used technique: Relational Concept Analysis (RCA)
o «Objects (in categories) + attributes + relations → classified concepts in several categories»
[Figure: relational context family and concept lattice family]
Key ideas
o In our work, each use-case represents a feature
[Figure: search space of all use-cases and all feature implementations, split into hybrid blocks HB-1, HB-3 and HB-4; lexical links between use-cases and feature implementations lead to FD-1 and FD-2]
Process: the feature documentation process
[Figure: four numbered steps. Inputs: use-case diagrams (U) and feature implementations (F) of the software variants (P). RCA takes object-attribute tables and an object-object table and yields a concept lattice family, which is filtered into hybrid blocks. LSI computes lexical similarity with feature implementations as documents and use-cases as queries (example: "View map"), giving a cosine similarity matrix. FCA clustering then yields the output: feature name and feature description. Legend: process, technique, output.]
Cosine similarity matrix (queries in rows, documents in columns):
      F1    F2
U1    0.08  0.80
U2    0.75  0.02
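How the cosine similarity matrix leads to a feature name can be sketched as follows, using the four values shown on the slide. This is a simplification under stated assumptions: it picks the single most similar use-case for each feature implementation, whereas the process above clusters with FCA; the function name is hypothetical.

```python
# Cosine similarities from the slide: use-cases (queries) vs. feature implementations (documents).
SIMILARITY = {
    "U1": {"F1": 0.08, "F2": 0.80},
    "U2": {"F1": 0.75, "F2": 0.02},
}


def best_use_case_per_feature(similarity):
    """Map each feature implementation to its most similar use-case (argmax, an assumption)."""
    features = sorted({f for row in similarity.values() for f in row})
    return {
        feature: max(similarity, key=lambda use_case: similarity[use_case].get(feature, 0.0))
        for feature in features
    }


documentation = best_use_case_per_feature(SIMILARITY)
print(documentation)
```

With these values, F1 would take its name and description from U2, and F2 from U1.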
Example: Mobile Tourist Guide (MTG) software variants
[Tables: use-cases and mined feature implementations per software variant]
[Figure: the use-case diagrams of the second and fourth MTG software variants]
[Figure: the concept lattice family of the relational context family]
[Figure: exploring and filtering the hybrid blocks of the concept lattice family (CLF) to identify feature documentation]
Example: Hybrid Block_i
[Figure: the document (feature implementation) "View Map Implementation", with source code public viewmap(){ int a = 0; while (a > 5) { if (a != 20) { } else { a = 30; }}} and terms "viewmap a", is compared with the query (use-case) "View Map", described as "The tourist can view maps via mobile device..."]
Example
[Figure: numbered steps 1 to 4]
[Table: features per software variant]
Using identifier names
Naming a feature implementation based on OBE names:
1. Extracting and tokenizing OBE names from the identified feature implementation
2. Weighting tokens
3. Constructing the feature name
o The proposed name = StreetShowView
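The three steps above can be sketched as follows. The slide does not state the weighting scheme, so this sketch assumes raw token frequency and a three-token name; the OBE names are hypothetical and were chosen so that the result matches the proposed name on the slide.

```python
import re


def tokenize(identifier):
    """Step 1: split a camelCase or PascalCase identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [part.lower() for part in parts]


def propose_feature_name(obe_names, size=3):
    """Steps 2 and 3: weight tokens by frequency (an assumption) and join the top ones."""
    weights = {}
    for name in obe_names:
        for token in tokenize(name):
            weights[token] = weights.get(token, 0) + 1
    # sorted() is stable, so equally weighted tokens keep their first-seen order.
    ranked = sorted(weights, key=lambda token: -weights[token])
    return "".join(token.capitalize() for token in ranked[:size])


# Hypothetical OBE names of one mined feature implementation.
OBE_NAMES = ["showStreet", "StreetView", "showStreetView", "StreetMap"]
print(propose_feature_name(OBE_NAMES))  # prints "StreetShowView"
```

Here "street" occurs four times, "show" and "view" twice each and "map" once, so the three heaviest tokens form the name.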
Reverse Engineering Feature Models from Software Configurations
Proposal: reverse engineering FMs
[Figure: inputs are the source code of the variants (variant 1, variant 2) and the software configurations. REVPLINE feature implementation and documentation gives a product-by-feature matrix (rows S-1 to S-3, columns F-1 to F-3, two features marked per row); synthesis outputs the feature model.]
Contribution
o Automatic approach to organize the mined and documented features into an FM
o Features are organized in a tree which highlights:
  o Mandatory features
  o Optional features
  o Feature groups (and, or, xor groups)
  o Cross-tree constraints: require and exclude constraints
o We rely on FCA and software configurations
o The FMs are generated in a very short time
Key ideas
o All features are split into the common feature set (mandatory features) and the optional feature set (optional features)
o Identify all kinds of constraints between the optional features: groups of features and cross-tree constraints
Process: FM reverse engineering process
Input: product-by-feature matrix
1. Analysis (formal context, the AOC-poset)
The common feature set:
2. Extracting the root feature
3. Extracting mandatory features
The optional feature set:
4. Extracting atomic sets of features
5. Extracting inclusive-or
6. Extracting exclusive-or
7. Extracting require constraints
8. Extracting exclude constraints
Output: synthesized FM
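Several of these steps can be read directly off the product-by-feature matrix. The sketch below covers mandatory features and pairwise require and exclude constraints only; it uses plain set operations instead of the AOC-poset, so it is an illustration of the idea rather than the approach itself. The configurations and the function name are hypothetical; the feature names are borrowed from the cell phone example.

```python
from itertools import combinations, permutations

# Hypothetical product-by-feature matrix: one set of selected features per product.
CONFIGURATIONS = [
    {"Cell_Phone", "Display", "Strong", "Bluetooth"},
    {"Cell_Phone", "Display", "Weak"},
    {"Cell_Phone", "Display", "Strong"},
]


def mine_constraints(configurations):
    """Return (mandatory, optional, requires, excludes) observed in the configurations."""
    mandatory = set.intersection(*configurations)  # present in every product
    optional = set.union(*configurations) - mandatory
    # a requires b: every product that selects a also selects b.
    requires = {
        (a, b)
        for a, b in permutations(sorted(optional), 2)
        if all(b in config for config in configurations if a in config)
    }
    # a excludes b: no product selects both.
    excludes = {
        frozenset((a, b))
        for a, b in combinations(sorted(optional), 2)
        if not any(a in config and b in config for config in configurations)
    }
    return mandatory, optional, requires, excludes


mandatory, optional, requires, excludes = mine_constraints(CONFIGURATIONS)
print("Mandatory:", sorted(mandatory))
print("Requires:", sorted(requires))
print("Excludes:", sorted(sorted(pair) for pair in excludes))
```

Constraints mined this way only reflect the given configurations: a pair that never co-occurs in the sample is reported as an exclusion even if the original FM allows it.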
Example
[Figure: existing cell phone SPL FM. Root: Cell_Phone. Features: Wireless, Accu_Cell, Display, Games, Infrared, Bluetooth, Strong, Medium, Weak, Multi_Player, Single_Player, Artificial_Opponent. Cross-tree constraints: one excludes and two requires. Legend: mandatory, optional, or, alternative.]
Example
[Figure: the top concept, the root feature and the Base feature]
[Figure: the AND feature]
[Figure: the XOR feature; minimum concepts]
[Figure: the OR feature]
The mined FM
[Figure: the mined FM. Root: Cell_Phone, with nodes Base, AND, XOR and OR over the features Accu_Cell, Artificial_Opponent, Single_Player, Multi_Player, Display, Games, Strong, Medium, Weak, Wireless, Bluetooth and Infrared.]
FM evaluation
«In our approach, feature selection constraints are not detected»
Experimentation and Threats to Validity
Experimentation
o ArgoUML-SPL = real SPL, 10 products, large systems, Java, well documented
o Health complaint-spl = real SPL, 10 products, medium systems, Java, well documented
o Mobile Media = real software variants, 4 products, small systems, Java, well documented
o Evaluation metrics: precision, recall and F-measure
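For reference, the three evaluation metrics can be computed as below, using their standard information retrieval definitions. The function name and the sample sets are assumptions for illustration; they are not data from the case studies.

```python
def precision_recall_f_measure(retrieved, relevant):
    """Standard IR metrics over two sets of items (e.g., mined vs. expected OBEs)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0  # share of retrieved items that are correct
    recall = hits / len(relevant) if relevant else 0.0  # share of expected items that were found
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


# Hypothetical case: everything expected was found, plus two extra items.
precision, recall, f_measure = precision_recall_f_measure(
    retrieved={"a", "b", "c", "d"}, relevant={"a", "b"}
)
print(precision, recall, round(f_measure, 2))
```

In this hypothetical case recall is 100% while precision is 50%, because half of the retrieved items are not relevant.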
ArgoUML SPL
[Figure: ArgoUML screenshot]
Health Complaint SPL
[Figure: Health Complaint screenshot]
Mobile Media software product variants
[Figure: Mobile Media screenshots]
Feature location
o Results show that precision is high
o Results show that recall is very high
o We cannot use a fixed number of topics for LSI
[Figure: bar chart, average of evaluation metrics for feature location (precision, recall, F-measure) for ArgoUML-SPL, Health complaint-spl and Mobile Media]
Feature location
o The lexical and structural similarity approach gives better results than the lexical approach alone
[Figure: part of the sequence diagram feature implementation]
Feature documentation
o Results show that recall is 100% in all cases
o Results show that precision is either 100% or 50%
o Number of topics for LSI = number of feature implementations
[Figure: bar chart, average of evaluation metrics for feature documentation (precision, recall, F-measure) for ArgoUML-SPL, Health complaint-spl and Mobile Media]
Feature documentation examples (feature name and description)
o Use-case diagram: "a use-case is a set of scenarios that describes an interaction between a user and a system. A use-case diagram displays the relationship among actors and use-cases. The two main components of a use-case diagram are use-cases and actors"
o View sorted photos: "the device sorts the photos based on the number of times photo has been viewed"
o Specify food complaint: "this use case allows a citizen to register a food complaint. The food complaint has the following information: food complaint data, description and observations"
FM reverse engineering
o Recall is 100% for all case studies
o Results show that precision is not very high across the case studies
[Figure: bar chart, evaluation metrics for FM reverse engineering (precision, recall, F-measure) for ArgoUML-SPL, Health complaint-spl and Mobile Media]
Reverse Engineering FMs from Samples of Program Configurations
Threats to validity
1. Lexical similarity
2. We consider junctions as feature implementations
3. Dynamic analysis techniques
4. There is a limitation in using FCA as a clustering technique
5. Each use-case represents a single feature
6. The mined FM defines more configurations than the initial FM