Re-engineering Software Variants into Software Product Line

Presentation extracted from the thesis defense of Mr. Ra'Fat AL-Msie'Deen, University of Montpellier

1. Software product variants
Software product variants are similar software systems that:
o Share mandatory features
o Differ in optional features
o Are developed via a clone-and-own approach
Examples:
o Linux kernel «https://www.kernel.org/»
o Mobile Media «http://www.ic.unicamp.br/~tizzei/mobilemedia/»
o ArgoUML «http://argouml-spl.tigris.org/»

2. Software product line
Software-intensive systems come in many variants.
Motivations:
o Reduce the cost and time of software development
o Reuse, etc.
[Figure: Software Product Line Engineering. Domain engineering: feature model (FM, with features F0, F1, F2). Application engineering: configuration and product (e.g., source code).]

Software product line
[Figure: Instance-1 and Instance-2]

Problem
o Software product variants cause difficulties for:
   o Reuse: feature (mandatory, optional)? feature name and description? feature dependencies (feature model)?
   o Maintenance
   o Program understanding (comprehension)
o Designing a software product line from scratch is a hard task

Goal
o Reverse engineering a FM from the source code of software product variants
Strategy: feature model mining (reverse engineering step)
o Mining functional features
o Documenting mined feature implementations
o Mining feature dependencies (require, exclude, groups of features: xor, or, and)

Process
Input: source code of the software variants
1. Extracting features
2. Documenting features
3. Building the feature model: identifying cross-tree constraints and groups of features (and, or and xor)
Intermediate artifacts: feature implementations, feature names, product-by-feature matrix

Proposal
[Figure: REVPLINE takes Software-1 to Software-N as input (implementation space) and outputs Feature-1 to Feature-N (feature space). Example: Software_A has implementations I1, I2 and features F1, F2; Software_B has implementations I1, I3 and features F1, F3.]

Contribution
o We exploit commonality and variability across the source code of software variants to apply IR methods efficiently
o We rely on lexical and structural similarity to mine feature implementations
o We consider variability at different levels of source code elements
o The REVPLINE feature location approach uses two techniques: FCA and LSI

Used technique: Formal Concept Analysis (FCA)
o «objects + attributes -> classified concepts»
[Figure: the formal context and AOC-poset for text editor software variants]
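To make the FCA idea concrete, here is a minimal sketch that derives the object concepts and attribute concepts (the elements of an AOC-poset) from a tiny formal context. The context below (Editor1 to Editor3 and their attributes) is invented for illustration and is not the text editor context shown on the slide; the function names are also assumptions.

```python
def extent(context, attrs):
    """Objects that own every attribute in attrs."""
    return frozenset(o for o, has in context.items() if set(attrs) <= set(has))


def intent(context, objs):
    """Attributes shared by every object in objs."""
    rows = [set(context[o]) for o in objs]
    if not rows:
        return frozenset(set().union(*context.values()))
    return frozenset(rows[0].intersection(*rows[1:]))


def aoc_poset(context):
    """Return the concepts (extent, intent) introduced by an object or an attribute."""
    concepts = set()
    for obj in context:  # object concepts
        shared = intent(context, [obj])
        concepts.add((extent(context, shared), shared))
    for attr in set().union(*context.values()):  # attribute concepts
        owners = extent(context, [attr])
        concepts.add((owners, intent(context, owners)))
    return concepts


# Hypothetical formal context: software variants (objects) x capabilities (attributes).
context = {
    "Editor1": {"open", "save"},
    "Editor2": {"open", "save", "print"},
    "Editor3": {"open", "save", "search"},
}

for ext, itt in sorted(aoc_poset(context), key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

In this toy context the concept whose extent contains all three editors plays the role of the common part, while the two single-editor concepts isolate what varies.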

Used technique: Latent Semantic Indexing (LSI)
o An IR technique
o Computes textual similarity among different documents
o If two documents share a large number of terms, those documents are considered similar
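The intuition that documents sharing many terms are similar can be sketched with a plain cosine similarity over term counts. Note that this is a simplified vector-space illustration, not LSI itself: LSI additionally applies a singular value decomposition to the term-document matrix, which is omitted here. The sample documents and function names are assumptions.

```python
import math
import re
from collections import Counter


def terms(text):
    """Bag of lowercase alphabetic terms."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical documents (e.g., identifiers extracted from source code elements).
doc_a = terms("draw line shape")
doc_b = terms("draw rectangle shape")
doc_c = terms("print report")

print(cosine(doc_a, doc_b))  # two of three terms shared
print(cosine(doc_a, doc_c))  # no shared term
```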

Key ideas
o A feature has the same implementation in all product variants where it is present
o Features are implemented as OBEs (object-oriented building elements): package, class, attribute, method, etc.
o Junction = overlap of feature implementations
[Figure: the search space of all OBEs (software-1, software-2) is reduced to blocks OBEs-1, OBEs-3, OBEs-4; structural and lexical links lead to feature implementations Fi-1 and Fi-2]

OBE to Feature Mapping Model

Lexical versus structural similarity between OBEs
o Lexical similarity: LSI method (query, document)
o Structural similarity: we consider five dependencies ("coupling") between OBEs: inheritance, method invocation, composition, attribute access and combined coupling

Process
[Figure: the feature location process, from implementation space to feature space]
o Input: software variants P1, P2, ..., Pn
o Static analysis extracts the OBEs
o Commonalities and variabilities computation yields the common block (common OBEs) and the blocks of variation 1, 2, ..., N (variable OBEs)
o Lexical similarity computation (similarity matrix), then clustering into feature atomic blocks
o Lexical and structural similarity computation (the combined matrix), then clustering into feature atomic blocks
o Output: feature implementations
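The commonalities and variabilities step can be sketched with plain set operations: OBEs present in every variant form the common block, and the remaining OBEs are grouped by the exact set of variants that contain them. The talk obtains these blocks from the AOC-poset; the direct set-based grouping below is a simplifying assumption, and the variant and OBE names are hypothetical.

```python
def split_blocks(variants):
    """variants: dict mapping a variant name to the set of its OBE names.

    Returns (common_block, blocks_of_variation), where blocks_of_variation maps
    a frozenset of variant names to the OBEs found in exactly those variants.
    """
    common = set.intersection(*map(set, variants.values()))
    blocks = {}
    for obe in set().union(*variants.values()) - common:
        owners = frozenset(v for v, obes in variants.items() if obe in obes)
        blocks.setdefault(owners, set()).add(obe)
    return common, blocks


# Hypothetical drawing-shapes variants described by their OBEs.
variants = {
    "Software-1": {"Shape", "Line"},
    "Software-2": {"Shape", "Line", "Rectangle"},
    "Software-3": {"Shape", "Rectangle", "Oval"},
}

common, blocks = split_blocks(variants)
print("Common block:", sorted(common))
for owners, obes in blocks.items():
    print("Block of variation", sorted(owners), "->", sorted(obes))
```

Each block is then a much smaller search space than the full set of OBEs, which is what makes the later similarity computation cheaper.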

Example
An illustrative example: drawing shapes software variants
[Table: software variants and their features]

Example
A formal context describing drawing shapes software variants by their OBEs
[Table: software by OBEs]

Example
[Figure: the AOC-poset, showing the common block and the blocks of variation]

Measuring OBE similarity based on lexical similarity
[Figure: steps 1 to 5]

Measuring OBE similarity based on lexical and structural similarity
[Figure: steps 1 to 5]

Outline: Documenting the Mined Feature Implementation

Proposal
[Figure: REVPLINE takes the mined feature implementations (1 to N) and the use-case diagrams (use-case name and description) as inputs, and outputs the feature documentation (feature name and description)]

State of the art
1. Single software system = labels / names / topics / code summarization
2. Software variants = manually assign feature names to mined feature implementations
o Feature documentation = giving a name / description to the mined feature implementation
o The mined feature implementation must be documented for the purpose of constructing a FM

Contribution
o We exploit commonality and variability across software variants, at the feature implementation and use-case levels, to apply IR methods efficiently
o Our approach gives each feature implementation a name and description based on the use-case name and description
o When use-cases are missing, feature documentation = names of the OBEs
o The REVPLINE documentation approach uses three techniques: FCA, LSI and RCA

Used technique: Relational Concept Analysis (RCA)
o «Objects (in categories) + attributes + relations -> classified concepts in several categories»
[Figure: relational context family and concept lattice family]

Key ideas
o In our work, each use-case represents a feature
[Figure: the search space of all feature implementations and all use-cases is reduced to hybrid blocks HB-1, HB-3, HB-4; lexical links between use-cases and feature implementations lead to FD-1 and FD-2]

Process
[Figure: the feature documentation process]
Input: use-case diagrams (e.g., "View map" and its description) and feature implementations
1. RCA: object-attribute tables and an object-object table yield the concept lattice family
2. Filtering: hybrid blocks
3. LSI: lexical similarity computation yields a cosine similarity matrix
4. FCA: clustering
Output: feature name and feature description
Example cosine similarity matrix (queries = use-cases, documents = feature implementations):
      F1    F2
U1    0.08  0.80
U2    0.75  0.02
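As a rough illustration of how such a cosine similarity matrix could be read, the sketch below pairs each use-case (query) with the feature implementation (document) that scores highest. The values are the ones shown on this slide (U1: 0.08 and 0.80; U2: 0.75 and 0.02); picking the maximum per row is an assumption made here for illustration, since the talk performs this step through FCA clustering.

```python
# Cosine similarity values from the slide: rows are use-cases, columns are feature implementations.
similarity = {
    "U1": {"F1": 0.08, "F2": 0.80},
    "U2": {"F1": 0.75, "F2": 0.02},
}


def best_match(matrix):
    """Map each use-case to the feature implementation with the highest similarity."""
    return {query: max(row, key=row.get) for query, row in matrix.items()}


print(best_match(similarity))  # {'U1': 'F2', 'U2': 'F1'}
```

The matched use-case then supplies the name and description of the feature implementation.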

Example
Mobile Tourist Guide (MTG) software variants
[Tables: use-cases per software variant; mined feature implementations per software variant]
[Figure: the use-case diagrams of the second and fourth MTG software variants]


Example
[Figure: the concept lattice family of the relational context family]

Example
Exploring and filtering the hybrid blocks of the concept lattice family (CLF) to identify feature documentation

Example: Hybrid Block_i
o Document (feature implementation), "View Map Implementation":
    public void viewmap() { int a = 0; while (a > 5) { if (a != 20) { } else { a = 30; } } }
  Document terms: viewmap, a
o Query (use-case), "View Map": "The tourist can view maps via mobile device..."


Example
[Table: software variants by features]

Using identifier names
Naming a feature implementation based on OBE names:
1. Extracting and tokenizing OBE names from the identified feature implementation
2. Weighting tokens
3. Constructing the feature name
o The proposed name = StreetShowView
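The three naming steps can be sketched as follows. The slide does not say how tokens are weighted or ordered, so this sketch assumes frequency as the weight, ties broken by first appearance, and the top three tokens concatenated in CamelCase. The OBE names are hypothetical and were chosen so that the result matches the name proposed on the slide.

```python
import re
from collections import Counter


def tokenize(name):
    """Split a camelCase or PascalCase identifier into lowercase tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)]


def propose_name(obe_names, top_k=3):
    """Build a feature name from the top_k most frequent tokens (assumed weighting)."""
    tokens = [t for name in obe_names for t in tokenize(name)]
    weights = Counter(tokens)
    # Earliest position of each token, used only to break ties deterministically.
    first_seen = {t: i for i, t in reversed(list(enumerate(tokens)))}
    ranked = sorted(weights, key=lambda t: (-weights[t], first_seen[t]))
    return "".join(t.capitalize() for t in ranked[:top_k])


# Hypothetical OBE names belonging to one mined feature implementation.
obe_names = ["showStreetView", "StreetView", "showStreet", "streetName"]
print(propose_name(obe_names))  # StreetShowView
```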

Reverse Engineering Feature Models from Software Configurations

Proposal
[Figure: reverse engineering FMs with REVPLINE. Inputs: the source code of the variants (variant 1, variant 2), from which feature implementations and documentation are mined, and the software configurations given as a product-by-feature matrix (products S-1 to S-3, features F-1 to F-3, two features marked per product). Output: the feature model, obtained by synthesis.]

Contribution
o An automatic approach to organize the mined and documented features into a FM
o Features are organized in a tree which highlights:
   o Mandatory features
   o Optional features
   o Feature groups (and, or, xor groups)
   o Cross-tree constraints: require and exclude constraints
o We rely on FCA and software configurations
o The FMs are generated in a very short time

Key ideas
o All features are split into the common feature set (mandatory features) and the optional feature set (optional features)
o Identify all kinds of constraints between the optional features: groups of features and cross-tree constraints

Process: FM reverse engineering
1. Analysis: product-by-feature matrix -> formal context -> the AOC-poset
From the common feature set:
2. Extracting the root feature
3. Extracting mandatory features
From the optional feature set:
4. Extracting atomic sets of features
5. Extracting inclusive-or
6. Extracting exclusive-or
7. Extracting require constraints
8. Extracting exclude constraints
Finally: synthesize the FM
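Some of these extraction steps can be approximated directly on a product-by-feature matrix. The sketch below derives mandatory features, optional features, and candidate require and exclude constraints with set operations. The talk derives them from the AOC-poset, so this direct computation is a simplifying assumption; the three configurations reuse feature names from the cell phone example but are invented for illustration.

```python
from itertools import combinations, permutations


def analyze(configs):
    """configs: dict mapping a product name to its set of selected features."""
    all_features = set().union(*configs.values())
    mandatory = set.intersection(*map(set, configs.values()))
    optional = all_features - mandatory
    # Products in which each optional feature is selected.
    having = {f: {p for p, fs in configs.items() if f in fs} for f in optional}
    # a requires b: every product with a also has b.
    requires = {(a, b) for a, b in permutations(sorted(optional), 2)
                if having[a] <= having[b]}
    # a excludes b: the two features never appear together.
    excludes = {(a, b) for a, b in combinations(sorted(optional), 2)
                if not (having[a] & having[b])}
    return mandatory, optional, requires, excludes


# Hypothetical product-by-feature matrix.
configs = {
    "S-1": {"Cell_Phone", "Display", "Wireless", "Bluetooth"},
    "S-2": {"Cell_Phone", "Display", "Wireless", "Infrared"},
    "S-3": {"Cell_Phone", "Display"},
}

mandatory, optional, requires, excludes = analyze(configs)
print("Mandatory:", sorted(mandatory))
print("Optional:", sorted(optional))
print("Requires:", sorted(requires))
print("Excludes:", sorted(excludes))
```

Constraints found this way only reflect the configurations supplied, so a different sample of products can yield a different set.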

Example
[Figure: the existing cell phone SPL FM. Root: Cell_Phone. Features: Wireless, Accu_Cell, Display, Games, Infrared, Bluetooth, Strong, Medium, Weak, Multi_Player, Single_Player, Artificial_Opponent. Cross-tree constraints: one excludes, two requires. Legend: mandatory, optional, or, alternative.]


Example
[Figure: the top concept; the root feature; the Base feature]

Example
[Figure: the AND feature]

Example
[Figure: the XOR feature; minimum concepts]

Example
[Figure: the OR feature]


Example
[Figure: the OR feature]

The mined FM
[Figure: the mined FM. Root: Cell_Phone, with nodes Base, AND, XOR and OR over the features Accu_Cell, Artificial_Opponent, Single_Player, Multi_Player, Display, Games, Strong, Medium, Weak, Wireless, Bluetooth, Infrared]

FM evaluation
«In our approach, feature selection constraints are not detected»

Experimentation and Threats to Validity

Experimentation
o ArgoUML-SPL = real SPL, 10 products, large systems, Java, well documented
o Health Complaint-SPL = real SPL, 10 products, medium systems, Java, well documented
o Mobile Media = real software variants, 4 products, small systems, Java, well documented
o Evaluation metrics: precision, recall and F-measure
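For reference, the three evaluation metrics can be computed from the set of elements a technique returns and the set of elements that are actually relevant. This is a minimal sketch using the standard definitions; the OBE names in the example are hypothetical.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, f_measure) for two sets of items."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical case: four OBEs mined for a feature, two of them truly implement it.
retrieved = {"Line", "Rectangle", "Oval", "Text"}
relevant = {"Line", "Rectangle"}

precision, recall, f_measure = evaluate(retrieved, relevant)
print(precision, recall, round(f_measure, 3))
```

Here every relevant element is found (recall 100%) but half of the returned elements are noise (precision 50%).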

ArgoUML SPL
[Figure: ArgoUML screenshot]

Health Complaint SPL
[Figure: Health Complaint screenshot]

Mobile Media software product variants
[Figure: Mobile Media screenshots]

Feature location
o Results show that precision appears high
o Results show that recall appears very high
o We cannot use a fixed number of topics for LSI
[Figure: bar chart, average of evaluation metrics (precision, recall, F-measure) for feature location, for ArgoUML-SPL, Health Complaint-SPL and Mobile Media]

Feature location
o The lexical and structural similarity approach gives better results than the lexical approach alone
[Figure: part of the sequence diagram feature implementation]

Feature documentation
o Results show that the recall value is 100% in all cases
o Results show that the precision value is either 100% or 50%
o Number of topics for LSI = number of feature implementations
[Figure: bar chart, average of evaluation metrics (precision, recall, F-measure) for feature documentation, for ArgoUML-SPL, Health Complaint-SPL and Mobile Media]

Feature documentation examples (feature name and description)
o Use-case diagram: "a use-case is a set of scenarios that describes an interaction between a user and a system. A use-case diagram displays the relationship among actors and use-cases. The two main components of a use-case diagram are use-cases and actors"
o View sorted photos: "the device sorts the photos based on the number of times photo has been viewed"
o Specify food complaint: "this use case allows a citizen to register a food complaint. The food complaint has the following information: food complaint data, description and observations"

FM reverse engineering
o The recall value is 100% for all case studies
o Results show that precision is not very high for any of the case studies
[Figure: bar chart, evaluation metrics (precision, recall, F-measure) for FM reverse engineering, for ArgoUML-SPL, Health Complaint-SPL and Mobile Media]


Reverse Engineering FMs from Samples of Program Configurations

Threats to validity
1. Lexical similarity
2. We consider junctions as feature implementations
3. Dynamic analysis techniques
4. There is a limitation in using FCA as a clustering technique
5. Each use-case represents a single feature
6. The mined FM defines more configurations than the initial FM