Extension, Abbreviation and Refinement -Identifying High-Level Dependence Structures Using Slice-Based Dependence Analysis

Similar documents
Identifying High-Level Dependence Structures Using Slice-Based Dependence Analysis. by Zheng Li

Evaluating Key Statements Analysis

Part I: Preliminaries 24

An Empirical Study of the Relationship Between. the Concepts Expressed in Source Code and. Dependence

By B.A.Khivsara Assistant Professor Department of Computer Enginnering SNJB s KBJ COE,Chandwad

Algorithm Analysis Graph algorithm. Chung-Ang University, Jaesung Lee

Algorithms, Games, and Networks February 21, Lecture 12

AmbitXT v2.1.0 Manual

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC

Lesson 1: Complementary and Supplementary Angles

Introduction to OSPF

PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance

A Performance Evaluation Architecture for Hierarchical PNNI and Performance Evaluation of Different Aggregation Algorithms in Large ATM Networks

= a hypertext system which is accessible via internet

ECE 428 Internet Protocols (Network Layer: Layer 3)

Internet Measurements. Motivation

Smart Search: A Firefox Add-On to Compute a Web Traffic Ranking. A Writing Project. Presented to. The Faculty of the Department of Computer Science

Shortest Path Approximation on Triangulated Meshes. By Annie Luan

An Empirical Study of Executable Concept Slice Size

Mathematical Methods and Computational Algorithms for Complex Networks. Benard Abola

Fairness Example: high priority for nearby stations Optimality Efficiency overhead

Lecture (08, 09) Routing in Switched Networks

CSE 461 Routing. Routing. Focus: Distance-vector and link-state Shortest path routing Key properties of schemes

Inheritance Metrics: What do they Measure?

An Approach for Mapping Features to Code Based on Static and Dynamic Analysis

ROTATION SCHEDULING ON SYNCHRONOUS DATA FLOW GRAPHS. A Thesis Presented to The Graduate Faculty of The University of Akron

On the use of Fuzzy Logic Controllers to Comply with Virtualized Application Demands in the Cloud

Solid Modeling. Thomas Funkhouser Princeton University C0S 426, Fall Represent solid interiors of objects

Where have all the cars gone? A model for determining traffic flow throughout a road network

LightSlice: Matrix Slice Sampling for the Many-Lights Problem

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Large Scale Information Visualization. Jing Yang Fall Tree and Graph Visualization (2)

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science

Message-passing algorithms (continued)

Computation of Multiple Node Disjoint Paths

Tag Switching. Background. Tag-Switching Architecture. Forwarding Component CHAPTER

Building an Internet-Scale Publish/Subscribe System

Introduction to OSPF OSPF. Link State Routing. Link State. Fast Convergence. Low Bandwidth Utilisation

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

The Relationship between Slices and Module Cohesion

Impact of Dependency Graph in Software Testing

IMPACT OF DEPENDENCY GRAPH IN SOFTWARE TESTING

Turbo King: Framework for Large- Scale Internet Delay Measurements

Value-Based Software Test Prioritization

Course Routing Classification Properties Routing Protocols 1/39

Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned

Real World Graph Efficiency

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Ex-Ray: Detection of History-Leaking Browser Extensions

Conditioned Slicing. David Jong-hoon An. May 23, Abstract

Classification of Log Files with Limited Labeled Data

Qualifying Exam in Programming Languages and Compilers

Border Gateway Protocol

Wei Gao, Qinghua Li, Bo Zhao and Guohong Cao The Pennsylvania State University From: MobiHoc Presentor: Mingyuan Yan Advisor: Dr.

A BGP-Based Mechanism for Lowest-Cost Routing

Introduction to OSPF

Buildings s Tab

Inverted Index for Fast Nearest Neighbour

Mine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2

Surface Simplification Using Quadric Error Metrics

Random projection for non-gaussian mixture models

Existing Model Metrics and Relations to Model Quality

Back to basics J. Addressing is the key! Application (HTTP, DNS, FTP) Application (HTTP, DNS, FTP) Transport. Transport (TCP/UDP) Internet (IPv4/IPv6)

ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART

DETERMINE COHESION AND COUPLING FOR CLASS DIAGRAM THROUGH SLICING TECHNIQUES

CS6200 Information Retreival. The WebGraph. July 13, 2015

Exercise Unit 2: Modeling Paradigms - RT-UML. UML: The Unified Modeling Language. Statecharts. RT-UML in AnyLogic

Request for Comments: T. Li August 1996

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1

Mail Assure. Quick Start Guide

Pregel. Ali Shah

IPv4/IPv6 Smooth Migration (IVI) Xing Li etc

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

Single link clustering: 11/7: Lecture 18. Clustering Heuristics 1

A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES

slicing An example of slicing Consider the following example:

Introduction to OSPF

Simple and Scalable Handoff Prioritization in Wireless Mobile Networks

A Scalable QoS Device for Broadband Access to Multimedia Services

Structured System Theory

Investigation of Metrics for Object-Oriented Design Logical Stability

Design and Implementation of an Anycast Efficient QoS Routing on OSPFv3

Geometric Modeling. Mesh Decimation. Mesh Decimation. Applications. Copyright 2010 Gotsman, Pauly Page 1. Oversampled 3D scan data

Interactive Refinement of Hierarchical Object Graphs

Introduction to digital image classification

TA: Jade Cheng ICS 241 Recitation Lecture Note #9 October 23, 2009

Center for Networked Computing

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions)

MapReduce Patterns. MCSN - N. Tonellotto - Distributed Enabling Platforms

Auto-calibration. Computer Vision II CSE 252B

NAT Internetworking Standards and Technologies, Computer Engineering, KMITL 2

SELECTION OF METRICS (CONT) Gaia Maselli

Compact Call Center Reporter

Link Structure Analysis

Link Analysis. CSE 454 Advanced Internet Systems University of Washington. 1/26/12 16:36 1 Copyright D.S.Weld

Performance Evaluation of Scheduling Mechanisms for Broadband Networks

Data Mining Algorithms In R/Clustering/K-Means

Introduction to Computer Science

1 Introduction. 1.1 Raster-to-vector conversion

Transcription:

Extension, Abbreviation and Refinement -Identifying High-Level Dependence Structures Using Slice-Based Dependence Analysis Zheng Li CREST, King s College London, UK

Overview Motivation Three combination techniques Extension Abbreviation Refinement

Many analysis techniques for program comprehension have been proposed Domain knowledge high-level Source code low-level Pattern recognition Concept assignment Data-flow analysis Dependence analysis

Advantages and Disadvantages High-level Low-level Accuracy Low High Scalability Yes No Human Knowledge Yes No

If combine the two? High-level techniques can provide a reasonable analysis scope with domain knowledge for low-level analysis techniques, then avoiding the scalability problem of lowlevel techniques. Low-level techniques can improve the accuracy of high-level techniques.

In this thesis Concept Assignment Program Slicing

Concept Assignment First defined in 1993 and aimed at comprehension tasks allocate specific high-level meaning to specific parts of a program Hypothesis-Based Concept Assignment (HB- CA) Existing implementation Uses domain and program semantics Good quality assignments

Program Slicing which other lines affect the selected line? we only care about this line

Concept Assignment Program Slicing Contiguous? Executable? High/low level?

Combination 1: Extension Concept Slice Using program slicing to extend a concept binding by tracing its dependencies Algorithm Using concepts as slicing criteria, the concept slice is the union of slices for each program point in the concept

Combination 2: Abbreviation Extract key statements within concept bindings Less is More! The statements that capture most impact with highest cohesion help to focus attention more rapidly on the core of a concept binding Algorithm Intersection of slices with respect to principal variables within a concept binding

r D=2*r; perimeter=pi*d; undersurface=pi*r*r; sidesurface=perimeter*h; area=2*undersurface+sidesurface; volume=undersurface*h; printf( \nthe Area is %d\n", area ); printf( \nthe Volume is %d\n", volume ); h

The Results so far The concept slice has no size explosion. The identified key statements have high Impact and Cohesion, but some concept bindings do not contain key statements.

Combination 3: Refinement A more accurate dependence based concept binding by removing non-concept-dependent statements

r D=2*r; perimeter=pi*d; undersurface=pi*r*r; sidesurface=perimeter*h; area=2*undersurface+sidesurface; volume=undersurface*h; printf( \nthe Area is %d\n", area); printf( \nthe Volume is %d\n", volume); h

Program Chopping Given source S and target T, what program points transmit effects from S to T? S T

Vertex Rank Model Google s Page Rank Model Dependence is transitive the weight of a vertex will be distributed following the outgoing edges and inherited through incoming edges.

Weight of Nodes sum of all node weights = 1 weight of node represents the importance of dependence of a vertex

Weights of Edges d=1/4 0.05 0.2 0.2 A d: distribution ratio d=1/4 d=1/4 d=1/4 0.05 0.05 0.05 0.05 0.15 B 0.4 Node weight is distributed to each outgoing edge Edge weights are collected at the destination node sum of all outgoing edge weights = origin node weight sum of all incoming edge weights = destination node weight

Definition of Weights ) ( ) ( ) ( 2 1 v n w v w v w ) ( ) ( ) ( 2 1 v n w v w v w t d d d d d d d d d nn n n n n 2 1 2 22 21 1 12 11 =. D t : transposed matrix of distribution ratios W: node weight vector

Propagating Weights 0.34 0.17 0.33 A B 0.17 0.33 0.33 C 0.33

Propagating Weights 0.33 0.175 0.17 A B 0.175 0.5 0.5 C 0.17

Propagating Weights 0.5 0.25 0.175 A B 0.25 0.345 0.345 C 0.175

Propagating Weights 0.4 0.2 0.2 A B 0.2 0.4 0.4 C 0.2 Stable weight assignment next-step weights are the same as previous ones

Pseudo Use Relation A B C Weight computation does not always converge Add a pseudo edge from a node to another, if there is no 'real' edge Distribution ratios: pseudo edges << real edges

Empirical Study Tools WeSCA and CodeSurfer 10 Subject programs Open source and industry code More than 600 concept bindings are extracted Dependence based metrics are defined Statistical analysis

Size reduction

Impact

Cohesion

Summary The combination of approaches can be fully automated and implemented. Concept refinement is better than concept extension and concept abbreviation.

Questions?