Improving Origin Analysis with Weighting Functions
Lin Yang, Anwar Haque and Xin Zhan
Supervisor: Michael Godfrey
University of Waterloo

1 Introduction

Software systems must undergo modifications to improve their readability, simplify their structure, or respond to changes in user requests and the software environment [5]. These activities involve renaming, moving, splitting and merging source code entities. Consequently, many entities that appear new in a later release are actually transformed from old entities. Several previous works [2, 3, 4] address this subject, which is termed Origin Analysis. Godfrey et al. [3, 4] proposed algorithms that find entity renames, splits and merges across releases by analyzing the similarity of call relations as well as various attributes of the program entities. In their approach, every caller or callee function is treated with equal importance. However, some functions carry more weight than others, as the following example illustrates.

Consider three functions A, B and C with 4, 2 and 2 callees respectively, as shown in Figure 1. A traditional call relation matcher treats each callee equally, so the similarity of A and B is calculated to be the same as that of A and C. With more information about the callee functions, a better decision is possible. For example, if F1 and F2 are standard library functions while F3 and F4 are defined in the same file as the caller, then A and C are more likely to be matching functions than A and B. Similarly, if F1 and F2 are called 100 times in the system while F3 and F4 are called only 5 times, then A and C are again more likely to be matching functions than A and B. To capture such differences, each caller and callee function should be assigned a suitable weight rather than being treated equally.

Figure 1: An example of call relations.
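As a rough illustration of the example above, the following Python sketch compares a uniform callee overlap with a frequency-weighted one. It assumes, from the layout of the figure, that A calls all four callees, B calls the first two and C calls the last two. The names F1 to F4, the call counts of 100 and 5, the overlap measure and the 1/count weight are illustrative choices, not the authors' implementation.

```python
# Toy call graph (assumed from the figure): A calls four functions, B and C two each.
callees = {
    "A": {"F1", "F2", "F3", "F4"},
    "B": {"F1", "F2"},
    "C": {"F3", "F4"},
}

# Assumed system-wide call counts for the second scenario in the text.
call_count = {"F1": 100, "F2": 100, "F3": 5, "F4": 5}


def overlap(x, y, weight):
    """Weighted share of the combined callee sets that both functions have in common."""
    shared = callees[x] & callees[y]
    combined = callees[x] | callees[y]
    return sum(weight(f) for f in shared) / sum(weight(f) for f in combined)


def uniform(f):
    # Every callee counts the same.
    return 1.0


def inverse_frequency(f):
    # Hypothetical weight: rarely called functions count for more.
    return 1.0 / call_count[f]


for name, w in [("uniform", uniform), ("inverse frequency", inverse_frequency)]:
    print(name, round(overlap("A", "B", w), 3), round(overlap("A", "C", w), 3))
```

Under the uniform weight both pairs score the same, while the inverse-frequency weight scores A and C far higher than A and B, which is the distinction the example argues for.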
In this paper, we have designed two weighting schemes: hierarchy based and frequency based. We have also proposed an automatic approach for doing origin analysis based on Machine Learning techniques. Unlike traditional work, our approach does not require human input in picking the weights and thresholds. The case study we conducted on Ctags demonstrates the effectiveness of our approach.
Our contributions are as follows:

- We have developed two weighting schemes to measure call relation similarity: frequency based and hierarchy based.
- We have designed and implemented a system that is flexible enough to serve as a platform for function based Origin Analysis.
- We have carried out a case study on Ctags. The experimental results show that our approach achieves higher accuracy than unweighted call relation analysis.
- We provide a mathematically proven methodology for comparing the usefulness of attributes in Origin Analysis.
- We establish an automatic function match identification framework based on decision tree learning.

The remainder of this report is organized as follows. Section 2 defines our weighting models. Section 3 presents our origin analysis system. Section 4 introduces the relevant AI techniques and the experimental methodology. Sections 5 and 6 describe the case study and its results. Section 7 briefly discusses the prediction platform based on our system. We conclude and discuss future work in Section 8.

2 Weighting Functions

This section explains the design of the weighting functions, which fall into two categories: frequency based and hierarchy based.

2.1 Frequency Based Weighting Functions

Intuitively, if a function has many callees, then being called by it carries a low weight. Conversely, if a function makes only one call, that call carries a high weight. To implement this, the algorithm adjusts the weight of a function as a caller or as a callee according to the size of its callee set or caller set, respectively. A number of monotonically decreasing functions with different rates of decrease are selected to represent the frequency based weighting category. As shown in Figure 2, a function called 10 times carries roughly 10% to 80% of the weight of a function called 1 time, depending on the weighting function used.

2.2 Hierarchy Based Weighting Functions

The other category of weighting functions is hierarchy based.
The idea is to assign weight according to the distance between the caller and the callee. For example, it is desirable that library calls carry less weight than calls to functions within the same file. The question is how much less weight a library call should receive compared with a same-file call. Instead of hand-picking the values, we give each type a code value as input and then use mathematical functions to calculate the weights. The code values are 1 for Same File, 2 for Same Directory, 3 for Same System and 4 for Library Call. As shown in Figure 3, under the inverse function a library call carries only 25% of the weight of a function call made within the same file.
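The two weighting categories can be sketched in Python as follows. Only the inverse function and a log based function are named in the text. The other entries, the exact form of the log based function, and all identifiers are assumptions made for illustration.

```python
import math

# Hierarchy codes as given in the text.
SAME_FILE, SAME_DIRECTORY, SAME_SYSTEM, LIBRARY_CALL = 1, 2, 3, 4

# Candidate monotonically decreasing functions (illustrative set only).
WEIGHTING_FUNCTIONS = {
    "uniform": lambda x: 1.0,
    "inverse": lambda x: 1.0 / x,
    "inverse_sqrt": lambda x: 1.0 / math.sqrt(x),
    "inverse_log": lambda x: 1.0 / (1.0 + math.log(x)),  # assumed form of a log based weight
}


def frequency_weight(set_size, fn="inverse"):
    """Weight of a call given the size (>= 1) of the relevant caller or callee set."""
    return WEIGHTING_FUNCTIONS[fn](set_size)


def hierarchy_weight(code, fn="inverse"):
    """Weight of a call given the hierarchy code of the caller/callee distance."""
    return WEIGHTING_FUNCTIONS[fn](code)


# Library call versus same-file call under the inverse function.
print(hierarchy_weight(LIBRARY_CALL) / hierarchy_weight(SAME_FILE))  # 0.25

# Relative weight at size 10 versus size 1 for each candidate function.
for name in WEIGHTING_FUNCTIONS:
    print(name, round(frequency_weight(10, name) / frequency_weight(1, name), 3))
```

Using the same family of functions for both categories keeps the comparison between weighting schemes down to a single swapped parameter, which is one plausible reading of how the hierarchy codes are turned into weights.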
Figure 2: The function plot of frequency based weighting functions

Figure 3: The function plot of hierarchy based weighting functions

3 Function Based Origin Analyzing System

Categorized by functionality, the system has three major components: the Fact Extractor, the Data Analyzer and the Attribute Learner. Figure 4 shows the data flow, illustrating how source code is turned into a final result that shows which weighting function or attribute performs better. The entire process, apart from human validation, is automatic.

3.1 Fact Extractor

Before any analysis can take place, facts must be extracted to prepare the data. This is mostly done with SWAG Kit and Beagle, two tools developed by the Software Architecture Group at the University of Waterloo. The goal is to obtain abstract information about the subject system and to identify the functions whose name and location did not change.

First, the source code files from two versions of the subject system are parsed using the SWAG Kit extractor. The extractor pipeline has 3 steps:

- cppx: Extract the facts. Produces *.ta from the original source files.
- prep: Prepare the facts. Produces *.o.ta from the extracted facts.
- linkplus: Link the facts. Produces out.ln.ta from *.o.ta.

Once the basic entity extraction is done, the next step is to prepare evolution facts from the SWAG Kit output using the evprep command of Beagle. The output file out.ev.ta contains facts about call relations, system structure and source info. The final step is to load the facts into the Beagle database.
Figure 4: Data Flow and System Overview. Rectangles represent processors and diamonds represent data types. Blue marks internal systems and green marks external systems. Orange is data in human-readable format, while red is data objects. The purple rectangle is the set of weighting function implementations, which extend OAFunction.

3.2 Data Analyzer

The task of the Data Analyzer is to calculate the attributes of each function pair, using the abstract facts prepared by the Fact Extractor and the algorithm specified by the user.

First, the data generated by SWAG Kit and Beagle is fed to the Input Reader module to build the data structures. OASystem holds the information about the version of the system it represents, such as how many functions it contains and which file and subsystem each function belongs to. OAMapping is the data structure that links already matched functions between the two versions. It is implemented as a hash table so that each query takes O(1) time. OAFunction is an abstract class that forms the foundation of the function entity it represents. The key method, which calculates the weight of the function, is implemented by each concrete class that extends it.

After the abstract systems are built, the algorithm and weighting function specified by the user are run to calculate the desired attributes. This process takes each function pair, examines its information and fills in the attributes. Sample attributes include Overall Similarity, Caller Set Similarity and Callee Set Similarity.
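The three data structures described above might look roughly like the following Python sketch. Only the class names OASystem, OAMapping and OAFunction come from the text. Every field, method name and concrete subclass here is hypothetical, and the language of the original implementation is not stated.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class OAFunction(ABC):
    """A function entity; subclasses decide how much weight one of its calls carries."""

    def __init__(self, name, file, callers=(), callees=()):
        self.name = name
        self.file = file
        self.callers = set(callers)
        self.callees = set(callees)

    @abstractmethod
    def weight_as_callee(self):
        """Weight of this function when it appears as a callee."""

    @abstractmethod
    def weight_as_caller(self):
        """Weight of this function when it appears as a caller."""


class UniformFunction(OAFunction):
    def weight_as_callee(self):
        return 1.0

    def weight_as_caller(self):
        return 1.0


class InverseFrequencyFunction(OAFunction):
    # As a callee the weight depends on the caller set size, and vice versa.
    def weight_as_callee(self):
        return 1.0 / max(1, len(self.callers))

    def weight_as_caller(self):
        return 1.0 / max(1, len(self.callees))


@dataclass
class OASystem:
    version: str
    functions: dict = field(default_factory=dict)  # name -> OAFunction

    def add(self, fn):
        self.functions[fn.name] = fn


class OAMapping:
    """Already matched functions between two versions; dict lookups are O(1) on average."""

    def __init__(self):
        self._old_to_new = {}

    def link(self, old_name, new_name):
        self._old_to_new[old_name] = new_name

    def match_of(self, old_name):
        return self._old_to_new.get(old_name)


old = OASystem("old")
old.add(InverseFrequencyFunction("parse", "parse.c", callers={"main"}, callees={"read", "emit"}))
mapping = OAMapping()
mapping.link("parse", "parseFile")
print(old.functions["parse"].weight_as_caller())  # 0.5
```

Keeping the weight calculation behind an abstract method means a new weighting scheme only needs a new subclass, which matches the extension point the text describes.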
3.3 Similarity Calculation

For the similarity value to carry the same meaning regardless of the size of the subject system, a relative similarity value is preferable to an absolute one. Using a similarity value between 0 and 1 avoids case-dependent threshold tuning.

Overall Similarity between two functions is calculated as:

$$
\mathrm{OverallSim}(f_1, f_2) = \frac{\mathrm{MatchingWeightCaller}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2) + \mathrm{MatchingWeightCallee}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}{\mathrm{TotalWeight}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2) + \mathrm{TotalWeight}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}
$$

Caller Set Similarity is:

$$
\mathrm{CallerSim}(f_1, f_2) = \frac{\mathrm{MatchingWeightCaller}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2)}{\mathrm{TotalWeight}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2)}
$$

Callee Set Similarity is:

$$
\mathrm{CalleeSim}(f_1, f_2) = \frac{\mathrm{MatchingWeightCallee}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}{\mathrm{TotalWeight}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}
$$

where

$$
\mathrm{MatchingWeightCaller}(\mathit{set}_1, \mathit{set}_2, f_1, f_2) = \sum_{i \in MS_1} \mathrm{WeightAsCallee}(i, f_1) + \sum_{j \in MS_2} \mathrm{WeightAsCallee}(j, f_2)
$$

$$
\mathrm{MatchingWeightCallee}(\mathit{set}_1, \mathit{set}_2, f_1, f_2) = \sum_{i \in MS_1} \mathrm{WeightAsCaller}(i, f_1) + \sum_{j \in MS_2} \mathrm{WeightAsCaller}(j, f_2)
$$

3.4 System Output

The output of the function analysis system is a text file containing the attributes requested by the user. In Figure 5, the first three columns are the similarity values calculated using the Log based weighting function.

3.5 Attribute Learner

After the attribute table is filled, the next step is to determine which of these attributes is the better indicator of a function match. The mechanism is explained in detail in the next section, Experimental Methodology.
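The similarity formulas of Section 3.3 can be sketched in Python as below. The text does not define MS1, MS2 or TotalWeight, so this sketch assumes that MS1 and MS2 are the members of each set that have a counterpart in the other set (through the known mapping, or by identical name), and that TotalWeight sums the same weights over the full sets. All function and variable names are hypothetical.

```python
def matched_subsets(set1, set2, mapping):
    """Members of each set with a counterpart in the other set (assumed meaning of MS1, MS2)."""
    ms1 = {i for i in set1 if mapping.get(i, i) in set2}
    ms2 = {mapping.get(i, i) for i in ms1}
    return ms1, ms2


def weights(set1, set2, f1, f2, mapping, weight):
    """Return (matching weight, total weight) for one pair of caller or callee sets."""
    ms1, ms2 = matched_subsets(set1, set2, mapping)
    matching = sum(weight(i, f1) for i in ms1) + sum(weight(j, f2) for j in ms2)
    total = sum(weight(i, f1) for i in set1) + sum(weight(j, f2) for j in set2)
    return matching, total


def similarities(f1, f2, mapping, weight_as_callee, weight_as_caller):
    """f1 and f2 are dicts holding 'callers' and 'callees' sets of function names."""
    # Caller sets are summed with WeightAsCallee, callee sets with WeightAsCaller.
    mc, tc = weights(f1["callers"], f2["callers"], f1, f2, mapping, weight_as_callee)
    me, te = weights(f1["callees"], f2["callees"], f1, f2, mapping, weight_as_caller)

    def ratio(m, t):
        return m / t if t else 0.0

    return {"overall": ratio(mc + me, tc + te), "caller": ratio(mc, tc), "callee": ratio(me, te)}


# Toy data: two library callees (code 4) and two same-file callees (code 1).
code = {"F1": 4, "F2": 4, "F3": 1, "F4": 1}


def hierarchy(i, f):
    return 1.0 / code[i]


def uniform(i, f):
    return 1.0


A = {"callers": set(), "callees": {"F1", "F2", "F3", "F4"}}
B = {"callers": set(), "callees": {"F1", "F2"}}
C = {"callers": set(), "callees": {"F3", "F4"}}

for name, w in [("uniform", uniform), ("hierarchy", hierarchy)]:
    ab = similarities(A, B, {}, w, w)["callee"]
    ac = similarities(A, C, {}, w, w)["callee"]
    print(name, round(ab, 3), round(ac, 3))
```

With uniform weights both candidate pairs get the same callee similarity, while the inverse hierarchy weight separates them. An empty set pair is given similarity 0 here to avoid dividing by zero, which is a choice of this sketch.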
Figure 5: Sample output of training data. The first three columns (Log_Total, Log_Caller, Log_Callee) are from Function Relation Analysis using the Log based weighting function. The last column (Result) is from human validation.

4 Experimental Methodology

This section explains the experimental methodology in detail. We decided against using the precision and recall of a hand-picked threshold to measure performance, for the following reasons:

- It is not automatic. If the samples change, because of an error in human validation or because more samples are added to the experiment, the whole process of trying various thresholds and calculating precision and recall has to be redone manually.
- It is not objective. We could try many thresholds for our weighted function analysis and pick the best of them, while choosing a suboptimal threshold for the uniform function analysis. The apparent performance gain could then be substantial, but it would not reflect the truth and would make the result much less credible.
- It is impossible to calculate the real recall value. While precision can be calculated exactly by going through the identified functions one by one, there is practically no way to calculate the real recall value. That would require finding all matching pairs in the system, and there could easily be hundreds of thousands of pairs. Some work [1] has used techniques to obtain a pseudo recall value, but there is no guarantee on how close the pseudo recall is to the actual recall, which makes the result unreliable.

Based on these arguments, we decided to use a fully objective and automatic machine learning approach that gives a quantitative measure of the performance of the various weighting functions.
The technique is based on information theory, which is the basis of Decision Tree Learning in the machine learning context. Decision Tree Learning is one of the simplest and yet most successful forms of learning algorithm. A decision tree takes as input an object or situation described by a set of attributes and returns the predicted output value for the input [6]. The correctness of a decision tree depends on the choice of the attribute tests.

The goal of our verification process is to determine whether one attribute is a better indicator of function matching than the others. This amounts to finding a formal measure of the usefulness of attributes, which is the same goal as in Decision Tree Learning. The measure should take its maximum value when the attribute is perfect, meaning that it separates all matching pairs from non-matching pairs, and its minimum value when the attribute is of no use at all. One suitable measure is the expected amount of information provided by the attribute, where the term is used in the mathematical sense first defined by Shannon and Weaver [6]. Following [6], the information content is defined as

$$
I(P(v_1), \ldots, P(v_k)) = \sum_{i=1}^{k} -P(v_i) \log_2 P(v_i)
$$

In practice, given a training set that contains p positive examples and n negative examples, the estimate of the information contained is

$$
I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}
$$

The information gain from an attribute test is the difference between the original information requirement and the new requirement:

$$
\mathrm{Gain}(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{Remainder}(A)
$$

where

$$
\mathrm{Remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p+n} \, I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)
$$

and p_i and n_i are the numbers of positive and negative examples in the i-th of the v subsets produced by testing attribute A.

The attribute with the highest information gain is the best classifier and, in our case, the best indicator of function matching.

5 Case Study

We performed a case study on Ctags using our weighted function relation analysis system. We chose Ctags because it is well known, widely used and reasonably sophisticated. It is also written in the C programming language, so we could use SWAG Kit to extract, abstract and explore the software architecture. For our case study, we selected 2 releases of Exuberant Ctags. Those are Ctags 4.0. released in June 2000 and Ctags released in May. For validation purposes, we need to know which functions are indeed matching functions.
However, we cannot go through each function pair one by one: there are 26 unmatched functions in version 4 and 578 unmatched functions in version 5, and the number of candidate pairs is the product of the two counts. Therefore, this process was done by first using Beagle to provide match candidates and then manually going through the source code of those functions to decide whether they really match. We used the name matcher and the call relation matcher of Beagle in this case.
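To give a concrete picture of a name matcher of this kind, the following Python sketch scores two names by their longest common substring, normalized by the average of the two name lengths. It is a sketch of the idea only, not Beagle's implementation. The function names, the example identifiers and the threshold parameter are hypothetical.

```python
def longest_common_substring(a, b):
    """Length of the longest run of characters that appears contiguously in both strings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def name_similarity(a, b):
    """Longest common substring length divided by the average length of the two names."""
    if not a or not b:
        return 0.0
    return longest_common_substring(a, b) / ((len(a) + len(b)) / 2)


def candidates(target, names, threshold):
    """Names whose similarity to the target reaches the threshold, best first."""
    scored = [(name_similarity(target, n), n) for n in names]
    return [n for s, n in sorted(scored, reverse=True) if s >= threshold]


print(longest_common_substring("createTags", "makeTags"))  # 5, from "eTags"
print(candidates("createTags", ["makeTags", "openFile"], threshold=0.3))
```

A low threshold keeps more candidates for manual validation at the cost of more false positives, which is the trade-off the case study accepts.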
The name matcher calculates the longest common substring (LCS) of the name of the target entity against the name of each member of the candidate set, and normalizes the value against the average length of the two entity names. We set the threshold to 0. so that we get a larger set of candidates and do not miss real matches. Out of the 39 match candidates suggested by Beagle, 5 were identified as true matches.

The call relation matcher returns a normalized value indicating how closely the caller/callee sets of two entities match, using uniform weight function relation analysis. Out of the 7 match candidates suggested by Beagle using threshold 0., 43 were identified as true matches.

Once the matches are identified, this information, along with the attributes calculated by the function analysis system, is fed into the machine learning program. The results are shown in the next section.

6 Result

Using the experimental methodology, the Information Gain associated with each weighting function is listed in Figure 6 and Figure 7.

Figure 6: Frequency based weighting function versus uniform weighting function. (Columns: Total Similarity, Caller Similarity, Callee Similarity, Max IG.)
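Information gain values of the kind reported in these figures can be computed along the lines of Section 4. The Python sketch below evaluates one similarity attribute against the human-validated labels. It assumes that the attribute is tested with a threshold and that the reported maximum is taken over candidate thresholds, neither of which is stated in the text. All names are hypothetical.

```python
import math


def entropy(p, n):
    """Information content, in bits, of a set with p positive and n negative examples."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count:
            q = count / total
            bits -= q * math.log2(q)
    return bits


def information_gain(values, labels, threshold):
    """Gain from testing `value >= threshold` on one similarity attribute."""
    p = sum(labels)
    n = len(labels) - p
    remainder = 0.0
    for side in (True, False):
        subset = [lab for v, lab in zip(values, labels) if (v >= threshold) == side]
        if subset:
            pi = sum(subset)
            ni = len(subset) - pi
            # Each subset contributes its entropy, weighted by its share of the examples.
            remainder += len(subset) / len(labels) * entropy(pi, ni)
    return entropy(p, n) - remainder


def max_information_gain(values, labels):
    """Best gain over thresholds placed at each observed value (non-empty input assumed)."""
    return max(information_gain(values, labels, t) for t in set(values))


# Toy data: a perfect attribute and a useless one, with two true matches.
labels = [True, True, False, False]
print(max_information_gain([0.9, 0.8, 0.2, 0.1], labels))  # 1.0
print(max_information_gain([0.5, 0.5, 0.5, 0.5], labels))  # 0.0
```

Because the gain is computed from the labels alone, rerunning the comparison after correcting a validation error or adding samples requires no manual threshold tuning.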
Figure 7: Hierarchy based weighting function versus uniform weighting function. (Columns: Total Similarity, Caller Similarity, Callee Similarity, Max IG.)

The results show that better performance can be achieved by using a proper weighting function in either the frequency based or the hierarchy based category. The similarity values calculated by those weighting functions provide a larger information gain.

We notice that the increase in information gain from using weighting functions over the uniform one is only around 5% in this case. After investigation, we are convinced that this is because the functions in Ctags are homogeneous. In Ctags, the number of incoming and outgoing calls of a function does not vary greatly, so frequency based weighting algorithms have little room for improvement. Also, all the functions in Ctags are in the same directory. This limits the difference any hierarchy based weighting function can make and in turn restricts the achievable improvement. If the functions in the subject system are heterogeneous in terms of call frequency and hierarchical call distance, the gain from using weighting functions could be considerably larger.

7 Function Match Identification System

The ultimate goal of Origin Analysis is to tell whether two functions or entities are similar enough to be considered the same. Traditional approaches use an attribute and a corresponding threshold as the test. There is much research on finding attributes and thresholds that work well in practice. Common approaches in this domain include name matching, parameter name matching, parameter type matching, function call relation matching, entity source code string matching, UML relationship matching [1] and many other methods.
Research in this domain has typically focused on showing that a particular method and its associated attribute is a better measurement than existing ones and that the method can provide sufficient evidence on its own. Here, we propose an approach that can take advantage of all existing methods and is mathematically proven to be no worse than any of them.

As discussed in Section 4, information theory can be used to find the best attribute in a set of attributes. An approach that keeps testing on the best available attribute until no more attributes are available or required can fully utilize the information contained in those attributes. The final result is mathematically more precise than that of any single attribute used alone, provided none of the attributes is overly dominant, which is the case in Origin Analysis. The Decision Tree Learning algorithm is sufficient for this purpose. Figure 8 shows the procedure of using Decision Tree Learning in function match identification.
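The idea of repeatedly testing on the best available attribute can be sketched as a small greedy decision tree learner in Python. This is a generic illustration with threshold tests on numeric attributes, not the system described here. The attribute names name_sim and call_sim and the toy values are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Information content, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels, attributes):
    """Return (gain, attribute, threshold) for the test with the highest information gain."""
    best = (0.0, None, None)
    base = entropy(labels)
    for a in attributes:
        for t in {r[a] for r in rows}:
            hi = [lab for r, lab in zip(rows, labels) if r[a] >= t]
            lo = [lab for r, lab in zip(rows, labels) if r[a] < t]
            if not hi or not lo:
                continue  # a test that separates nothing carries no information
            remainder = (len(hi) * entropy(hi) + len(lo) * entropy(lo)) / len(labels)
            if base - remainder > best[0]:
                best = (base - remainder, a, t)
    return best


def learn(rows, labels, attributes):
    """Greedy tree: a leaf is a label, an inner node is (attribute, threshold, hi, lo)."""
    majority = Counter(labels).most_common(1)[0][0]
    _, a, t = best_split(rows, labels, attributes)
    if a is None:
        return majority  # no test improves on the current set
    hi = [(r, lab) for r, lab in zip(rows, labels) if r[a] >= t]
    lo = [(r, lab) for r, lab in zip(rows, labels) if r[a] < t]
    return (a, t,
            learn([r for r, _ in hi], [lab for _, lab in hi], attributes),
            learn([r for r, _ in lo], [lab for _, lab in lo], attributes))


def predict(tree, row):
    """Follow the threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        a, t, hi, lo = tree
        tree = hi if row[a] >= t else lo
    return tree


# Toy pairs: neither attribute separates matches from non-matches on its own.
rows = [
    {"name_sim": 0.9, "call_sim": 0.2},
    {"name_sim": 0.3, "call_sim": 0.8},
    {"name_sim": 0.2, "call_sim": 0.1},
    {"name_sim": 0.4, "call_sim": 0.3},
]
labels = [True, True, False, False]
tree = learn(rows, labels, ["name_sim", "call_sim"])
print([predict(tree, r) for r in rows])
```

In this toy data no single threshold on one attribute classifies all four pairs, but the tree that combines both attributes does, which illustrates why combining attributes can beat any one of them alone.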
We implemented a system that is capable of decision tree learning and of predicting results. However, we deliberately chose not to test the precision of a decision tree generated from function call relation information alone. As the small values in the information gain table indicate, the function call relation matcher cannot precisely predict matches between functions, simply because it does not have enough information. On the other hand, we expect that if other attributes are also used in decision tree learning, the combined information will be sufficient for precise prediction.

Figure 8: The procedures involved in using the Origin Analysis System to identify entity matches and to evaluate prediction precision.

8 Conclusion and Future Work

In this project we developed two sets of weighting functions and designed and implemented an Origin Analysis system that uses them. Our hypothesis was that a well-designed weighted model in function relation analysis outperforms an unweighted one because it makes use of more information. We performed a case study on two versions of Ctags and showed that our results are better than the unweighted ones. Our system can give a quantitative comparison between various weighting functions. It also works with any variation of weighting functions or extra attributes. Another important advantage of our system is that it is completely automated and does not need parameter picking or human input.
Future work can proceed in three directions. One is to carry out more case studies. Another is to apply the experimental methodology to evaluate the performance of existing or new approaches in Origin Analysis. The third is to incorporate existing or new approaches in Origin Analysis into the prediction framework and produce a system that can reliably predict entity matching using all the information available.

More Case Studies

Increasing the sample size will improve the accuracy of the result. The Chernoff bound in ERM (Empirical Risk Minimization) gives the sample size needed to achieve < 0.05 error with > 95% confidence, which can be compared with the size of our current training sample.

The second reason to do more case studies is to increase the performance gain from weighted function relation analysis. As discussed in Section 6, Ctags offers little room for improvement because it lacks subsystems and most of its functions have similar call frequencies. A larger and more sophisticated system would give the weighted approach a fairer evaluation.

Apply Experiment Methodology

Using the methodology discussed in Section 4, future work can evaluate other algorithms and features used in Origin Analysis and compare their performance. Examples include finding an objective quantitative measure of the value of UML relationships, determining whether LCS is a better indicator than the number of character pairs in name matching, and determining whether parameter name matching is better than parameter type matching.

Prediction System

The third direction is to incorporate existing features used in Origin Analysis into the framework discussed in Section 7.
Mathematically speaking, the result is bound to be better, but it will be interesting to see how good the prediction can be in practice and to measure its performance.

References

[1] Z. Xing and E. Stroulia. UMLDiff: An Algorithm for Object-Oriented Design Differencing. In Proceedings of the 20th IEEE International Conference on Automated Software Engineering (ASE 05), pages 54-65, 2005.
[2] S. Kim, K. Pan, and E. J. Whitehead Jr. When functions change their names: Automatic detection of origin relationships. In Proceedings of the 12th Working Conference on Reverse Engineering (WCRE 2005), pages 143-152, Pittsburgh, Pennsylvania, USA. IEEE Computer Society.
[3] Q. Tu and M. W. Godfrey. An integrated approach for studying architectural evolution. In Proceedings of the 10th International Workshop on Program Comprehension.
[4] M. W. Godfrey and L. Zou. Using origin analysis to detect merging and splitting of source code entities. IEEE Transactions on Software Engineering, 31(2):166-181, 2005.
[5] D. L. Parnas. Software Aging. In Proceedings of the 16th International Conference on Software Engineering, Sorrento, Italy, May 1994.
[6] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, pages 653-659.
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationEECS 349 Machine Learning Homework 3
WHAT TO HAND IN You are to submit the following things for this homework: 1. A SINGLE PDF document containing answers to the homework questions. 2. The WELL COMMENTED MATLAB source code for all software
More informationChapter 3. Set Theory. 3.1 What is a Set?
Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any
More informationRank Measures for Ordering
Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many
More informationFrom Whence It Came: Detecting Source Code Clones by Analyzing Assembler
From Whence It Came: Detecting Source Code Clones by Analyzing Assembler Ian J. Davis and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada
More informationCost-sensitive Boosting for Concept Drift
Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems
More informationReducing Directed Max Flow to Undirected Max Flow and Bipartite Matching
Reducing Directed Max Flow to Undirected Max Flow and Bipartite Matching Henry Lin Division of Computer Science University of California, Berkeley Berkeley, CA 94720 Email: henrylin@eecs.berkeley.edu Abstract
More informationData Science with R Decision Trees with Rattle
Data Science with R Decision Trees with Rattle Graham.Williams@togaware.com 9th June 2014 Visit http://onepager.togaware.com/ for more OnePageR s. In this module we use the weather dataset to explore the
More informationMotion Detection Algorithm
Volume 1, No. 12, February 2013 ISSN 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/ Motion Detection
More informationSection 4 General Factorial Tutorials
Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One
More informationHardware versus software
Logic 1 Hardware versus software 2 In hardware such as chip design or architecture, designs are usually proven to be correct using proof tools In software, a program is very rarely proved correct Why?
More informationKnowledge Engineering in Search Engines
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2012 Knowledge Engineering in Search Engines Yun-Chieh Lin Follow this and additional works at: