Improving Origin Analysis with Weighting Functions

Lin Yang, Anwar Haque and Xin Zhan
Supervisor: Michael Godfrey
University of Waterloo

1 Introduction

Software systems must undergo modifications to improve their readability, to simplify their structure, or to respond to changes in user requests and the software environment [5]. These activities involve renaming, moving, splitting and merging source code entities. Consequently, many entities that appear new in a later release are actually transformed from old entities. Several previous works [2, 3, 4] address this subject, which is termed Origin Analysis. Godfrey et al. [3, 4] proposed algorithms to find entity renames, splits and merges across releases by analyzing the similarity of call relations as well as various attributes of the program entities. In their approach, each caller or callee function is treated with equal importance. However, some functions carry more weight than others, as the following example illustrates.

Consider three functions A, B and C with 4, 2 and 2 callees respectively, as shown in Figure 1. A traditional call relation matcher treats each callee equally, so the similarity of A and B is calculated to be the same as that of A and C. However, with more information about the callee functions available, a better decision is possible. For example, if F1 and F2 are standard library functions while F3 and F4 are functions defined in the same file as the caller, then A and C are more likely to be matching functions than A and B. Similarly, if F1 and F2 are called 100 times in the system while F3 and F4 are called only 5 times, then A and C are again more likely to be matching functions than A and B. To capture such differences, each caller and callee function should be assigned a suitable weight rather than treated equally.

Figure 1: An example of call relations. A calls F1, F2, F3 and F4; B calls F1 and F2; C calls F3 and F4.
In this paper, we have designed two weighting schemes: hierarchy based and frequency based. We have also proposed an automatic approach for doing origin analysis based on Machine Learning techniques. Unlike traditional work, our approach does not require human input in picking the weights and thresholds. The case study we conducted on Ctags demonstrates the effectiveness of our approach.

Our contributions are as follows:

- We have developed two weighting schemes to measure call relation similarity: frequency based and hierarchy based.
- We have designed and implemented a system that is flexible enough to serve as a platform for function based Origin Analysis.
- We have carried out a case study on Ctags. The experimental results show that our approach achieves higher accuracy than unweighted call relation analysis.
- We provide a mathematically proven methodology for comparing the usefulness of attributes in Origin Analysis.
- We establish an automatic function match identification framework based on decision tree learning.

The remainder of this report is organized as follows. Section 2 defines our weighting models. Section 3 presents our origin analysis system. Section 4 introduces the relevant AI techniques and the experimental methodology. Sections 5 and 6 describe the case study and its results. The prediction platform based on our system is briefly discussed in Section 7. We conclude and discuss future work in Section 8.

2 Weighting Functions

This section explains the design of the weighting functions. They fall into two categories: frequency based and hierarchy based.

2.1 Frequency Based Weighting Functions

Intuitively, if a function has many callees, then being called by this function carries a low weight. On the other hand, if the function makes only one call, then that one call carries a high weight. To implement this, the algorithm adjusts the weight of a function as a caller or as a callee according to the size of its callee set or caller set, respectively. A number of monotonically decreasing functions with different rates of decay are selected to represent the frequency based weighting category. As shown in Figure 2, a function called 10 times carries roughly 10% to 80% of the weight of a function called once, depending on the weighting function used.

2.2 Hierarchy Based Weighting Functions

The other category of weighting functions is hierarchy based.
The idea is to assign weight according to the distance between the caller and the callee. For example, it is desirable that library calls carry less weight than function calls within the same file. The question is how much less weight a library call should carry than a same file function call. Instead of hand picking the values, we give each call type a code value to be used as input and then use mathematical functions to calculate the weights. The code value of Same File is 1, Same Directory is 2, Same System is 3 and Library Call is 4. As shown in Figure 3, using the inverse function, a library call carries only 25% of the weight of a function call made within the same file.
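The two weighting categories can be illustrated with a minimal Python sketch. Only the inverse function and a "Log based" function are named in this report; the exact formulas below (`inverse_sqrt`, `inverse_log`) and all identifiers are assumptions made for illustration, not the functions plotted in Figures 2 and 3.

```python
import math

# Code values for call distance, as given in Section 2.2.
SAME_FILE, SAME_DIRECTORY, SAME_SYSTEM, LIBRARY_CALL = 1, 2, 3, 4


def uniform(x):
    """Unweighted baseline: every call counts the same."""
    return 1.0


def inverse(x):
    """Fast decay: weight 1/x."""
    return 1.0 / x


def inverse_sqrt(x):
    """Slower decay (assumed form): weight 1/sqrt(x)."""
    return 1.0 / math.sqrt(x)


def inverse_log(x):
    """Logarithmic decay (assumed form): weight 1/(1 + ln x)."""
    return 1.0 / (1.0 + math.log(x))


def frequency_weight(call_count, fn=inverse_log):
    """Weight of a call relation given how many calls the function takes part in."""
    if call_count < 1:
        raise ValueError("call_count must be at least 1")
    return fn(call_count)


def hierarchy_weight(distance_code, fn=inverse):
    """Weight of a call given the caller/callee distance code (1 to 4)."""
    if distance_code not in (SAME_FILE, SAME_DIRECTORY, SAME_SYSTEM, LIBRARY_CALL):
        raise ValueError("unknown distance code")
    return fn(distance_code)


if __name__ == "__main__":
    # With the inverse function, a library call weighs 25% of a same file call.
    print(hierarchy_weight(LIBRARY_CALL) / hierarchy_weight(SAME_FILE))
    for fn in (inverse, inverse_log, inverse_sqrt):
        print(fn.__name__, round(frequency_weight(10, fn), 3))
```

Every function returns 1.0 for an input of 1, so the weights can be read directly as a fraction of the weight of a single same file call or of a function called once.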

Figure 2: The function plot of frequency based weighting functions.

Figure 3: The function plot of hierarchy based weighting functions.

3 Function Based Origin Analyzing System

Categorized by functionality, the system has three major components: Fact Extractor, Data Analyzer and Attribute Learner. Figure 4 shows the data flow, illustrating how source code is turned into a final result that shows which weighting function or attribute is better. The entire process, other than the human validation part, is done automatically.

3.1 Fact Extractor

Before any analysis can take place, a substantial amount of fact extraction must be done to prepare the data. This part is mostly done using SWAG Kit and Beagle, two tools developed by the Software Architecture Group at the University of Waterloo. The goal is to obtain abstract information about the subject system, together with the functions whose name and location were not changed. First, the source code files from two versions of the subject system are parsed using the SWAG Kit extractor. There are 3 steps in the extractor pipeline:

- cppx: Extract the facts. Produces *.ta from the original source files.
- prep: Prepare the facts. Produces *.o.ta from the extracted facts.
- linkplus: Link the facts. Produces out.ln.ta from *.o.ta.

Once the basic entity extraction is done, the next step is to prepare evolution facts from the SWAG Kit output using the evprep command of Beagle. The output file out.ev.ta contains facts about call relations, system structure and source information. The final step is to load the facts into the Beagle database.

Figure 4: Data Flow and System Overview. Rectangles represent processors while diamonds represent data types. Blue marks internal systems while green marks external systems. Orange is data in human readable format while red is data objects. The purple rectangle is the set of weighting function implementations, which extend OAFunction.

3.2 Data Analyzer

The task of the Data Analyzer is to calculate the attributes of each function pair, using the abstract facts prepared by the Fact Extractor and the algorithm specified by the user. First, the data generated by SWAG Kit and Beagle is fed to the Input Reader module to build the data structures:

- OASystem holds the information about the version of the system it represents, such as how many functions it contains and which file and subsystem each function belongs to.
- OAMapping is the data structure that links already matched functions between the two versions. It is implemented as a hashtable so that each query takes O(1) computation time.
- OAFunction is an abstract class that forms the foundation of the function entity it represents. The key method, which calculates the weight of the function, is implemented by each individual class that extends it.

After the abstract systems are built, the algorithm and weighting function specified by the user are run to calculate the desired attributes. This process essentially takes each function pair, looks at their information and fills in the attributes. Sample attributes include Overall Similarity, Caller Set Similarity and Callee Set Similarity.

3.3 Similarity Calculation

In order for the similarity value to carry the same meaning regardless of the size of the subject system, a relative similarity value rather than an absolute one is desirable. By using a similarity value between 0 and 1, case dependent threshold tuning is avoided.

Overall Similarity between two functions f_1 and f_2 is calculated as:

\[
\mathrm{OverallSim}(f_1, f_2) = \frac{\mathrm{MatchingWeightCaller}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2) + \mathrm{MatchingWeightCallee}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}{\mathrm{TotalWeight}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2) + \mathrm{TotalWeight}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}
\]

Caller Set Similarity is:

\[
\mathrm{CallerSim}(f_1, f_2) = \frac{\mathrm{MatchingWeightCaller}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2)}{\mathrm{TotalWeight}(\mathrm{Caller}(f_1), \mathrm{Caller}(f_2), f_1, f_2)}
\]

Callee Set Similarity is:

\[
\mathrm{CalleeSim}(f_1, f_2) = \frac{\mathrm{MatchingWeightCallee}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}{\mathrm{TotalWeight}(\mathrm{Callee}(f_1), \mathrm{Callee}(f_2), f_1, f_2)}
\]

where

\[
\mathrm{MatchingWeightCaller}(set_1, set_2, f_1, f_2) = \sum_{i \in MS_1} \mathrm{WeightAsCallee}(i, f_1) + \sum_{j \in MS_2} \mathrm{WeightAsCallee}(j, f_2)
\]

\[
\mathrm{MatchingWeightCallee}(set_1, set_2, f_1, f_2) = \sum_{i \in MS_1} \mathrm{WeightAsCaller}(i, f_1) + \sum_{j \in MS_2} \mathrm{WeightAsCaller}(j, f_2)
\]

3.4 System Output

The output of the function analysis system is a text file containing the user requested attributes. In Figure 5, the first three columns are the similarity values calculated using the Log based weighting function.

3.5 Attribute Learner

After the attributes table is filled, the next step is to determine which of these attributes is the better indicator of a function match. How this is done is explained in detail in the next section, Experimental Methodology.
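The similarity ratios of Section 3.3 can be sketched in Python as follows. This is a hedged illustration rather than the system's implementation: it assumes that the matched subsets are the members of one set whose already matched counterpart (via the OAMapping) appears in the other set, that the total weight is the sum of weights over both complete sets, and that a single weight function per version covers both the caller and callee roles. All identifiers are hypothetical.

```python
def matched_subsets(set1, set2, mapping):
    """Members of set1 whose mapped counterpart is in set2, and those counterparts."""
    ms1 = {i for i in set1 if mapping.get(i) in set2}
    ms2 = {mapping[i] for i in ms1}
    return ms1, ms2


def weights(set1, set2, mapping, weight1, weight2):
    """Return (matching weight, total weight) for two caller sets or two callee sets."""
    ms1, ms2 = matched_subsets(set1, set2, mapping)
    matching = sum(weight1(i) for i in ms1) + sum(weight2(j) for j in ms2)
    total = sum(weight1(i) for i in set1) + sum(weight2(j) for j in set2)
    return matching, total


def similarity(set1, set2, mapping, weight1, weight2):
    """Caller set or callee set similarity, a value between 0 and 1."""
    matching, total = weights(set1, set2, mapping, weight1, weight2)
    return matching / total if total else 0.0


def overall_similarity(callers1, callers2, callees1, callees2,
                       mapping, weight1, weight2):
    """Overall similarity: pooled matching weight over pooled total weight."""
    m_in, t_in = weights(callers1, callers2, mapping, weight1, weight2)
    m_out, t_out = weights(callees1, callees2, mapping, weight1, weight2)
    total = t_in + t_out
    return (m_in + m_out) / total if total else 0.0


# The example from the introduction: A calls F1..F4, B calls F1 and F2,
# C calls F3 and F4. Callee names are unchanged between versions.
identity = {name: name for name in ("F1", "F2", "F3", "F4")}
callees = {"A": {"F1", "F2", "F3", "F4"}, "B": {"F1", "F2"}, "C": {"F3", "F4"}}


def uniform(name):
    return 1.0


# Hierarchy based weights with the inverse function: F1 and F2 are library
# calls (code 4, weight 0.25), F3 and F4 are same file calls (code 1, weight 1).
by_distance = {"F1": 0.25, "F2": 0.25, "F3": 1.0, "F4": 1.0}.get

uniform_ab = similarity(callees["A"], callees["B"], identity, uniform, uniform)
uniform_ac = similarity(callees["A"], callees["C"], identity, uniform, uniform)
weighted_ab = similarity(callees["A"], callees["B"], identity, by_distance, by_distance)
weighted_ac = similarity(callees["A"], callees["C"], identity, by_distance, by_distance)

if __name__ == "__main__":
    print(uniform_ab, uniform_ac)
    print(weighted_ab, weighted_ac)
```

Under uniform weights the pairs (A, B) and (A, C) are indistinguishable, while the hierarchy based weights rank (A, C) above (A, B), which is the behaviour the introduction argues for.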

Figure 5: Sample output of training data, with columns Log_Total, Log_Caller, Log_Callee and Result (TRUE or FALSE). The first three columns are from Function Relation Analysis using the Log based weighting function. The last column is from human validation.

4 Experimental Methodology

This section explains the experimental methodology in detail. We decided against using the precision and recall of a hand picked threshold to measure performance, for the following reasons:

- It is not automatic. If the samples change, because of an error in human validation or because more samples are added to the experiment, the whole process of trying various thresholds and calculating precision and recall has to be redone manually.
- It is not objective. We could try many thresholds for our weighted function analysis and pick the best of them, while choosing a suboptimal threshold for the uniform function analysis. The performance gain in that case could be substantial, but it would not reflect the truth and would make the result much less credible.
- It is impossible to calculate the real recall value. While the precision can be calculated exactly by going through the identified functions one by one, there is practically no way to calculate the real recall value. That would require us to find all matching pairs in the system, and there could easily be hundreds of thousands of pairs. Some previous work [1] has used techniques to obtain a pseudo recall value, but we feel that there is no guarantee on how close the pseudo recall number is to the actual recall number, which makes such a result unreliable.

Based on these arguments, we decided to use a fully objective and automatic machine learning approach that gives us a quantitative measure of the performance of the various weighting functions.
The technique comes from Information Theory, which is the basis of Decision Tree Learning in the Machine Learning context. Decision Tree Learning is one of the simplest and yet most successful forms of learning algorithm. A decision tree takes as input an object or situation described by a set of attributes and returns the predicted output value for the input [6]. The correctness of a decision tree depends on the choice of the attribute tests.

The goal of our verification process is to determine whether one attribute is a better indicator of function matching than the others. This amounts to finding a formal measure of the usefulness of attributes, which is the same goal as in Decision Tree Learning. The measure should have its maximum value when the attribute is perfect, meaning that the attribute separates all matching pairs from all non-matching pairs, and its minimum value when the attribute is of no use at all. One suitable measure is the expected amount of information provided by the attribute, where the term is used in the mathematical sense first defined by Shannon and Weaver [6]. Following [6], the information content is defined as

\[
I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)
\]

In practice, given a training set that contains p positive examples and n negative examples, the estimate of the information contained is

\[
I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}
\]

The information gain from an attribute test is the difference between the original information requirement and the new requirement:

\[
\mathrm{Gain}(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{Remainder}(A)
\]

where

\[
\mathrm{Remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)
\]

and the test on attribute A divides the training set into v subsets, the i-th of which contains p_i positive and n_i negative examples. The attribute with the highest information gain is the best classifier and, in our case, the best indicator of function matching.

5 Case Study

We have performed a case study on Ctags using our weighted function relation analysis system. We chose Ctags because it is a well known piece of software that is in wide use and is reasonably sophisticated. In addition, it is written in the C programming language, so we could use SWAG Kit to extract, abstract and explore the software architecture. For our case study, we selected 2 releases of Exuberant Ctags. Those are Ctags 4.0. released in June 2000 and Ctags released in May. For validation purposes, we need to know which functions are indeed matching functions.
However, we cannot go through each function pair one by one: there are 26 unmatched functions in version 4 and 578 unmatched functions in version 5, which makes the number of combinations too large to inspect. Therefore, this process was done by first using Beagle to provide match candidates and then manually going through the source code of those functions to decide whether they really match. We used the name matcher and the call relation matcher of Beagle in this case.

The name matcher calculates the longest common substring (LCS) of the name of the target entity against the names of each member of the candidate set, and normalizes the value against the average length of the two entity names. We set the threshold to 0. so that we get a larger set of candidates and do not miss real matches. Out of the 39 match candidates suggested by Beagle, 5 were identified as true matches. The call relation matcher returns a normalized value indicating how closely the caller and callee sets of two entities match, using uniform weight function relation analysis. Out of the 7 match candidates suggested by Beagle using threshold 0., 43 were identified as true matches. Once the matches are identified, this information, along with the attributes calculated by the function analysis system, is fed into the machine learning program. The result is shown in the next section.

6 Result

Using the experimental methodology, the Information Gain associated with each weighting function is listed in Figure 6 and Figure 7.

Figure 6: Frequency based weighting function versus uniform weighting function. Maximum information gain (Max IG) for Total Similarity, Caller Similarity and Callee Similarity.
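The information gain comparison behind these figures can be sketched in Python. The sketch assumes that each similarity attribute is continuous and is turned into a binary test by a threshold, and that the reported "Max IG" is the largest gain over all candidate thresholds; that reading, and every identifier below, is an assumption for illustration rather than the implementation used in the case study.

```python
import math


def information(p, n):
    """Information content, in bits, of a set with p positive and n negative examples."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count:
            q = count / total
            bits -= q * math.log2(q)
    return bits


def information_gain(values, labels, threshold):
    """Gain of the binary test `value >= threshold` on the labelled examples."""
    p = sum(1 for lab in labels if lab)
    n = len(labels) - p
    above = [lab for v, lab in zip(values, labels) if v >= threshold]
    below = [lab for v, lab in zip(values, labels) if v < threshold]
    remainder = 0.0
    for part in (above, below):
        if part:
            p_i = sum(1 for lab in part if lab)
            n_i = len(part) - p_i
            remainder += len(part) / len(labels) * information(p_i, n_i)
    return information(p, n) - remainder


def max_information_gain(values, labels):
    """Largest gain over all thresholds taken from the observed values."""
    if not values:
        return 0.0
    return max(information_gain(values, labels, t) for t in sorted(set(values)))


if __name__ == "__main__":
    labels = [False, False, True, True]      # human validation result
    perfect = [0.1, 0.2, 0.8, 0.9]           # separates matches from non-matches
    useless = [0.5, 0.5, 0.5, 0.5]           # carries no information
    print(max_information_gain(perfect, labels))
    print(max_information_gain(useless, labels))
```

An attribute that separates matches from non-matches perfectly reaches the full information content of the training set, while a constant attribute yields zero gain; the weighting functions are compared by where their similarity attributes fall between these two extremes.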

Figure 7: Hierarchy based weighting function versus uniform weighting function. Maximum information gain (Max IG) for Total Similarity, Caller Similarity and Callee Similarity.

The results show that better performance can be achieved by using a proper weighting function from either the frequency based or the hierarchy based category. The similarity values calculated by those weighting functions provide larger information gain. We notice that the increase in information gain from using weighting functions over the uniform one is only around 5% in this case. After investigation, we are convinced that this is because the functions in Ctags are homogeneous. In Ctags, the number of incoming and outgoing calls of a function does not vary greatly, which leaves frequency based weighting algorithms little room for improvement. Also, all the functions in Ctags are in the same directory. This limits the difference any hierarchy based weighting function can make, and in turn restricts the improvement that can be achieved. If the functions in the subject system are heterogeneous in terms of call frequency and hierarchical call distance, then the performance gain from using weighting functions can be significantly larger.

7 Function Match Identification System

The ultimate goal in Origin Analysis is to be able to tell whether two functions or entities are similar enough to be considered the same. The traditional approaches use an attribute and a corresponding threshold as the test. There is much research on finding good attributes and thresholds that work well in practice. Common approaches in this domain include name matching, parameter name matching, parameter type matching, function call relation matching, entity source code string matching, UML relationship matching [1] and many other methods.
Research in this domain has typically focused on proving that a particular method and its associated attribute is a better measurement than existing ones, and that the method can provide sufficient evidence on its own. Here, we propose an approach that can take advantage of all existing methods and is mathematically proven to be no worse than any of them. As discussed in the Experimental Methodology section, information theory can be used to find the best attribute out of a set of attributes. An approach that keeps testing on the best attribute available, until no more attributes are available or required, can fully utilize the information contained in those attributes. The final result is mathematically more precise than using any single attribute alone, provided that none of the attributes is overly dominant, which is the case in Origin Analysis. The Decision Tree Learning algorithm is sufficient for this purpose. In Figure 8, we demonstrate the procedure of using Decision Tree Learning in function match identification.

We implemented a system that is capable of decision tree learning and of predicting results. However, we consciously chose not to test the precision of a decision tree generated using only function call relation information. As the small numbers in the information gain table indicate, the function call relation matcher alone cannot precisely predict the matching between functions, simply because there is not enough information. On the other hand, we feel that if other attributes are also used in decision tree learning, the combined information is sufficient for making precise predictions.

Figure 8: The procedures involved in using the Origin Analysis System to identify entity matches and in evaluating the prediction precision.

8 Conclusion and Future Work

In this project we have developed two sets of weighting functions, and designed and implemented an Origin Analysis system that uses them. Our hypothesis was that a well-designed weighted model in function relation analysis will outperform an unweighted one, as the former makes use of more information. We performed a case study on two versions of the Ctags system and showed that our results are better than those of the unweighted model. Our system is capable of giving a quantitative comparison between various weighting functions. It also works with any variation of weighting functions or extra attributes. Another important advantage of our system is that it is completely automated and does not need any parameter picking or human input.

There are three directions in which future work can be carried out. One is to carry out more case studies. Another is to apply the experimental methodology to evaluate the performance of various existing or new approaches in Origin Analysis. The third is to incorporate existing or new approaches in Origin Analysis into the prediction framework and produce a system that can reliably predict entity matching by using all the information available.

More Case Studies

Increasing the sample size will improve the accuracy of the result. By using Chernoff bound in ERM (Empirical Risk Minimization), we see that the sample size needed to achieve < 0.05 error with > 95% confidence is Right now, the training sample we have is of size The second reason to do more case studies is to increase the performance gain from using weighted function relation analysis. As discussed in the result section, Ctags offers little room for improvement due to its lack of subsystems and the fact that most of its functions share a similar call frequency. A larger and more sophisticated system will give the weighted system proper credit.

Apply Experiment Methodology

Using the methodology discussed in Section 4, one direction for future work is to evaluate the success of any algorithm or feature used in Origin Analysis and to compare their performance. Examples include finding an objective quantitative measure of the value of UML relationships, establishing whether LCS is a better indicator than the number of character pairs in name matching, and establishing whether parameter name matching is better than parameter type matching.

Prediction System

The third direction is to incorporate existing features used in Origin Analysis into the framework discussed in Section 7.
Mathematically speaking, the result is bound to be better, but it would be interesting to see how good the prediction can be in practice and to measure its performance.

References

[1] Z. Xing and E. Stroulia. UMLDiff: An Algorithm for Object-Oriented Design Differencing. In Proceedings of the 20th IEEE International Conference on Automated Software Engineering (ASE 05), pages 54-65.
[2] S. Kim, K. Pan, and E. J. Whitehead Jr. When Functions Change Their Names: Automatic Detection of Origin Relationships. In Proceedings of the 12th Working Conference on Reverse Engineering (WCRE 2005), pages 143-152, Pittsburgh, Pennsylvania, USA. IEEE Computer Society.
[3] Q. Tu and M. W. Godfrey. An Integrated Approach for Studying Architectural Evolution. In Proceedings of the 10th International Workshop on Program Comprehension.
[4] M. W. Godfrey and L. Zou. Using Origin Analysis to Detect Merging and Splitting of Source Code Entities. IEEE Transactions on Software Engineering, 31(2):166-181, 2005.
[5] D. L. Parnas. Software Aging. In Proceedings of the 16th International Conference on Software Engineering, Sorrento, Italy, May 1994.
[6] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, pages 653-659.


CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Automatic Identification of Important Clones for Refactoring and Tracking

Automatic Identification of Important Clones for Refactoring and Tracking Automatic Identification of Important Clones for Refactoring and Tracking Manishankar Mondal Chanchal K. Roy Kevin A. Schneider Department of Computer Science, University of Saskatchewan, Canada {mshankar.mondal,

More information

CS299 Detailed Plan. Shawn Tice. February 5, The high-level steps for classifying web pages in Yioop are as follows:

CS299 Detailed Plan. Shawn Tice. February 5, The high-level steps for classifying web pages in Yioop are as follows: CS299 Detailed Plan Shawn Tice February 5, 2013 Overview The high-level steps for classifying web pages in Yioop are as follows: 1. Create a new classifier for a unique label. 2. Train it on a labelled

More information

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include Outline Computer Science 331 Correctness of Algorithms Mike Jacobson Department of Computer Science University of Calgary Lectures #2-4 1 What is a? Applications 2 Recursive Algorithms 3 Final Notes Additional

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Combining Selective Search Segmentation and Random Forest for Image Classification

Combining Selective Search Segmentation and Random Forest for Image Classification Combining Selective Search Segmentation and Random Forest for Image Classification Gediminas Bertasius November 24, 2013 1 Problem Statement Random Forest algorithm have been successfully used in many

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

EECS 349 Machine Learning Homework 3

EECS 349 Machine Learning Homework 3 WHAT TO HAND IN You are to submit the following things for this homework: 1. A SINGLE PDF document containing answers to the homework questions. 2. The WELL COMMENTED MATLAB source code for all software

More information

Chapter 3. Set Theory. 3.1 What is a Set?

Chapter 3. Set Theory. 3.1 What is a Set? Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

From Whence It Came: Detecting Source Code Clones by Analyzing Assembler

From Whence It Came: Detecting Source Code Clones by Analyzing Assembler From Whence It Came: Detecting Source Code Clones by Analyzing Assembler Ian J. Davis and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada

More information

Cost-sensitive Boosting for Concept Drift

Cost-sensitive Boosting for Concept Drift Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems

More information

Reducing Directed Max Flow to Undirected Max Flow and Bipartite Matching

Reducing Directed Max Flow to Undirected Max Flow and Bipartite Matching Reducing Directed Max Flow to Undirected Max Flow and Bipartite Matching Henry Lin Division of Computer Science University of California, Berkeley Berkeley, CA 94720 Email: henrylin@eecs.berkeley.edu Abstract

More information

Data Science with R Decision Trees with Rattle

Data Science with R Decision Trees with Rattle Data Science with R Decision Trees with Rattle Graham.Williams@togaware.com 9th June 2014 Visit http://onepager.togaware.com/ for more OnePageR s. In this module we use the weather dataset to explore the

More information

Motion Detection Algorithm

Motion Detection Algorithm Volume 1, No. 12, February 2013 ISSN 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/ Motion Detection

More information

Section 4 General Factorial Tutorials

Section 4 General Factorial Tutorials Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One

More information

Hardware versus software

Hardware versus software Logic 1 Hardware versus software 2 In hardware such as chip design or architecture, designs are usually proven to be correct using proof tools In software, a program is very rarely proved correct Why?

More information

Knowledge Engineering in Search Engines

Knowledge Engineering in Search Engines San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2012 Knowledge Engineering in Search Engines Yun-Chieh Lin Follow this and additional works at:

More information

EXTRACTION OF REUSABLE COMPONENTS FROM LEGACY SYSTEMS

EXTRACTION OF REUSABLE COMPONENTS FROM LEGACY SYSTEMS EXTRACTION OF REUSABLE COMPONENTS FROM LEGACY SYSTEMS Moon-Soo Lee, Yeon-June Choi, Min-Jeong Kim, Oh-Chun, Kwon Telematics S/W Platform Team, Telematics Research Division Electronics and Telecommunications

More information

BaggTaming Learning from Wild and Tame Data

BaggTaming Learning from Wild and Tame Data BaggTaming Learning from Wild and Tame Data Wikis, Blogs, Bookmarking Tools - Mining the Web 2.0 Workshop @ECML/PKDD2008 Workshop, 15/9/2008 Toshihiro Kamishima, Masahiro Hamasaki, and Shotaro Akaho National

More information

Exploring Similarity Measures for Biometric Databases

Exploring Similarity Measures for Biometric Databases Exploring Similarity Measures for Biometric Databases Praveer Mansukhani, Venu Govindaraju Center for Unified Biometrics and Sensors (CUBS) University at Buffalo {pdm5, govind}@buffalo.edu Abstract. Currently

More information

Attributes as Operators (Supplementary Material)

Attributes as Operators (Supplementary Material) In Proceedings of the European Conference on Computer Vision (ECCV), 2018 Attributes as Operators (Supplementary Material) This document consists of supplementary material to support the main paper text.

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

CREATIVE ASSERTION AND CONSTRAINT METHODS FOR FORMAL DESIGN VERIFICATION

CREATIVE ASSERTION AND CONSTRAINT METHODS FOR FORMAL DESIGN VERIFICATION CREATIVE ASSERTION AND CONSTRAINT METHODS FOR FORMAL DESIGN VERIFICATION Joseph Richards SGI, High Performance Systems Development Mountain View, CA richards@sgi.com Abstract The challenges involved in

More information

Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science

Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science 310 Million + Current Domain Names 11 Billion+ Historical Domain Profiles 5 Million+ New Domain Profiles Daily

More information

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

More information

Chapter Fourteen Bonus Lessons: Algorithms and Efficiency

Chapter Fourteen Bonus Lessons: Algorithms and Efficiency : Algorithms and Efficiency The following lessons take a deeper look at Chapter 14 topics regarding algorithms, efficiency, and Big O measurements. They can be completed by AP students after Chapter 14.

More information

Q: Which month has the lowest sale? Answer: Q:There are three consecutive months for which sale grow. What are they? Answer: Q: Which month

Q: Which month has the lowest sale? Answer: Q:There are three consecutive months for which sale grow. What are they? Answer: Q: Which month Lecture 1 Q: Which month has the lowest sale? Q:There are three consecutive months for which sale grow. What are they? Q: Which month experienced the biggest drop in sale? Q: Just above November there

More information

Hi everyone. Starting this week I'm going to make a couple tweaks to how section is run. The first thing is that I'm going to go over all the slides

Hi everyone. Starting this week I'm going to make a couple tweaks to how section is run. The first thing is that I'm going to go over all the slides Hi everyone. Starting this week I'm going to make a couple tweaks to how section is run. The first thing is that I'm going to go over all the slides for both problems first, and let you guys code them

More information

Supervised classification of law area in the legal domain

Supervised classification of law area in the legal domain AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms

More information

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report

More information

Solved Question Paper June 2017

Solved Question Paper June 2017 Solved Question Paper June 2017 1.a) What are the benefits of Object Oriented Methodology in real life applications? Briefly explain each element of the state diagram with respect to dynamic modeling.

More information

CHAPTER 6 QUANTITATIVE PERFORMANCE ANALYSIS OF THE PROPOSED COLOR TEXTURE SEGMENTATION ALGORITHMS

CHAPTER 6 QUANTITATIVE PERFORMANCE ANALYSIS OF THE PROPOSED COLOR TEXTURE SEGMENTATION ALGORITHMS 145 CHAPTER 6 QUANTITATIVE PERFORMANCE ANALYSIS OF THE PROPOSED COLOR TEXTURE SEGMENTATION ALGORITHMS 6.1 INTRODUCTION This chapter analyzes the performance of the three proposed colortexture segmentation

More information

CS103 Spring 2018 Mathematical Vocabulary

CS103 Spring 2018 Mathematical Vocabulary CS103 Spring 2018 Mathematical Vocabulary You keep using that word. I do not think it means what you think it means. - Inigo Montoya, from The Princess Bride Consider the humble while loop in most programming

More information

A Transformation-Based Model of Evolutionary Architecting for Embedded System Product Lines

A Transformation-Based Model of Evolutionary Architecting for Embedded System Product Lines A Transformation-Based Model of Evolutionary Architecting for Embedded System Product Lines Jakob Axelsson School of Innovation, Design and Engineering, Mälardalen University, SE-721 23 Västerås, Sweden

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 188 CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 6.1 INTRODUCTION Image representation schemes designed for image retrieval systems are categorized into two

More information

Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition

Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition 2009 0th International Conference on Document Analysis and Recognition Explicit fuzzy modeling of and positioning for handwritten Chinese character recognition Adrien Delaye - Eric Anquetil - Sébastien

More information

Screening Design Selection

Screening Design Selection Screening Design Selection Summary... 1 Data Input... 2 Analysis Summary... 5 Power Curve... 7 Calculations... 7 Summary The STATGRAPHICS experimental design section can create a wide variety of designs

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

COPULA MODELS FOR BIG DATA USING DATA SHUFFLING

COPULA MODELS FOR BIG DATA USING DATA SHUFFLING COPULA MODELS FOR BIG DATA USING DATA SHUFFLING Krish Muralidhar, Rathindra Sarathy Department of Marketing & Supply Chain Management, Price College of Business, University of Oklahoma, Norman OK 73019

More information

P a g e 1. MathCAD VS MATLAB. A Usability Comparison. By Brian Tucker

P a g e 1. MathCAD VS MATLAB. A Usability Comparison. By Brian Tucker P a g e 1 MathCAD VS MATLAB A Usability Comparison By Brian Tucker P a g e 2 Table of Contents Introduction... 3 Methodology... 3 Tasks... 3 Test Environment... 3 Evaluative Criteria/Rating Scale... 4

More information

Test designs for evaluating the effectiveness of mail packs Received: 30th November, 2001

Test designs for evaluating the effectiveness of mail packs Received: 30th November, 2001 Test designs for evaluating the effectiveness of mail packs Received: 30th November, 2001 Leonard Paas previously worked as a senior consultant at the Database Marketing Centre of Postbank. He worked on

More information

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification Flora Yu-Hui Yeh and Marcus Gallagher School of Information Technology and Electrical Engineering University

More information

6.001 Notes: Section 4.1

6.001 Notes: Section 4.1 6.001 Notes: Section 4.1 Slide 4.1.1 In this lecture, we are going to take a careful look at the kinds of procedures we can build. We will first go back to look very carefully at the substitution model,

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to

More information

Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01

More information

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications D.A. Karras 1 and V. Zorkadis 2 1 University of Piraeus, Dept. of Business Administration,

More information

Rainforest maths. Australian Mathematics Curriculum Achievement Standards Correlations Foundation year

Rainforest maths. Australian Mathematics Curriculum Achievement Standards Correlations Foundation year Australian Mathematics Curriculum Achievement Standards Correlations Foundation year NUMBER and ALGEBRA ACMNA Establish understanding of the language and processes of counting by naming numbers in sequences,

More information

Random Oracles - OAEP

Random Oracles - OAEP Random Oracles - OAEP Anatoliy Gliberman, Dmitry Zontov, Patrick Nordahl September 23, 2004 Reading Overview There are two papers presented this week. The first paper, Random Oracles are Practical: A Paradigm

More information

Some questions of consensus building using co-association

Some questions of consensus building using co-association Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation I.Ceema *1, M.Kavitha *2, G.Renukadevi *3, G.sripriya *4, S. RajeshKumar #5 * Assistant Professor, Bon Secourse College

More information

An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications

An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications Zhenhai Duan, Kartik Gopalan, Xin Yuan Abstract In this paper we present a detailed study of the behavioral characteristics

More information

CS 224W Final Report Group 37

CS 224W Final Report Group 37 1 Introduction CS 224W Final Report Group 37 Aaron B. Adcock Milinda Lakkam Justin Meyer Much of the current research is being done on social networks, where the cost of an edge is almost nothing; the

More information

LS-OPT : New Developments and Outlook

LS-OPT : New Developments and Outlook 13 th International LS-DYNA Users Conference Session: Optimization LS-OPT : New Developments and Outlook Nielen Stander and Anirban Basudhar Livermore Software Technology Corporation Livermore, CA 94588

More information

CASE BASED REASONING A SHORT OVERVIEW

CASE BASED REASONING A SHORT OVERVIEW CASE BASED REASONING A SHORT OVERVIEW Z. Budimac, V. Kurbalija Institute of Mathematics and Computer Science, Fac. of Science, Univ. of Novi Sad Trg D. Obradovića 4, 21000 Novi Sad, Yugoslavia zjb@im.ns.ac.yu,

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

CS103 Handout 29 Winter 2018 February 9, 2018 Inductive Proofwriting Checklist

CS103 Handout 29 Winter 2018 February 9, 2018 Inductive Proofwriting Checklist CS103 Handout 29 Winter 2018 February 9, 2018 Inductive Proofwriting Checklist In Handout 28, the Guide to Inductive Proofs, we outlined a number of specifc issues and concepts to be mindful about when

More information

Math Search with Equivalence Detection Using Parse-tree Normalization

Math Search with Equivalence Detection Using Parse-tree Normalization Math Search with Equivalence Detection Using Parse-tree Normalization Abdou Youssef Department of Computer Science The George Washington University Washington, DC 20052 Phone: +1(202)994.6569 ayoussef@gwu.edu

More information

Performance Evaluation of XHTML encoding and compression

Performance Evaluation of XHTML encoding and compression Performance Evaluation of XHTML encoding and compression Sathiamoorthy Manoharan Department of Computer Science, University of Auckland, Auckland, New Zealand Abstract. The wireless markup language (WML),

More information

Analyzing Dshield Logs Using Fully Automatic Cross-Associations

Analyzing Dshield Logs Using Fully Automatic Cross-Associations Analyzing Dshield Logs Using Fully Automatic Cross-Associations Anh Le 1 1 Donald Bren School of Information and Computer Sciences University of California, Irvine Irvine, CA, 92697, USA anh.le@uci.edu

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

The Cheapest Way to Obtain Solution by Graph-Search Algorithms

The Cheapest Way to Obtain Solution by Graph-Search Algorithms Acta Polytechnica Hungarica Vol. 14, No. 6, 2017 The Cheapest Way to Obtain Solution by Graph-Search Algorithms Benedek Nagy Eastern Mediterranean University, Faculty of Arts and Sciences, Department Mathematics,

More information

Tracking Frequent Items Dynamically: What s Hot and What s Not To appear in PODS 2003

Tracking Frequent Items Dynamically: What s Hot and What s Not To appear in PODS 2003 Tracking Frequent Items Dynamically: What s Hot and What s Not To appear in PODS 2003 Graham Cormode graham@dimacs.rutgers.edu dimacs.rutgers.edu/~graham S. Muthukrishnan muthu@cs.rutgers.edu Everyday

More information

Signature Verification Why xyzmo offers the leading solution

Signature Verification Why xyzmo offers the leading solution Dynamic (Biometric) Signature Verification The signature is the last remnant of the hand-written document in a digital world, and is considered an acceptable and trustworthy means of authenticating all

More information

BUBBLE RAP: Social-Based Forwarding in Delay-Tolerant Networks

BUBBLE RAP: Social-Based Forwarding in Delay-Tolerant Networks 1 BUBBLE RAP: Social-Based Forwarding in Delay-Tolerant Networks Pan Hui, Jon Crowcroft, Eiko Yoneki Presented By: Shaymaa Khater 2 Outline Introduction. Goals. Data Sets. Community Detection Algorithms

More information

The Intelligent Process Planner and Scheduler. by Carl P. Thompson Advisor: Jeffrey W. Herrmann, Edward Lin, Mark Fleischer, Vidit Mathur

The Intelligent Process Planner and Scheduler. by Carl P. Thompson Advisor: Jeffrey W. Herrmann, Edward Lin, Mark Fleischer, Vidit Mathur UNDERGRADUATE REPORT The Intelligent Process Planner and Scheduler by Carl P. Thompson Advisor: Jeffrey W. Herrmann, Edward Lin, Mark Fleischer, Vidit Mathur U.G. 2000-1 I R INSTITUTE FOR SYSTEMS RESEARCH

More information