Predicting Vulnerable Software Components

1 Predicting Vulnerable Software Components, Stephan Neuhaus et al. 10/29/2008, Stuart A Jaskowiak, CSC 682

2 What's in the paper? Introduction; Scope of this Work; Components and Vulnerabilities; Imports and Functions Matter; Predicting Vulnerabilities from Features; Case Study: Mozilla; Related Work; Conclusions and Future Work

3 Introduction: The authors designed Vulture to mine a vulnerability database and associate vulnerabilities with individual components.

4 Why Vulture? Vulnerable components tend to share function calls. In Mozilla, 14 components import nsNodeUtils.h, and 13 of them (93%) had to be patched because of security leaks. All 15 components that import nsIContent.h, nsIInterfaceRequestorUtils.h and nsContentUtils.h together are vulnerable!
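
The 13-of-14 figure is a per-import ratio, and a minimal sketch of how such a ratio could be computed is shown here. The function name, the data layout, and the toy component names are assumptions for illustration only; this is not Vulture's actual code.

```python
from collections import defaultdict


def vulnerable_fraction_by_import(imports_by_component, vulnerable):
    """For each import, return (importers, vulnerable importers, fraction)."""
    importers = defaultdict(set)
    for component, imports in imports_by_component.items():
        for name in imports:
            importers[name].add(component)

    stats = {}
    for name, components in importers.items():
        hits = len(components & vulnerable)
        stats[name] = (len(components), hits, hits / len(components))
    return stats


# Hypothetical toy data shaped like the slide's example:
# 14 components import nsNodeUtils.h and 13 of them are vulnerable.
imports_by_component = {f"c{i}": {"nsNodeUtils.h"} for i in range(14)}
vulnerable = {f"c{i}" for i in range(13)}

stats = vulnerable_fraction_by_import(imports_by_component, vulnerable)
total, hits, fraction = stats["nsNodeUtils.h"]
print(f"{hits} of {total} importers vulnerable ({fraction:.0%})")
```

Sorting the resulting dictionary by fraction would surface the imports most strongly associated with past vulnerabilities.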

5 Scope: The authors' hypothesis is that the cause of a vulnerability is not the import or function itself but the domain of the import or function.

6 Upfront Validity Concerns: study size; bugs in the database or the code; bugs in the R library; wrong or noisy input data; yet unknown vulnerabilities

7 Components and Vulnerabilities: Component - an entity in a software product that may have issues. Vulnerability - a defect in one or more components that manifests itself as some violation of a security policy. Bugzilla is used to map bug IDs to components.

8 Mozilla Components and Vulnerabilities: As of 1/4/2007, Mozilla has 13,111 C/C++ files; 10,452 components; 134 vulnerability advisories; 302 bug reports. Of the 10,452 components, 424 (4.05%) were vulnerable.

9 Top Ten Mozilla Vulnerabilities: The first 4 are all JavaScript related.

10 Imports and Functions Matter: Imports take three forms: #include <name>, #include "name", and #include NAME. Function calls are harder: C/C++ needs preprocessing to determine whether calls are made.
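
The three include forms can be recognized without running the preprocessor. Below is a minimal sketch, assuming a plain regular-expression scan is acceptable; the names `INCLUDE_RE` and `extract_imports` are hypothetical, and the scan ignores comments and conditional compilation, so it only approximates what a real extractor would need.

```python
import re

# Matches the three forms: #include <name>, #include "name", #include NAME.
# The \b keeps directives such as #include_next from matching.
INCLUDE_RE = re.compile(
    r'^[ \t]*#[ \t]*include\b[ \t]*(?:<([^>\n]+)>|"([^"\n]+)"|([A-Za-z_]\w*))',
    re.MULTILINE,
)


def extract_imports(source):
    """Return the set of names that a C/C++ source text includes."""
    found = set()
    for angle, quoted, macro in INCLUDE_RE.findall(source):
        # Exactly one of the three groups is non-empty for each match.
        found.add(angle or quoted or macro)
    return found


sample = (
    '#include <stdio.h>\n'
    '#include "nsNodeUtils.h"\n'
    '  # include CONFIG_HEADER\n'
    'int main(void) { return 0; }\n'
)
print(sorted(extract_imports(sample)))
```

For the macro form the sketch records the macro name itself, since resolving it to a file would require the preprocessing step the slide mentions.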

11 Features in Mozilla: 79,494 relationships of the form "component x imports import y", with 9,481 distinct imports; 324,822 relationships of the form "component x calls function y", with 92,265 distinct function names.
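
Relationships of the form "component x imports import y" can be laid out as one binary row per component. The sketch below assumes that layout; the function name and the toy data are illustrative, and the slides do not say how Vulture stores its features.

```python
def build_feature_matrix(imports_by_component):
    """Return (columns, rows): one 0/1 row per component, one column per import."""
    columns = sorted({imp for imps in imports_by_component.values() for imp in imps})
    index = {name: j for j, name in enumerate(columns)}

    rows = {}
    for component, imps in imports_by_component.items():
        row = [0] * len(columns)
        for imp in imps:
            row[index[imp]] = 1  # component imports this header
        rows[component] = row
    return columns, rows


# Hypothetical toy input: three components and two distinct imports.
toy = {"A": {"x.h", "y.h"}, "B": {"y.h"}, "C": set()}
columns, rows = build_feature_matrix(toy)
print(columns)
for component in sorted(rows):
    print(component, rows[component])
```

At Mozilla's scale (10,452 components by 9,481 imports) most entries would be zero, so a sparse representation would likely be preferable to dense lists.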

12 Predicting Vulnerabilities from Features: How vulnerable is a new component?

13 Precision and Recall for Vulture: Precision is TP/(TP + FP) and recall is TP/(TP + FN), where TP, FP, and FN count true positives, false positives, and false negatives.
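
As a worked example of the two formulas, the sketch below computes them from a set of components flagged as vulnerable and a set of components that actually are vulnerable. The function name and the toy sets are assumptions, not data from the study.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) from sets of flagged and truly vulnerable items."""
    tp = len(predicted & actual)   # flagged and vulnerable
    fp = len(predicted - actual)   # flagged but not vulnerable
    fn = len(actual - predicted)   # vulnerable but not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: four flagged components, three truly vulnerable.
predicted = {"a", "b", "c", "d"}
actual = {"a", "b", "e"}
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here TP = 2, FP = 2, and FN = 1, so precision is 2/4 and recall is 2/3.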

14 Case Study: Mozilla. How fast?

15 Case Study: Mozilla (cont). How accurate? For imports: of all vulnerable components, Vulture flags 45% as vulnerable (recall). For function calls: of all components flagged as vulnerable, 70% actually are vulnerable (precision). Vulture is much better than random selection. Among the top 30 predicted components, Vulture finds 82% of all vulnerabilities.

16 Case Study: Mozilla (cont). Vulture vs Actual Bugs

17 Related Work that led to Vulture: looking at components' histories; evolution of defect numbers; estimating the number of vulnerabilities; testing the binary; (statically) examining the source; hardening the source or runtime environment

18 Conclusions: A technique for mapping past vulnerabilities by mining and combining vulnerability databases with version archives. Empirical evidence that contradicts popular wisdom saying that vulnerable components will generally have more vulnerabilities in the future. Evidence that features correlate with vulnerabilities.

19 Conclusions (cont): A tool that learns from the locations of past vulnerabilities to predict future ones with reasonable accuracy. An approach for identifying vulnerabilities that automatically adapts to specific projects and products. A predictor for vulnerabilities that only needs a set of suitable features, and thus can be applied before the component is fully implemented.


SAMATE (Software Assurance Metrics And Tool Evaluation) Project Overview. Tim Boland NIST May 29, SAMATE (Software Assurance Metrics And Tool Evaluation) Project Overview Tim Boland NIST May 29, 2012 http://samate.nist.gov t.boland@nist.gov 1 NationaI Institute of Standards and Technology (NIST) NIST,

More information

PCI DSS. Compliance and Validation Guide VERSION PCI DSS. Compliance and Validation Guide

PCI DSS. Compliance and Validation Guide VERSION PCI DSS. Compliance and Validation Guide PCI DSS VERSION 1.1 1 PCI DSS Table of contents 1. Understanding the Payment Card Industry Data Security Standard... 3 1.1. What is PCI DSS?... 3 2. Merchant Levels and Validation Requirements... 3 2.1.

More information

(See related materials in textbook.) CSE 435: Software Engineering (slides adapted from Ghezzi et al & Stirewalt

(See related materials in textbook.) CSE 435: Software Engineering (slides adapted from Ghezzi et al & Stirewalt Verification (See related materials in textbook.) Outline What are the goals of verification? What are the main approaches to verification? What kind of assurance do we get through testing? How can testing

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data

More information

Part 11: Collaborative Filtering. Francesco Ricci

Part 11: Collaborative Filtering. Francesco Ricci Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating

More information

How to manage evolving threats on evolving ICT assets across Enterprise

How to manage evolving threats on evolving ICT assets across Enterprise How to manage evolving threats on evolving ICT assets across Enterprise Marek Skalicky, CISM, CRISC, Qualys MD for CEE November, 2015 Vaš partner za varovanje informacij Agenda Security STARTs with VISIBILITY

More information

Assertions. Assertions - Example

Assertions. Assertions - Example References: internet notes; Bertrand Meyer, Object-Oriented Software Construction; 11/13/2003 1 Assertions Statements about input to a routine or state of a class Have two primary roles As documentation,

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

Chapter 10. Conclusion Discussion

Chapter 10. Conclusion Discussion Chapter 10 Conclusion 10.1 Discussion Question 1: Usually a dynamic system has delays and feedback. Can OMEGA handle systems with infinite delays, and with elastic delays? OMEGA handles those systems with

More information

CSE543 - Computer and Network Security Module: Intrusion Detection

CSE543 - Computer and Network Security Module: Intrusion Detection CSE543 - Computer and Network Security Module: Intrusion Detection Professor Trent Jaeger 1 Intrusion An authorized action... that exploits a vulnerability... that causes a compromise... and thus a successful

More information

The Road Ahead for Mining Software Repositories Ahmed E. Hassan. Queen s University

The Road Ahead for Mining Software Repositories Ahmed E. Hassan. Queen s University The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen s University Canada Sourceforge GoogleCode Code Repos Source Control CVS/SVN Bugzilla Mailing lists Historical Repositories Crash Repos

More information

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES

More information

Lecture 2 Pre-processing Concurrent Versions System Archives

Lecture 2 Pre-processing Concurrent Versions System Archives Lecture 2 Pre-processing Concurrent Versions System Archives Debian Developer Coordinates Distributed Development Distributed Development I hope no one else is editing the same parts of the code as I am.

More information

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights

More information

Black Hat Webcast Series. C/C++ AppSec in 2014

Black Hat Webcast Series. C/C++ AppSec in 2014 Black Hat Webcast Series C/C++ AppSec in 2014 Who Am I Chris Rohlf Leaf SR (Security Research) - Founder / Consultant BlackHat Speaker { 2009, 2011, 2012 } BlackHat Review Board Member http://leafsr.com

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised

More information