Computational Statistics and Mathematics for Cyber Security
1 Computational Statistics and Mathematics for Cyber Security. David J. Marchette. Sept, 0. Acknowledgment: This work was funded in part by the NSWC In-House Laboratory Independent Research (ILIR) program. NSWCDD-PN--00
2 Topics
3 Topics
4 Take-Away Points: Mathematics and statistics provide many tools for cyber security. Simple can be powerful. Complicated models or algorithms are not always necessary. Sometimes they are. Complicated things become simple with familiarity. High dimensional data is complicated, messy, and can fool you. Know your data! If your results appear too good to be true, triple check them!
5 Two Cultures: "There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown."* There are many aspects of this dichotomy: Modeling vs. algorithms. Parametric vs. non-parametric. Statistics vs. machine learning. Inference vs. prediction.** Small data vs. big data. Traditional statistics vs. computational statistics. *Leo Breiman, Statistical Science 2001, Vol. 16, No. 3, 199-231. **Donoho, D. (2015, September). 50 years of Data Science. In Princeton NJ, Tukey Centennial Workshop.
6 The Illusion of Progress: "... [comparative studies] often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion."* Simple models often produce essentially the same accuracy as more complicated models. They can be easier to understand and fit, and may have fewer parameters to choose, possibly resulting in lower variance. The data you get is rarely (if ever) a true random draw from the distribution you will be running your trained/implemented algorithm on. This is particularly important in cyber security. By its nature, cyber security data is non-stationary, and today's data may look very different from tomorrow's. *David Hand, Statistical Science 2006, Vol. 21, No. 1, 1-14.
7 The Illusion of Progress: When building a model, one makes assumptions, which are often not testable, and which can impact the ultimate performance. Simpler models (may) have fewer assumptions. Non-parametric methods (may) be superior to parametric ones in that they (tend to) make fewer assumptions. However, if the assumptions are true, parametric methods may be superior. Good non-parametric algorithms would be nearly as good as the parametric ones, while allowing a hedge on the assumptions. Hand suggests we spend less time developing the next great classifier and more time on methods that mitigate the above issues.
8 Outline: Probability density estimation: kernel estimators, streaming data. Machine learning: nearest neighbors, random forests. Manifold learning: graphs, spectral embedding. We'll see how much of this we can cover today; see the paper.
9 Topics
10 The Histogram [figure: histogram; axis: Density]
11 The Histogram and The Kernel Estimator [figure: two panels; axis: Density]
12 The Kernel Estimator: $\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} \phi\left(\frac{x - x_i}{h}\right)$. Easily extended to multivariate versions. Note that this is an average.
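The estimator above can be sketched in a few lines. This is a minimal illustration, assuming that φ is the standard normal density and h is the bandwidth; the function names and the sample values are hypothetical and not taken from the talk.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, assumed here as the kernel phi."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(x, data, h):
    """Kernel estimate at x: (1 / (n h)) * sum_i phi((x - x_i) / h)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Hypothetical sample, e.g. log(#bytes) for a handful of flows.
sample = [4.1, 4.3, 5.0, 5.2, 7.9]

# Evaluate the estimate on a grid of x values.
grid = [3.0 + 0.5 * k for k in range(12)]
density = [kernel_density(x, sample, h=0.5) for x in grid]
```

Because the estimate is an average of one kernel bump per observation, it integrates to one whenever the kernel does.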
13 Network Flows
14 Network Flows
15 Streaming Data: Averages can be computed in a streaming fashion: $\bar{X}_n = \frac{n-1}{n}\bar{X}_{n-1} + \frac{1}{n}X_n$. We can implement an exponential window: $\bar{X}_n = \frac{N-1}{N}\bar{X}_{n-1} + \frac{1}{N}X_n = \theta\bar{X}_{n-1} + (1-\theta)X_n$, and apply this idea to the kernel estimator: $\hat{f}_n(x) = \theta\hat{f}_{n-1}(x) + (1-\theta)\,\phi\left(\frac{x - X_n}{h}\right)$. θ controls how much of the past we remember. Note that we have to set a grid of x points at which we want to compute $\hat{f}$.
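A minimal sketch of these streaming updates follows. The class and function names are hypothetical. Two choices are assumptions rather than anything stated on the slide: the kernel is taken to be the standard normal density, and each new term is divided by h so that the running estimate stays a density. The first observation is used to initialize the estimate.

```python
import math

def phi(u):
    """Standard normal density (assumed kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def running_mean(prev_mean, x_new, n):
    """Exact streaming mean after the n-th observation (n starts at 1)."""
    return (n - 1) / n * prev_mean + x_new / n

def exponential_mean(prev_mean, x_new, theta):
    """Exponential-window mean: theta weights the past, (1 - theta) the new point."""
    return theta * prev_mean + (1.0 - theta) * x_new

class StreamingKDE:
    """Exponentially weighted kernel estimate kept on a fixed grid of x values."""

    def __init__(self, grid, h, theta):
        self.grid = list(grid)
        self.h = h
        self.theta = theta
        self.f = [0.0] * len(self.grid)
        self.seen = 0

    def update(self, x_new):
        # Kernel bump centered at the new observation, evaluated on the grid.
        bump = [phi((x - x_new) / self.h) / self.h for x in self.grid]
        if self.seen == 0:
            self.f = bump  # assumption: initialize with the first observation
        else:
            self.f = [self.theta * old + (1.0 - self.theta) * b
                      for old, b in zip(self.f, bump)]
        self.seen += 1
        return self.f
```

Only the grid values are stored, so memory use does not grow with the length of the stream.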
16 Streaming Network Flows: log(#bytes) in a Flow
17 Topics
18 Machine Learning: Classification. Given $\{(x_i, y_i)\}_{i=1,\dots,n} \subset \mathcal{X} \times \mathcal{Y}$ with $x_i$ corresponding to observations (flows, programs, system calls, log files: "features"), and $y_i$ corresponding to class labels (e.g. "malware", "benign"). A classifier is a mapping $g : \mathcal{X} \to \mathcal{Y}$. Machine learning (pattern recognition, classification) is designing a function g from training data $\{(x_i, y_i)\}_{i=1,\dots,n}$ for which truth is known. We are given training data $\{(x_i, y_i)\}_{i=1,\dots,n} \subset \mathcal{X} \times \mathcal{Y}$, and will be presented with a new $x \in \mathcal{X}$ for which the label is unknown. We wish to infer the y associated with x.
19 Nearest Neighbors: We are given training data $\{(x_i, y_i)\}_{i=1,\dots,n} \subset \mathcal{X} \times \mathcal{Y}$, and a new $x \in \mathcal{X}$ for which the label is unknown. Find the closest $x_i$ to x: $\hat{y} = y_{\arg\min_i d(x, x_i)}$. We must select an appropriate distance (dissimilarity) d. Alternative: We can compute the k closest, and vote: take the majority class.
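The nearest neighbor rule and its k-nearest voting variant can be sketched as follows. The function names are hypothetical, and Euclidean distance is only a default assumption; the slide leaves the choice of d open, so any dissimilarity can be passed in.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(x, train_x, train_y, k=1, dist=euclidean):
    """Label x by majority vote among its k closest training points.

    With k=1 this is the plain nearest neighbor rule. Vote ties are broken
    in favor of the label whose nearest member is closest to x.
    """
    order = sorted(range(len(train_x)), key=lambda i: dist(x, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy usage with made-up two-dimensional features.
train_x = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_y = ["benign", "benign", "malware", "malware"]
label = knn_predict((0.2, 0.1), train_x, train_y, k=1)
```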
20 Kaggle Malware: 0, examples of malware grouped into malware families. Each file has been byte-dumped and tabulated: we are using the frequency with which each value 0, ..., 255 occurs in the file. This seems really dumb (computer scientists laugh when I tell this story). We'll look at the nearest neighbor classifier on these data. 00 observations of each family are used for training ( observations from the family containing only observations). Test on the remaining. Remember: sometimes simple is good.
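The byte-count feature can be sketched as below. The function name is hypothetical, and normalizing the counts to relative frequencies is an assumption; the slide does not say whether raw counts or proportions were used.

```python
from collections import Counter

def byte_histogram(data: bytes, normalize: bool = True):
    """Return a 256-long vector: how often each byte value 0..255 occurs."""
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    hist = [counts.get(value, 0) for value in range(256)]
    if normalize and len(data) > 0:
        # Assumption: divide by file length so files of different sizes compare.
        hist = [c / len(data) for c in hist]
    return hist

# Usage on an in-memory example; for a real file, read it in binary mode first.
features = byte_histogram(b"\x00\x00\xff", normalize=False)
```

Each file then becomes a single point in a 256-dimensional space, which is what the nearest neighbor classifier operates on.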
21 Kaggle Malware: NN Performance [figure: confusion matrix; axis: True Class]. Error:.%. That is, % of the observations are correctly classified.
22 Kaggle Malware: NN Performance. Why??? Text analogy: the byte-count histogram is analogous to the word-count histogram used in text analysis. Maybe this is more like a morpheme-count histogram. Intuitively, a family shares a core of code (they are modifications of the "mother" malware). The bytes correspond to machine instructions, or at least they would if we were counting words instead of bytes.
23 Kaggle Malware: Smoothed NN Performance. Using the kernel estimator instead of the histogram, one obtains an error of.%. This is another place for computer scientists to laugh: bytes are not continuous, machine instruction codes are discrete... and yet it works. Remember Hand's paper. Here is the point at which we need to better understand our data. Unfortunately, we won't be doing this today.
24 Random Forests: We are given training data $\{(x_i, y_i)\}_{i=1,\dots,n} \subset \mathcal{X} \times \mathcal{Y}$, and a new $x \in \mathcal{X}$ for which the label is unknown. The random forest is an ensemble of decision trees: (1) Sample (with replacement) from the training data. (2) Sample a subset of the variables. (3) Build a decision tree using the two samples; don't bother with any optimization or pruning. (4) Repeat. With a new observation, vote the trees.
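The recipe above can be sketched in pure Python. This is an illustrative toy, not the implementation used in the talk: all names are hypothetical, splits are scored by Gini impurity (an assumption), and the variable subset is drawn once per tree as the slide describes, whereas many library implementations draw it at every split.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, features, depth=0, max_depth=5):
    """Grow an unpruned tree that splits only on the given feature subset."""
    if depth >= max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for j in features:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # every chosen feature is constant: stop here
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  features, depth + 1, max_depth)
    return (j, t, grow(left), grow(right))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def random_forest(X, y, n_trees=25, n_features=None, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    n_features = n_features or max(1, int(p ** 0.5))
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in X]   # sample with replacement
        feats = rng.sample(range(p), n_features)    # sample a subset of variables
        forest.append(build_tree([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def predict_forest(forest, x):
    """Vote the trees and return the majority class."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

Labels are assumed not to be tuples, since tuples mark internal nodes in this sketch.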
25 Benign vs Malicious: observations of Windows binaries: 0 benign, malicious. Random forest performance: 0.% error. .% of benign misclassified. 0.% of malicious misclassified. Nearest neighbor classifier is a little worse: overall error of.%.
26 Know Your Data: The results demonstrate that there is something going on with this byte-count approach. Logically, the performance seems too good to be true, and yet it does seem to work. The data are high dimensional (256), so maybe there is a curse of dimensionality thing going on here. Perhaps we are finding OS-specific things: the benign files may have been collected from a different version of the operating system than the malicious ones. We don't have version information about the data (beyond "these are Windows files"). Worrisome fact: there are several different sets of benign (or malicious) data. A classifier can be built to tell which of the benign collections a file belongs to.
27 Know Your Data? Maaten & Hinton (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605.
28 Topics
29 Manifold Learning. Hypothesis: high dimensional data lives on a lower dimensional structure. Manifold learning is a set of techniques to infer this structure, or to embed the data from the high dimensional space into a lower dimensional space that respects the local structure.
30 Multidimensional Scaling. Problem: Given a distance matrix (or dissimilarity matrix) D, find a set of points $X \subset \mathbb{R}^d$ whose distance matrix $d(X)$ best approximates D. This is the problem solved by multidimensional scaling (MDS). Different definitions of "best approximates" lead to different algorithms. Classical MDS utilizes the eigenvector decomposition of (a modified version of) the distance matrix. Some manifold learning algorithms compute a local distance and use MDS, others compute eigenvectors of related matrices. These are the algorithms I use most often.
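Classical MDS can be sketched with NumPy as below. The "modified version" of the distance matrix is taken here to be the standard double-centered matrix of squared distances, which is an assumption about what the slide intends; the function name is hypothetical.

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed n points in R^d from an n x n distance matrix D (classical MDS)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:d]        # indices of the d largest
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # guard against tiny negatives
    return evecs[:, top] * scale

# Usage: recover a planar configuration (up to rotation and reflection).
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
X = classical_mds(D, d=2)
```

When D comes from points that really lie in d dimensions, the embedded points reproduce the distances exactly; otherwise the result is a least-squares style approximation.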
31 Basic Graph Theory: A graph is a set V of vertices, and a set E of pairs of vertices (edges). The edges can be directed or undirected, and can have weights. In this talk they will be undirected. The (graph) distance between two vertices is the length of the shortest path between them in the graph. The adjacency matrix of a graph on n vertices is the $n \times n$ binary matrix with a 1 in those positions corresponding to the edges of the graph. The spectrum of a graph is the eigendecomposition of the adjacency matrix A, or more generally, of some function f(A).
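These definitions translate directly into code. The sketch below builds the adjacency matrix of an undirected, unweighted graph and computes graph distances by breadth-first search; the function names are hypothetical, and vertices are assumed to be labeled 0, ..., n-1.

```python
from collections import deque

def adjacency_matrix(n, edges):
    """n x n binary matrix with a 1 wherever two vertices share an edge."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1  # undirected: the matrix is symmetric
    return A

def graph_distance(A, source):
    """Shortest-path lengths from source; None marks unreachable vertices."""
    n = len(A)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if A[u][v] and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# A path 0-1-2-3 plus an isolated vertex 4.
A = adjacency_matrix(5, [(0, 1), (1, 2), (2, 3)])
distances = graph_distance(A, 0)
```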
32 Graph Examples: ɛ-ball graph with ɛ = 0.. -nearest neighbor graph.
33 Basic Steps of Manifold Learning: Given data $\{x_1, \dots, x_n\} \subset \mathbb{R}^p$: (1) Construct a graph whose vertices are the $x_i$ with edges between near points: k-nearest neighbor graph, ɛ-ball graph, or variations. (2) Compute the eigenvectors of: the adjacency matrix, the Laplacian of the adjacency matrix, or scaled or modified versions of the above. (3) Set Z to the matrix with columns corresponding to the main eigenvectors. That is, the rows $\{z_1, \dots, z_n\}$ are the embedded data. (4) Perform inference on Z.
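The steps above can be sketched with NumPy for one particular choice: an ɛ-ball graph and the unnormalized Laplacian L = D - A. Both choices, the function names, and the toy data are assumptions for illustration. For the Laplacian, the "main" eigenvectors are taken to be those with the smallest nonzero eigenvalues, which presumes a connected graph.

```python
import numpy as np

def epsilon_ball_graph(X, eps):
    """Adjacency matrix joining every pair of points within distance eps."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = (D <= eps).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def laplacian_embedding(A, d=2):
    """Rows of the result are the embedded points z_1, ..., z_n."""
    L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian
    evals, evecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return evecs[:, 1:d + 1]              # skip the constant eigenvector

# Two tight groups of points on a line, joined by a single bridge edge.
X = [[0.0], [1.0], [2.0], [4.0], [5.0], [6.0]]
A = epsilon_ball_graph(X, eps=2.1)
Z = laplacian_embedding(A, d=1)
```

In this toy example the sign of the single embedding coordinate separates the two groups, so even a one-dimensional Z is enough for inference.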
34 Compute ɛ-ball graph on the Kaggle training data. Lay out the graph. Embed using scaled Laplacian. Embed using adjacency matrix. Embed using MDS on graph distance.
38 Discussion: Different embedding methods extract different information about the data. These two dimensional plots are misleading in that there is no reason to assume the intrinsic dimensionality is 2. Some care must be taken to ensure that the embedding method can be applied to new data.
39 Joint Embedding: $D_W = \begin{pmatrix} D_1 & W \\ W & D_2 \end{pmatrix}$. Jointly embed $D_1$ and $D_2$ using $D_W$, where $W = \lambda D_1 + (1-\lambda) D_2$.
40 Topics
41 Topological Data Analysis (TDA): The basic idea is to use topological features (measures that are invariant to smooth deformations) to learn about the structure of the data. We will only be able to touch briefly on this subject. See: Carlsson, Topology and Data, Bulletin of the American Mathematical Society, 46, 2009, 255-308. Ghrist, Elementary Applied Topology, Createspace Independent Publishing Platform, 2014.
42 Simplices: A (geometric) simplex of dimension d is (the convex hull of) a set of d + 1 points in general position. A 0-simplex is a point, a 1-simplex a line segment, a 2-simplex a triangle, and so on.
43 Simplicial Complexes: A simplicial complex is a collection S of simplices that satisfies the following conditions: If $\sigma \in S$ then so are the faces of σ. If $\sigma_1, \sigma_2 \in S$ are k-simplices, then either they are disjoint or they intersect in a lower dimensional simplex which is a face of both.
44 Persistent Homology: We construct an ɛ-ball graph on the data, and from this we get a simplicial complex. We compute a measure of the topology (the rank of the homology, or the Betti number): how many d-dimensional holes are there? Those structures that persist across ranges of ɛ are interesting and more likely to be real structure rather than noise.
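The simplest case of this idea can be sketched without any topology library: the zeroth Betti number is the number of connected components of the ɛ-ball graph, and tracking it across a range of ɛ shows which clusters persist. The code below is only that special case, with hypothetical names and toy points; higher Betti numbers need a proper homology computation.

```python
import math
from itertools import combinations

def betti0(points, eps):
    """Number of connected components of the eps-ball graph (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)      # merge the two components
    return len({find(i) for i in range(len(points))})

# Two well-separated pairs of points.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
curve = [(eps, betti0(points, eps)) for eps in (0.5, 1.5, 5.0, 9.5)]
```

The two-component structure survives over a long range of ɛ, which is the kind of persistence the slide describes.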
45 Euler Characteristic: One defines the Euler characteristic as: $\chi(X) = \sum_{j=0}^{n} (-1)^j \,\mathrm{Betti}_j(X)$. This is equivalent to the standard Euler characteristic one learns in grade school, extended to general topological spaces and higher dimensions. The persistent version is to compute this on the persistent homologies from the ɛ-ball graphs.
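For a finite simplicial complex the alternating sum of Betti numbers equals the alternating sum of the simplex counts (the Euler-Poincaré formula), so χ can be computed by counting. The sketch below assumes the complex built from the ɛ-ball graph is its clique complex, in which every set of mutually adjacent vertices spans a simplex, and truncates at a chosen dimension; the names and that construction are assumptions, not details given in the talk.

```python
import math
from itertools import combinations

def epsilon_ball_edges(points, eps):
    """Edges (i, j) with i < j between points within distance eps."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= eps}

def euler_characteristic(n, edges, max_dim=3):
    """chi = sum_k (-1)^k (#k-simplices) of the clique complex, up to max_dim."""
    chi = 0
    for k in range(max_dim + 1):
        # A set of k+1 vertices is a k-simplex when all its pairs are edges.
        count = sum(1 for s in combinations(range(n), k + 1)
                    if all(pair in edges for pair in combinations(s, 2)))
        chi += (-1) ** k * count
    return chi

def persistent_euler(points, eps_values, max_dim=3):
    """Euler characteristic of the complex at each scale eps."""
    return [euler_characteristic(len(points),
                                 epsilon_ball_edges(points, eps), max_dim)
            for eps in eps_values]

# Corners of a unit square: isolated points, then a loop, then a filled shape.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
curve = persistent_euler(square, [0.5, 1.1, 1.5])
```

Enumerating all vertex subsets is exponential in the dimension, so this is only practical for small examples.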
46 Persistent Euler Characteristics of Malware
47 Discussion: Mathematics has many tools for the data analyst, in particular for the analysis of cyber data. These tools include: Computational statistics. Machine learning. Graph theory. Manifold learning. Topological data analysis. New applications of pure mathematics to data analysis are developed every day, and these areas are all huge growth areas for applied mathematicians.
Segmentation: Clustering, Graph Cut and EM Ying Wu Electrical Engineering and Computer Science Northwestern University, Evanston, IL 60208 yingwu@northwestern.edu http://www.eecs.northwestern.edu/~yingwu
More informationCourtesy of Prof. Shixia University
Courtesy of Prof. Shixia Liu @Tsinghua University Outline Introduction Classification of Techniques Table Scatter Plot Matrices Projections Parallel Coordinates Summary Motivation Real world data contain
More informationLab 9. Julia Janicki. Introduction
Lab 9 Julia Janicki Introduction My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationNearest Neighbors Classifiers
Nearest Neighbors Classifiers Raúl Rojas Freie Universität Berlin July 2014 In pattern recognition we want to analyze data sets of many different types (pictures, vectors of health symptoms, audio streams,
More informationComputer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis
Computer Science 210 Data Structures Siena College Fall 2017 Topic Notes: Complexity and Asymptotic Analysis Consider the abstract data type, the Vector or ArrayList. This structure affords us the opportunity
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationGetting Students Excited About Learning Mathematics
Getting Students Excited About Learning Mathematics Introduction Jen Mei Chang Department of Mathematics and Statistics California State University, Long Beach jchang9@csulb.edu It wasn t so long ago when
More informationIntroduction to Machine Learning
Introduction to Machine Learning Clustering Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1 / 19 Outline
More informationSpectral Clustering and Community Detection in Labeled Graphs
Spectral Clustering and Community Detection in Labeled Graphs Brandon Fain, Stavros Sintos, Nisarg Raval Machine Learning (CompSci 571D / STA 561D) December 7, 2015 {btfain, nisarg, ssintos} at cs.duke.edu
More informationIntroduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones
Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones What is machine learning? Data interpretation describing relationship between predictors and responses
More informationTopic: Orientation, Surfaces, and Euler characteristic
Topic: Orientation, Surfaces, and Euler characteristic The material in these notes is motivated by Chapter 2 of Cromwell. A source I used for smooth manifolds is do Carmo s Riemannian Geometry. Ideas of
More informationChapter 1. Introduction
Chapter 1 Introduction A Monte Carlo method is a compuational method that uses random numbers to compute (estimate) some quantity of interest. Very often the quantity we want to compute is the mean of
More informationTopological Issues in Hexahedral Meshing
Topological Issues in Hexahedral Meshing David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science Outline I. What is meshing? Problem statement Types of mesh Quality issues
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationMULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER
MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute
More informationLarge-Scale Face Manifold Learning
Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random
More informationWestern TDA Learning Seminar. June 7, 2018
The The Western TDA Learning Seminar Department of Mathematics Western University June 7, 2018 The The is an important tool used in TDA for data visualization. Input point cloud; filter function; covering
More informationOn the Topology of Finite Metric Spaces
On the Topology of Finite Metric Spaces Meeting in Honor of Tom Goodwillie Dubrovnik Gunnar Carlsson, Stanford University June 27, 2014 Data has shape The shape matters Shape of Data Regression Shape of
More informationMachine Learning Lecture 3
Machine Learning Lecture 3 Probability Density Estimation II 19.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Exam dates We re in the process
More informationProblem 1: Complexity of Update Rules for Logistic Regression
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1
More informationtopological data analysis and stochastic topology yuliy baryshnikov waikiki, march 2013
topological data analysis and stochastic topology yuliy baryshnikov waikiki, march 2013 Promise of topological data analysis: extract the structure from the data. In: point clouds Out: hidden structure
More information08 An Introduction to Dense Continuous Robotic Mapping
NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More information