Simple Classification using Binary Data

Size: px

Start display at page:

Download "Simple Classification using Binary Data"

Hilary Allison
5 years ago
Views:

1 Simple Classification using Binary Data Deanna Needell Mathematics UCLA Thanks: NSF CAREER DMS# NSF BIGDATA DMS# MSRI NSF DMS# Alfred P. Sloan Fdn

2 Joint work with Rayan Saab (UCSD) Tina Woolf (CGU)

3 So much data

4 So much data

5 So much data

6 So much data

7 How can we handle all this data? Option 1 : Build bigger computing systems v We need the resources v Fundamental limitations v Wasteful (resources, energy, cost, )

8 How can we handle all this data? 3 MB of internet data transfer = boiling one cup of water (

9 How can we handle all this data? Option 2 : Design more efficient data analysis methods

10 Compressed sensing v Need to solve highly underdetermined linear system va has a null-space!

11 1-bit compressed sensing v Store only the first bit the SIGN of each measurement

12 1-bit compressed sensing v Store only the first bit the SIGN of each measurement

13 1-bit compressed sensing v Store only the first bit the SIGN of each measurement v Remedy: Use dithers to estimate the norm of x

14 Moral: vone-bit (binary) data is as efficient as it gets vit may still contain enough information about the signal to perform inference tasks

15 Problem: classification

16 Problem: reality (Tang etal. 2016)

17 Background: SVM

18 Background: SVM

19 Background: Deep Learning

20 Our goals Design classification scheme that: v Uses binary data v Is simple and efficient v Uses layers in an interpretable way

21 Notation v X : nxp training data matrix (data in columns) v A : mxn (random) matrix v Q = sign(ax) : mxp binary training data v G : # of classes v b : p training labels (1-G) v L : # of layers in our design

22 Main idea v Each row of A corresponds to a hyperplane v Each binary measurement Q ij indicates on which side of the i th hyperplane data point X j lies v If we gather enough of this info for all the training data, we can use it to predict the class label for a new test point x

23 Single layer v All the hyperplane information in Q is enough x For a new point x, simply compare its sign pattern with those of the training points and choose the label it matches the most often

24 Multiple layers v What about? For ANY hyperplane, there are both red and blue on at least one side of it x

25 Multiple layers v Now PAIRS of hyperplanes do the trick x For a new point x, simply compare its sign pattern for hyperplane PAIRS with those of the training points and choose the label it matches the most often

26 Multiple layers What about higher dimensions? v We continue this strategy to build layers, the l th layer corresponding to l tuples of hyperplanes v For simplicity (and computation), we consider m l tuples at each layer, selected randomly from all possible v For a new test point x, we use the sign patterns across all layers for classification

27 Using all layers How to integrate the info from all layers? vl : layer v i : index from 1 to m v : the set of l indices indicating which hyperplanes are selected in the i th l-tuple v t : a possible sign pattern of l +/- 1s v g : a class index (from 1 to G) v P g t : the # of training points from the g th class having sign pattern t from the hyperplanes in

28 Using all layers v P g t : the # of training points from the g th class having sign pattern t from the hyperplanes in

29 Using all layers v P g t : the # of training points from the g th class having sign pattern t from the hyperplanes in fraction of training points in class g out of all points with pattern t

$Using all layers v P g t : the # of training points from the g th class having sign pattern t from the hyperplanes in fraction of training$

30 Using all layers v P g t : the # of training points from the g th class having sign pattern t from the hyperplanes in fraction of training points in class g out of all points with pattern t gives more weight to group g when its size is much different than others with same sign pattern

31 Our method: training v For each layer, pick the l tuples and then compute all values of r(l,i,t,g)

32 Our method: testing v For a new point x with sign pattern t*, compute the sum of all r(l,i,t*,g) for each class g and then assign the label g which has the largest sum.

33 Results (L=1)

34 Results (L=4)

35 Results (L=5)

36 Results

37 Results

38 Results (L=1)

39 Results (L=4)

40 Results (L=5)

41 Results (L=5)

42 Theory

43 Theory A g t : the angle of class g with sign pattern t for the i th l tuple in layer l

44 Theory

45 Theory vas m à, P(correct label) à 1

46 Theory

Take-away v Simple classification from binary data v Efficient storage of the data v Efficient and simple algorithm v Theoretical analysis possible v

47 Take-away v Simple classification from binary data v Efficient storage of the data v Efficient and simple algorithm v Theoretical analysis possible v Already competes with state of the art v Future work v Dithers to allow for more complicated geometries? v Theoretical analysis of the discrete case?

48 math.ucla.edu/~deanna Simple Classification using Binary Data Needell, Saab, Woolf

Classifying Depositional Environments in Satellite Images

Classifying Depositional Environments in Satellite Images Alex Miltenberger and Rayan Kanfar Department of Geophysics School of Earth, Energy, and Environmental Sciences Stanford University 1 Introduction