Unsupervised Learning. Pantelis P. Analytis. Introduction. Finding structure in graphs. Clustering analysis. Dimensionality reduction.

March 19

What's unsupervised learning? Most of the data available on the internet do not have labels. How can we make sense of it?

Organizing the web. The first attempts to organize the web were based on human-curated directories (Yahoo, LookSmart). People also used methods from information retrieval to find relevant documents. Yet the web contains a deluge of untrusted documents, spam, random webpages, advertisements, etc.

Elements of the PageRank algorithm. Solution: use social feedback to rank the quality of documents. You can see links as votes: a page is more important when it has more incoming links. For instance, […] has numerous incoming links, as opposed to […]. Links from important pages count more, which makes the definition recursive.

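The iterative PageRank algorithm. At $t = 0$, assume an initial probability distribution $PR(p_i; 0) = \frac{1}{N}$, where $N$ is the number of pages. At each time step, the computation yields

$$PR(p_i; t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)},$$

where $M(p_i)$ is the set of pages that link to $p_i$, $L(p_j)$ is the number of outbound links on page $p_j$, and $d$ is the damping factor.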
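The update rule above can be run directly as a fixed number of iterations. This is a minimal sketch. The function name `pagerank`, the adjacency-dict input `links`, the default $d = 0.85$ (a commonly used value) and the 50-iteration cutoff are illustrative assumptions. The sketch also assumes every page has at least one outgoing link, so no dangling pages need special handling.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterative PageRank; `links` maps each page to the list of pages it links to.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # PR(p_i; 0) = 1/N
    # M(p_i): the pages that link to p_i
    incoming = {p: [q for q in pages if p in links[q]] for p in pages}
    for _ in range(iterations):
        # (1 - d)/N plus d times the PR flowing in, each source divided by L(p_j)
        pr = {
            p: (1 - d) / n + d * sum(pr[q] / len(links[q]) for q in incoming[p])
            for p in pages
        }
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links back to A
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

C receives links from both other pages and ends up ranked highest. B is linked only by A, which splits its vote between two targets, so B ranks lowest.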

PageRank equilibrium

PageRank: the spider trap

The scaled PageRank algorithm. Scaled PageRank update rule: apply the basic PageRank rule, scale all values down by a factor $s$, then divide the $1 - s$ leftover units of PageRank evenly over the nodes, so that each of the $N$ nodes receives $(1 - s)/N$.
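The sketch below contrasts the basic and scaled rules on a small spider trap. The graph, the value $s = 0.8$, the iteration count and the helper names `basic_step`, `scaled_step` and `run` are hypothetical choices for illustration. When every node has at least one out-link, the scaled rule works out to the same update as the damped formula with $d = s$.

```python
def basic_step(pr, links):
    """Basic PageRank rule: every node splits its current PR evenly over its out-links."""
    new = {p: 0.0 for p in pr}
    for p, targets in links.items():
        for q in targets:
            new[q] += pr[p] / len(targets)
    return new


def scaled_step(pr, links, s):
    """Scaled rule: apply the basic rule, scale down by s, spread the 1 - s leftover evenly."""
    n = len(pr)
    return {p: s * v + (1 - s) / n for p, v in basic_step(pr, links).items()}


def run(links, step, iterations=100, **kwargs):
    pr = {p: 1.0 / len(links) for p in links}  # start from the uniform distribution
    for _ in range(iterations):
        pr = step(pr, links, **kwargs)
    return pr


# Hypothetical graph: C and D only link to each other, forming a spider trap
links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
basic = run(links, basic_step)           # all PR ends up inside the trap
scaled = run(links, scaled_step, s=0.8)  # A and B keep a share of PR
```

Under the basic rule, A and B drain to zero and C and D split all the PageRank. Under the scaled rule, the evenly redistributed leftover keeps every node above zero.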

The k-means algorithm. Input: the number of clusters $K$ and a set of points $x_1, \dots, x_n$. Place centroids $c_1, \dots, c_K$ randomly. Then repeat until convergence:

1. For each point $x_i$, find the nearest centroid $c_j$ and assign the point to that cluster. In math notation: $\arg\min_j D(x_i, c_j)$.
2. For each cluster $j = 1, \dots, K$, compute the new centroid as the mean of all points assigned to cluster $j$ in the previous step. In math notation: $c_j(a) = \frac{1}{n_j} \sum_{x_i \in C_j} x_i(a)$ for $a = 1, \dots, d$, where $C_j$ is the set of points in cluster $j$, $n_j = |C_j|$ and $d$ is the number of dimensions.

Stop when the algorithm has converged, i.e. no point changes cluster.
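The two alternating steps can be sketched in plain Python. This is a minimal version under some assumptions. The slides say to place centroids "randomly", and here that means picking $K$ distinct data points at random. $D$ is the Euclidean distance. The names `kmeans`, `seed` and `max_iter` are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization: k distinct data points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: argmin_j D(x_i, c_j) with Euclidean D
        new_assignment = [
            min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in points
        ]
        if new_assignment == assignment:  # no point changed cluster: converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the coordinate-wise mean of its points
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, 2)
```

On this toy data the three points near the origin and the three near (10, 10) end up in separate clusters. The centroids sit at the means of each group.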

Converging to clusters

How do we select k? Adding clusters yields diminishing returns: the distance of points to their cluster centroids keeps shrinking as k grows, but by less and less. An intuitive approach is to pick the k after which the distance flattens out (the "elbow" of the curve).
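The elbow heuristic can be tried by computing a within-cluster distance for several values of k. This sketch makes a few assumptions. It measures "distance" as the sum of squared distances of points to their centroid. It replaces random placement with a deterministic farthest-first seeding so the output is reproducible. The data and the names `farthest_first` and `within_cluster_sse` are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points, key=lambda x: min(math.dist(x, c) for c in cents)))
    return cents


def within_cluster_sse(points, k, iters=50):
    """Run k-means and return the total squared distance of points to their centroid."""
    cents = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in cents]
        for x in points:
            nearest = min(range(k), key=lambda j: math.dist(x, cents[j]))
            groups[nearest].append(x)
        cents = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                 for g, c in zip(groups, cents)]
    return sum(min(math.dist(x, c) for c in cents) ** 2 for x in points)


# Three hypothetical, well-separated blobs of four points each
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (16, 0), (16, 1), (17, 0), (17, 1)]
sse = {k: within_cluster_sse(points, k) for k in range(1, 6)}
for k, value in sse.items():
    print(k, round(value, 2))
```

The drop from k = 2 to k = 3 is large, while the drop from k = 3 to k = 4 is small. That puts the elbow at k = 3, matching the three blobs.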

Hierarchical clustering

Agglomerative vs. divisive. Agglomerative clustering starts from the bottom, with every point in its own cluster, and merges clusters into larger ones. Divisive clustering starts with one cluster, which is gradually split into smaller ones.

How do we determine the nearness of clusters? Complete linkage: $D(X, Y) = \max_{x \in X, y \in Y} d(x, y)$. Single linkage: $D(X, Y) = \min_{x \in X, y \in Y} d(x, y)$. Average linkage: $D(X, Y) = \frac{1}{|X|\,|Y|} \sum_{x \in X} \sum_{y \in Y} d(x, y)$.
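Each linkage formula translates directly into a function over all cross-cluster pairs. This is a small sketch assuming Euclidean $d$ and clusters given as lists of coordinate tuples. The function names are hypothetical.

```python
import math
from itertools import product


def complete_linkage(X, Y):
    """D(X, Y) = max over x in X, y in Y of d(x, y)."""
    return max(math.dist(x, y) for x, y in product(X, Y))


def single_linkage(X, Y):
    """D(X, Y) = min over x in X, y in Y of d(x, y)."""
    return min(math.dist(x, y) for x, y in product(X, Y))


def average_linkage(X, Y):
    """D(X, Y) = mean of d(x, y) over all |X| * |Y| cross-cluster pairs."""
    return sum(math.dist(x, y) for x, y in product(X, Y)) / (len(X) * len(Y))


X = [(0, 0), (0, 2)]
Y = [(3, 0), (5, 0)]
```

For these two clusters, single linkage is the closest pair distance, 3. Complete linkage is the farthest pair, $\sqrt{29}$. Average linkage falls in between.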

Agglomerative clustering: when to stop. Either pick k upfront and stop when we have k clusters, or stop when a cluster with low cohesion is created (using diameter-, radius- or density-based measures).
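The first stopping rule, "pick k upfront", gives a short bottom-up procedure. This is a naive sketch: it rescans every pair of clusters on each merge, so it is only suited to small inputs. The function name `agglomerative` and its `linkage` parameter are hypothetical. Passing `min` gives single linkage and `max` gives complete linkage. The cohesion-based stopping rules are not implemented here.

```python
import math
from itertools import combinations


def agglomerative(points, k, linkage=min):
    """Start with singleton clusters and merge the two nearest until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair (i, j), i < j, whose linkage distance is smallest
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                math.dist(x, y) for x in clusters[ij[0]] for y in clusters[ij[1]]
            ),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


result = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], k=3)
```

With k = 3, the two close pairs are merged first. The isolated point (10, 0) remains a cluster of its own.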

Kohonen's self-organizing maps. Step 0: Randomly position the grid's neurons in the data space.

Step 1: Select one data point, either randomly or by systematically cycling through the dataset in order.

Step 2: Find the neuron that is closest to the chosen data point. This neuron is called the Best Matching Unit (BMU).

Step 3: Move the BMU closer to that data point. How far the BMU moves is determined by a learning rate, which decreases after each iteration.

Step 4: Move the BMU's neighbors closer to that data point as well, with neighbors farther away moving less. Neighbors are identified using a radius around the BMU, and this radius decreases after each iteration.

Update the learning rate and the BMU radius before repeating Steps 1 to 4. Iterate until the positions of the neurons have stabilized.
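Steps 0 to 4 can be combined into one training loop. This NumPy sketch rests on several assumptions the slides leave open, all chosen only for illustration:

- The learning rate and radius decay linearly.
- Neighbors within the radius are weighted by a Gaussian of their distance on the grid.
- The run uses a fixed number of iterations instead of a stability check.
- The grid size and parameter values are arbitrary, and `train_som` is a hypothetical name.

```python
import numpy as np


def train_som(data, grid_shape=(5, 5), iterations=1000, lr0=0.5, radius0=2.0, seed=0):
    """Minimal Kohonen self-organizing map on an (n, d) float array; returns neuron weights."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates of each neuron, used to measure neighborhoods on the map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Step 0: random initial neuron positions inside the data's bounding box
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows * cols, data.shape[1]))
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1 - frac)                 # learning rate decays each iteration
        radius = radius0 * (1 - frac) + 1e-3  # radius decays too (kept above zero)
        x = data[rng.integers(len(data))]     # Step 1: pick a random data point
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # Step 2: find the BMU
        # Steps 3-4: move the BMU and its grid neighbors toward x, farther ones less
        grid_dist = np.linalg.norm(grid - grid[bmu], axis=1)
        influence = np.exp(-grid_dist**2 / (2 * radius**2))
        influence[grid_dist > radius] = 0.0   # only neurons within the radius move
        weights += lr * influence[:, None] * (x - weights)
    return weights


data = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                 [1.0, 1.0], [1.0, 0.9], [0.9, 1.0]])
weights = train_som(data)
```

Because the radius shrinks below one grid step partway through training, late iterations move only the BMU. By the end, each data point has a neuron close to it.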

Principal component analysis

Principal component analysis is often used to accelerate supervised learning and for visualization.
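Both uses rest on projecting the data onto a few directions of maximal variance. This is a minimal NumPy sketch that computes those directions with a singular value decomposition of the mean-centered data, one standard way to obtain principal components. The function name `pca` and the toy data are hypothetical. The projected coordinates could then feed a classifier with fewer inputs or a 2-D plot.

```python
import numpy as np


def pca(X, n_components):
    """Project rows of X onto the top principal components.

    Returns (projected data, component directions, variance along each component).
    """
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S**2 / (len(X) - 1)  # sample variance captured per direction
    return X_centered @ components.T, components, explained_variance[:n_components]


# Hypothetical data lying exactly on the line y = 2x: one component captures everything
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
Z, components, variance = pca(X, n_components=1)
```

The first component points along $(1, 2)/\sqrt{5}$, up to sign, and the 2-D points reduce to a single coordinate each.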

… in recommender systems
