CS224W: Analysis of Networks Jure Leskovec, Stanford University


Announcements: HW2 Q1.1 parts (b) and (c) cancelled. HW3 released; it is long, so start early.

Plan for today:
(1) A new problem: outbreak detection.
(2) Develop an approximation algorithm: it is a submodular optimization problem!
(3) Speed up greedy hill-climbing: valid for optimizing general submodular functions (i.e., it also works for influence maximization).
(4) Prove a new data-dependent bound on the solution quality: valid for optimizing any submodular function (i.e., it also works for influence maximization).

Given a real city water distribution network, and data on how contaminants spread in the network, detect the contaminant as quickly as possible. This problem was posed by the US Environmental Protection Agency.

Blog posts form information cascades through time-ordered hyperlinks. Which blogs should one read to detect cascades as effectively as possible?

We want to read things before others do. The trade-off: detect the blue and yellow stories soon but miss the red story, or detect all stories but late.

Both of these are instances of the same underlying problem: given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively. There are many other applications: epidemics, influence propagation, network security.

Utility of placing sensors: it depends on the water flow dynamics, the demands of households, etc. For each subset S ⊆ V of the set V of all network junctions, compute the utility f(S). A sensor reduces impact through early detection: a placement that catches a high-impact outbreak early has high sensing quality (e.g., f(S) = 0.9), while one that only catches low-impact outbreaks has low sensing quality (e.g., f(S) = 0.01). [Figure: placements S1-S4 against high-, medium-, and low-impact outbreaks.]

Given: a graph G(V, E) and data on how outbreaks spread over G: for each outbreak i we know the time T(u, i) when outbreak i contaminates node u. For the water distribution network (physical pipes and junctions), we have a simulator of water consumption and flow (built by mechanical engineers), and we simulate the contamination spread from every possible location.

Given: a graph G(V, E) and data on how outbreaks spread over G: for each outbreak i we know the time T(u, i) when outbreak i contaminates node u. For the network of the blogosphere, we collect lots of blog posts and trace hyperlinks to obtain traces of the information flow from a given blog and to identify influence sets.

Given: a graph G(V, E) and data on how outbreaks spread over G: for each outbreak i we know the time T(u, i) when outbreak i contaminates node u. Goal: select a subset of nodes S that maximizes the expected reward for detecting outbreaks, max_{S⊆V} f(S) = Σ_i P(i)·f_i(S), subject to cost(S) ≤ B, where P(i) is the probability of outbreak i occurring and f_i(S) is the reward for detecting outbreak i using sensors S.

Reward (one of the following three): (1) minimize the time to detection, (2) maximize the number of detected propagations, (3) minimize the number of infected people. Cost (context dependent): reading big blogs is more time consuming; placing a sensor in a remote location is expensive. [Figure: for outbreak i, monitoring the blue node saves more people than monitoring the green node.]

Objective functions:
1) Time to detection (DT): how long does it take to detect a contamination? Penalty for detecting at time t: π_i(t) = t.
2) Detection likelihood (DL): how many contaminations do we detect? Penalty for detecting at time t: π_i(t) = 0, π_i(∞) = 1. Note this is a binary outcome: we either detect or we do not.
3) Population affected (PA): how many people drank contaminated water? Penalty for detecting at time t: π_i(t) = {# of infected nodes in outbreak i by time t}.
Observation: in all cases, detecting sooner does not hurt!

We define f_i(S) as the penalty reduction: f_i(S) = π_i(∞) − π_i(T(S, i)), where T(S, i) is the earliest time a sensor in S detects outbreak i. Observation: diminishing returns. [Figure: adding a new sensor s to placement S = {s1, s2} helps a lot; adding s to the larger placement S′ = {s1, s2, s3, s4} helps very little.]
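
To make these definitions concrete, here is a minimal Python sketch of f_i(S) and f(S) = Σ_i P(i)·f_i(S), assuming the detection times T(u, i) have been precomputed (e.g., by the simulator). The names detect_time, penalty_dt, penalty_dl, and the horizon t_max are illustrative assumptions, not from the paper.

import math

def T(S, i, detect_time):
    # T(S, i): earliest time any sensor in S detects outbreak i (inf if never).
    # detect_time maps (node, outbreak) -> contamination time T(u, i).
    return min((detect_time.get((u, i), math.inf) for u in S), default=math.inf)

def penalty_dt(t, t_max=100.0):
    # Detection time (DT): pi_i(t) = t, capped at a horizon t_max so that
    # pi_i(inf) is finite (a simplifying assumption for this sketch).
    return min(t, t_max)

def penalty_dl(t):
    # Detection likelihood (DL): penalty 0 if detected, 1 if never detected.
    return 0.0 if t < math.inf else 1.0

def f_i(S, i, detect_time, penalty):
    # Penalty reduction: f_i(S) = pi_i(inf) - pi_i(T(S, i)).
    return penalty(math.inf) - penalty(T(S, i, detect_time))

def f(S, outbreaks, P, detect_time, penalty):
    # Expected reward: f(S) = sum over outbreaks i of P(i) * f_i(S).
    return sum(P[i] * f_i(S, i, detect_time, penalty) for i in outbreaks)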

Claim: for all A ⊆ B ⊆ V and sensors s ∈ V∖B, f(A ∪ {s}) − f(A) ≥ f(B ∪ {s}) − f(B), i.e., all our objectives are submodular.
Proof: fix a cascade/outbreak i and show that f_i(A) = π_i(∞) − π_i(T(A, i)) is submodular. Consider A ⊆ B ⊆ V and a sensor s ∈ V∖B. When does node s detect cascade i? We analyze 3 cases based on when s detects outbreak i.
(1) T(s, i) ≥ T(A, i): s detects late and nobody benefits: f_i(A ∪ {s}) = f_i(A) and f_i(B ∪ {s}) = f_i(B), so f_i(A ∪ {s}) − f_i(A) = 0 = f_i(B ∪ {s}) − f_i(B).

Proof (contd.), remembering A ⊆ B:
(2) T(B, i) ≤ T(s, i) < T(A, i): s detects after B but before A, i.e., sooner than any node in A but after all nodes in B. So s only helps improve the solution A (but not B): f_i(A ∪ {s}) − f_i(A) ≥ 0 = f_i(B ∪ {s}) − f_i(B).
(3) T(s, i) < T(B, i): s detects early: f_i(A ∪ {s}) − f_i(A) = [π_i(∞) − π_i(T(s, i))] − f_i(A) ≥ [π_i(∞) − π_i(T(s, i))] − f_i(B) = f_i(B ∪ {s}) − f_i(B). The inequality holds because f_i(·) is non-decreasing, so f_i(A) ≤ f_i(B).
So each f_i(·) is submodular, and f(S) = Σ_i P(i)·f_i(S) is a nonnegative weighted sum of submodular functions, hence also submodular.

[Figure: hill-climbing adds the sensor with the highest marginal gain at each step.] What do we know about optimizing submodular functions? Hill-climbing (i.e., greedy) is near-optimal: it achieves (1 − 1/e)·OPT. But: (1) this only works for the unit-cost case (each sensor costs the same), whereas for us each sensor s has a cost c(s); and (2) the hill-climbing algorithm is slow: at each iteration we need to re-evaluate the marginal gains of all nodes, so the runtime is O(|V|·K) for placing K sensors. A sketch of the plain greedy procedure follows below.
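
For reference, a minimal sketch of plain greedy hill-climbing in the unit-cost case, where f is any set-function oracle (the names here are illustrative):

def greedy_hill_climbing(V, f, k):
    # Repeatedly add the element with the largest marginal gain.
    # For monotone submodular f this achieves (1 - 1/e) * OPT, but it makes
    # O(|V| * k) calls to f: every candidate is re-evaluated in every round.
    S = set()
    for _ in range(min(k, len(V))):
        f_S = f(S)
        best = max((u for u in V if u not in S), key=lambda u: f(S | {u}) - f_S)
        S.add(best)
    return S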

Consider the following algorithm for the outbreak detection problem: hill-climbing that ignores cost. That is, ignore the sensor cost c(s) and repeatedly select the sensor with the highest marginal gain until the budget is exhausted. Q: How well does this work? A: It can fail arbitrarily badly! Next we construct an example where the hill-climbing solution is arbitrarily far from OPT.

Bad example when we ignore cost: n sensors, budget B. Sensor s_1 has reward r and cost B; each of the sensors s_2, …, s_n has reward r − ε and cost 1. Hill-climbing always prefers the more expensive sensor s_1 with reward r (and exhausts the budget); it never selects the cheaper sensors with reward r − ε. So with variable costs it can fail arbitrarily badly! Idea: what if we optimize the benefit-cost ratio instead? s_i = arg max_s [f(A_{i−1} ∪ {s}) − f(A_{i−1})] / c(s), i.e., greedily pick the sensor s_i that maximizes the benefit-to-cost ratio.

But the benefit-cost ratio can also fail arbitrarily badly! Consider a budget B and 2 sensors s_1 and s_2, with costs c(s_1) = ε and c(s_2) = B, and only 1 cascade, with f(s_1) = 2ε and f(s_2) = B. The benefit-cost ratios are f(s_1)/c(s_1) = 2 and f(s_2)/c(s_2) = 1. So we first select s_1 and then cannot afford s_2: we get reward 2ε instead of B! Now send ε → 0 and we get an arbitrarily bad solution. This algorithm incentivizes choosing nodes with very low cost, even when slightly more expensive ones can lead to much better global results.

CELF (Cost-Effective Lazy Forward-selection) is a two-pass greedy algorithm: compute solution S′ using benefit-cost greedy and solution S″ using unit-cost greedy, then output the final solution S = arg max(f(S′), f(S″)). How far is CELF from the (unknown) optimal solution? Theorem [Krause & Guestrin '05]: CELF is near-optimal; it achieves a ½(1 − 1/e) factor approximation. This is surprising: we have two clearly suboptimal solutions, but taking the better of the two is guaranteed to give a near-optimal solution.
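
A sketch of CELF's outer two-pass logic (the lazy-evaluation half is sketched further below); here cost(u) is the per-sensor cost and budget is B. This is an illustrative rendering of the algorithm described above, not the authors' reference implementation.

def celf_two_pass(V, f, cost, budget):
    def greedy(use_ratio):
        S, spent = set(), 0.0
        while True:
            f_S = f(S)
            # Only consider candidates we can still afford.
            cands = [u for u in V if u not in S and spent + cost(u) <= budget]
            if not cands:
                return S
            if use_ratio:  # benefit-cost greedy: marginal gain per unit cost
                best = max(cands, key=lambda u: (f(S | {u}) - f_S) / cost(u))
            else:          # unit-cost greedy: raw marginal gain, cost ignored
                best = max(cands, key=lambda u: f(S | {u}) - f_S)
            S.add(best)
            spent += cost(best)

    S_ratio = greedy(use_ratio=True)
    S_unit = greedy(use_ratio=False)
    # The better of the two solutions is 1/2 * (1 - 1/e) near-optimal.
    return max([S_ratio, S_unit], key=f)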

Recall: hill-climbing (i.e., greedy, repeatedly adding the sensor with the highest marginal gain) is near-optimal, achieving (1 − 1/e)·OPT. But the hill-climbing algorithm is slow! At each iteration we need to re-evaluate the marginal gains of all nodes, so the runtime is O(|V|·K) for placing K sensors.

In round i + 1, having picked S_i = {s_1, …, s_i} so far, we pick s_{i+1} = arg max_u [f(S_i ∪ {u}) − f(S_i)]. This is our old friend, the greedy hill-climbing algorithm: it maximizes the marginal gain δ_i(u) = f(S_i ∪ {u}) − f(S_i). Observation: by the submodularity property, f(S_i ∪ {u}) − f(S_i) ≥ f(S_j ∪ {u}) − f(S_j) for i < j, since S_i ⊆ S_j. That is, for every u, δ_i(u) ≥ δ_j(u) for i < j: the marginal benefits δ_i(u) only shrink as i grows. Activating node u in step i helps more than activating it at step j (j > i).

Idea: use δ_i as an upper bound on δ_j (j > i). Lazy hill-climbing: keep an ordered list of the marginal benefits δ_i from the previous iteration; re-evaluate δ_i only for the top node; re-sort and prune. This is valid because, by submodularity, f(S ∪ {u}) − f(S) ≥ f(T ∪ {u}) − f(T) for S ⊆ T. [Figure: marginal gains of nodes a, b, c, d, e after picking S_1 = {a}.]

[Figure, contd.: after picking S_2 = {a, b}, only the top node's marginal gain is re-evaluated and the list is re-sorted.] A sketch of lazy hill-climbing follows below.
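
A minimal sketch of lazy hill-climbing using a max-heap of cached marginal gains: by submodularity, a cached gain is an upper bound on the current gain, so only the top entry ever needs re-evaluation. Names are illustrative, and nodes are assumed comparable (for tie-breaking in the heap).

import heapq

def lazy_greedy(V, f, k):
    S = set()
    f_S = f(S)
    # Heap entries are (-gain, node, round_computed); negating the gain
    # turns Python's min-heap into a max-heap.
    heap = [(-(f({u}) - f_S), u, 0) for u in V]
    heapq.heapify(heap)
    rnd = 0
    while len(S) < k and heap:
        neg_gain, u, computed_at = heapq.heappop(heap)
        if computed_at == rnd:
            # Gain is fresh w.r.t. the current S, so u truly maximizes it.
            S.add(u)
            f_S = f(S)
            rnd += 1
        else:
            # Stale upper bound: re-evaluate against the current S, push back.
            heapq.heappush(heap, (-(f(S | {u}) - f_S), u, rnd))
    return S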

CELF (using lazy evaluation) runs 700 times faster than the greedy hill-climbing algorithm.

Back to the solution quality! The (1 − 1/e) bound for submodular functions is a worst-case bound (worst over all possible inputs). A data-dependent bound is one whose value depends on the input data: on easy data, hill-climbing may do better than 63%. Can we say something about the solution quality when we know the input data?

Suppose S is some solution to max f(S) s.t. |S| ≤ k, where f(S) is monotone and submodular. Let OPT = {t_1, …, t_k} be the optimal solution. For each u define δ(u) = f(S ∪ {u}) − f(S), and order the elements so that δ(1) ≥ δ(2) ≥ … Then: f(OPT) ≤ f(S) + Σ_{i=1}^{k} δ(i). Note: this is a data-dependent bound (the δ(i) depend on the input data). The bound holds for any algorithm, making no assumption about how S was computed, and for some inputs it can be very loose (worse than 63%).

Claim: for each u let δ(u) = f(S ∪ {u}) − f(S), and order the elements so that δ(1) ≥ δ(2) ≥ … Then f(OPT) ≤ f(S) + Σ_{i=1}^{k} δ(i).
Proof:
f(OPT) ≤ f(OPT ∪ S)   (by monotonicity)
= f(S) + [f(OPT ∪ S) − f(S)]
≤ f(S) + Σ_{i=1}^{k} [f(S ∪ {t_i}) − f(S)]   (we proved this last time)
= f(S) + Σ_{i=1}^{k} δ(t_i)
≤ f(S) + Σ_{i=1}^{k} δ(i).
The last step holds because instead of taking t_i ∈ OPT (of benefit δ(t_i)), we take the best possible elements (of benefit δ(i)).
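
The bound is cheap to compute in practice. A sketch, with illustrative names:

def data_dependent_bound(V, f, S, k):
    # Upper bound on f(OPT): f(S) plus the k largest marginal gains over
    # elements outside S. Valid for any monotone submodular f and any S,
    # regardless of how S was obtained.
    f_S = f(S)
    deltas = sorted((f(S | {u}) - f_S for u in V if u not in S), reverse=True)
    return f_S + sum(deltas[:k])

# Usage: ratio = f(S) / data_dependent_bound(V, f, S, k) gives an
# instance-specific guarantee, often much better than 1 - 1/e (about 0.63).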

Experimental setup: a real metropolitan-area water network with |V| = 21,000 nodes and |E| = 25,000 pipes. Using a cluster of 50 machines for a month, we simulated 3.6 million epidemic scenarios (random locations, random days, random times of the day).

[Plot: solution quality F(A) (higher is better) vs. the number of sensors placed (0 to 20), comparing the offline (1 − 1/e) bound, the data-dependent bound, and hill-climbing.] The data-dependent bound is much tighter: it gives a more accurate estimate of the algorithm's performance.

[w/ Ostfeld et al., J. of Water Resource Planning] Battle of Water Sensor Networks competition: placement heuristics perform much worse.

Author           Score
CELF             26
Sandia           21
U Exeter         20
Bentley Systems  19
Technion (1)     14
Bordeaux         12
U Cyprus         11
U Guelph          7
U Michigan        4
Michigan Tech U   3
Malcolm           2
Proteo            2
Technion (2)      1

[Figure: different objective functions give different sensor placements; panels show the placements for the population-affected and detection-likelihood objectives.]

Here CELF is many times faster than greedy hill-climbing! (But there might be datasets/inputs where CELF has the same running time as greedy hill-climbing.)

Two motivating questions: "I have 10 minutes: which blogs should I read to be most up to date?" and "Who are the most influential bloggers?"

We want to read things before others do: detect the blue and yellow stories soon but miss red, or detect all stories but late.

We crawled 45,000 blogs for 1 year, obtained 10 million posts, and identified 350,000 cascades. The cost of a blog is the number of posts it has.

The online (data-dependent) bound turns out to be much tighter: based on the plot, 87% instead of 32.5%. [Plot: the old offline bound vs. our data-dependent bound for CELF.]

Heuristics perform much worse! One really needs to perform the optimization.

CELF has 2 sub-algorithms; which wins? With unit costs, CELF picks large popular blogs. With cost-benefit (cost proportional to the number of posts), we can do much better when considering costs.

Problem: then CELF picks lots of small blogs that participate in few cascades. So we pick the best solution that interpolates between the two cost functions: we can get good solutions with few blogs and few posts. [Plot: each curve represents a set of solutions S with the same final reward f(S), e.g., f(S) = 0.2, 0.3, 0.4.]

We want to generalize well to future (unknown) cascades. Limiting the selection to bigger blogs improves generalization!

[Leskovec et al., KDD '07] CELF runs 700 times faster than the simple hill-climbing algorithm.