arxiv: v1 [cs.ds] 12 Jun 2017


Streaming Non-monotone Submodular Maximization: Personalized Video Summarization on the Fly

Baharan Mirzasoleiman (ETH Zurich), Stefanie Jegelka (MIT), Andreas Krause (ETH Zurich)

Abstract

The need for real-time analysis of rapidly produced data streams (e.g., video and image streams) has motivated the design of streaming algorithms that can efficiently extract and summarize useful information from massive data on the fly. Such problems can often be reduced to maximizing a submodular set function subject to various constraints. While efficient streaming methods have recently been developed for monotone submodular maximization, in a wide range of applications, such as video summarization, the underlying utility function is non-monotone, and there are often various constraints imposed on the optimization problem to account for privacy or personalization. We develop the first efficient single-pass streaming algorithm, STREAMING LOCAL SEARCH, with a constant factor $(4p-1)/\bigl(4p(8p+2d-1)\bigr)$ approximation guarantee for maximizing a non-monotone submodular function under a $p$-system and $d$ knapsack constraints, with memory independent of the data size. In our experiments, we show that for the video summarization problem, our streaming method, while achieving practically the same performance, runs more than 1700 times faster than previous work.

1 Introduction

Data summarization, the task of efficiently extracting a representative subset of manageable size from a large dataset, has become an important goal in machine learning and information retrieval. Submodular maximization has recently been explored as a natural abstraction for many data summarization tasks, including image summarization [1], scene summarization [2], document and corpus summarization [3], active set selection in non-parametric learning [4], and training data compression [5].
Submodularity is an intuitive notion of diminishing returns, stating that selecting any given element earlier helps more than selecting it later. Given a set of constraints on the desired summary, and a (pre-designed or learned) submodular utility function $f$ that quantifies the representativeness $f(S)$ of a subset $S$ of items, data summarization can be naturally reduced to a constrained submodular optimization problem.

In this paper, we are motivated by applications of non-monotone submodular maximization. In particular, we consider video summarization in a streaming setting, where video frames are produced at a fast pace, and we want to keep an updated summary of the video so far, with little or no memory overhead. This has important applications, e.g., in surveillance cameras, wearable cameras, and astro video cameras, where the massive volume of rapidly produced data makes it impractical for computational units to analyze and store it in main memory. The same framework can be applied more generally in many settings where we need to extract a small subset of data from a large stream to train or update a machine learning model. At the same time, various constraints may be imposed by the underlying summarization application. These may range from a simple limit on the size of the summary to more complex restrictions such as focusing on particular individuals or objects, or excluding them from the summary. These requirements often arise in real-world scenarios to address privacy concerns (e.g., in the case of surveillance cameras) or personalization (according to users' interests).

In machine learning, Determinantal Point Processes (DPPs) have been proposed as computationally efficient methods for selecting a diverse subset from a ground set of items [6]. They have recently shown great success for video summarization [7], as well as problems like document summarization [6] and information retrieval [8]. While finding the most likely configuration (MAP) is NP-hard, the DPP probability is a log-submodular function, and submodular optimization techniques can be used to find a near-optimal solution. In general, this submodular function is highly non-monotone, so we need techniques for maximizing a non-monotone submodular function in the streaming setting. Although efficient streaming methods have recently been developed for maximizing a monotone submodular function $f$ with a variety of constraints, there is no effective solution for non-monotone submodular maximization under general types of constraints in the streaming setting.

In this work, we provide STREAMING LOCAL SEARCH, the first single-pass streaming algorithm for non-monotone submodular function maximization, subject to the intersection of a $p$-system and $d$ knapsack constraints. Our approach builds on local search, a widely used technique for maximizing non-monotone submodular functions in batch mode. Local search, however, needs multiple passes over the input, and hence does not directly extend to the streaming setting, where we are only allowed to make a single pass over the data. STREAMING LOCAL SEARCH provides a constant factor $(4p-1)/\bigl(4p(8p+2d-1)\bigr)$ approximation to the optimum solution, using $O(pk\log^2(k)/\epsilon^2)$ memory and $O(pk^2\log^2(k)/\epsilon^2)$ update time per element, where $k$ is the size of the largest feasible solution. Using parallel computation, the update time can be reduced to $O(pk^2)$, making our approach an appealing solution in real-time scenarios.
We show that for video summarization, our algorithm leads to streaming solutions that provide competitive utility when compared with those obtained via centralized methods, at a small fraction of the computational cost, i.e., more than 1700 times faster.

2 Related Work

Video summarization aims to retain diverse and representative frames according to criteria such as representativeness, diversity, interestingness, or importance of the frames [9, 10, 11]. This often requires hand-crafting to combine the criteria effectively. Recently, [7] proposed a supervised subset selection method using DPPs. Despite its superior performance, this method uses an exhaustive search for MAP inference, which makes it inapplicable for producing real-time summaries.

Local search has been widely used for submodular maximization subject to various constraints. This includes the analysis of greedy and local search by Nemhauser et al. [12], providing a $1/(p+1)$ approximation for monotone submodular maximization under $p$ matroid constraints. Among the most recent results for non-monotone submodular maximization are a $(1+O(1/\sqrt{p}))p$-approximation subject to $p$-independence system constraints [13], a $1/(5-\epsilon)$ approximation under $d$ knapsack constraints [14], and a $(p+1)(2p+2d+1)/p$-approximation for maximizing a general submodular function subject to a $p$-system and $d$ knapsack constraints [15].

Streaming algorithms for submodular maximization have gained increasing attention for producing online summaries from data streams. Recently, Badanidiyuru et al. [16] proposed a single-pass streaming algorithm for monotone maximization that yields a $1/2-\epsilon$ approximation and needs $O(k\log k/\epsilon)$ memory. Chakrabarti and Kale [17] developed a single-pass algorithm for monotone functions over intersections of $p$ matroids, achieving a $1/4p$ approximation guarantee. However, the required memory increases polylogarithmically with the size of the data. Finally, Chekuri et al. [18] presented a deterministic and a randomized algorithm for maximizing monotone and non-monotone submodular functions subject to a broader range of constraints, namely $p$-matchoids. Their methods give an $\Omega(1/p)$ approximation (in expectation) using $O(k\log k/\epsilon^2)$ memory, where $k$ is the size of the largest feasible solution.

3 Problem Statement

We consider the problem of summarizing a stream of data by selecting, on the fly, a subset that maximizes a utility function $f : 2^V \to \mathbb{R}_+$. The utility function is defined on subsets of the entire stream $V$ and, for each $S \subseteq V$, $f(S)$ quantifies how well $S$ represents $V$. We assume that $f$ is submodular, a property that holds for many widely used utility functions. This means that for any two sets $S \subseteq T \subseteq V$ and any element $e \in V \setminus T$ we have

$$f(S \cup \{e\}) - f(S) \ge f(T \cup \{e\}) - f(T).$$
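To make the diminishing-returns inequality concrete, the following is a minimal sketch that checks it by brute force on a tiny ground set. The graph-cut utility `cut_value` and the edge list `EDGES` are hypothetical toy choices, not taken from the paper; they are used only because a cut function is a standard example of a submodular function that is not monotone.

```python
from itertools import combinations

def powerset(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def marginal_gain(f, S, e):
    """f(S + e) - f(S): the gain of adding element e to the set S."""
    return f(S | {e}) - f(S)

def is_submodular(f, ground, tol=1e-9):
    """Brute-force check of f(S+e)-f(S) >= f(T+e)-f(T) for all S <= T, e not in T."""
    ground = frozenset(ground)
    subsets = list(powerset(ground))
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for e in ground - T:
                if marginal_gain(f, S, e) < marginal_gain(f, T, e) - tol:
                    return False
    return True

def is_monotone(f, ground, tol=1e-9):
    """Monotone means every marginal gain is non-negative."""
    ground = frozenset(ground)
    return all(marginal_gain(f, S, e) >= -tol
               for S in powerset(ground) for e in ground - S)

# Hypothetical toy utility: the cut function of a 4-cycle (assumption for illustration).
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3)]

def cut_value(S):
    """Number of edges with exactly one endpoint in S."""
    return float(sum((u in S) != (v in S) for u, v in EDGES))
```

The exhaustive check is exponential in the ground-set size, so it is only meant for sanity-testing small examples, not for use inside a streaming algorithm.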

We denote the marginal gain of adding an element $e \in V$ to a summary $S \subseteq V$ by $f_S(e) = f(S \cup \{e\}) - f(S)$. The function $f$ is monotone if $f_S(e) \ge 0$ for all $S \subseteq V$ and $e \in V \setminus S$. Here, we allow $f$ to be non-monotone. Many data summarization applications can be cast as an instance of constrained submodular maximization under a set $\zeta \subseteq 2^V$ of constraints:

$$S^* = \arg\max_{S \in \zeta} f(S).$$

In this work, we consider a wide set of hereditary constraints, where any subset of a feasible set is also feasible, and knapsack constraints. Common examples of hereditary constraints are matroids, matchoids and $p$-systems. A matroid $\mathcal{M}$ is a pair $(V, \mathcal{I})$ where $V$ is a finite (ground) set, and $\mathcal{I} \subseteq 2^V$ is a family of independent subsets of $V$ satisfying the following two properties: (i) for any $A \subseteq B \subseteq V$, $B \in \mathcal{I}$ implies that $A \in \mathcal{I}$ (heredity property), and (ii) if $A, B \in \mathcal{I}$ and $|B| > |A|$, there is an element $e \in B \setminus A$ such that $A \cup \{e\} \in \mathcal{I}$. The maximal independent sets of $\mathcal{M}$ share a common cardinality, called the rank of $\mathcal{M}$. A uniform matroid is the family of all subsets of size at most $k$. In a partition matroid, we have a collection of disjoint sets $B_i$ and integers $k_i \le |B_i|$, where a set $A$ is independent if for every index $i$ we have $|A \cap B_i| \le k_i$. A $p$-matchoid generalizes matchings and intersections of matroids. For $q$ matroids defined over overlapping ground sets, $\mathcal{M}_l = (V_l, \mathcal{I}_l)$, $l \in [q]$, it requires that every element $e \in V$ is a member of $V_l$ for at most $p$ indices. Finally, a $p$-system is the most general type of constraint we consider in this paper. It requires that if $A, B \in \mathcal{I}$ are two maximal sets, then $|A| \le p|B|$.

A knapsack constraint is defined by a cost function $c : V \to \mathbb{R}_+$. A set $S \subseteq V$ is said to satisfy the knapsack constraint if $c(S) = \sum_{e \in S} c(e) \le 1$. The goal in this paper is to maximize a (non-monotone) submodular function $f$ subject to a set of constraints $\zeta$ defined by the intersection of a $p$-independence system $(V, \mathcal{I})$ and $d$ knapsacks. In other words, we would like to find a set $S \in \mathcal{I}$ that maximizes $f$ where for each knapsack $c_i$, $i \in [d]$, we have $\sum_{e \in S} c_i(e) \le 1$.
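As an illustration of the constraint set $\zeta$ defined above, here is a minimal sketch of membership oracles for a partition matroid and for knapsacks with unit budget. The function names (`partition_matroid_oracle`, `knapsack_oracle`, `feasible`) and the example data are assumptions made for this sketch; the paper only assumes that such a membership oracle is available.

```python
def partition_matroid_oracle(block_of, limits):
    """Return a membership test for a partition matroid.

    block_of maps each element to its block B_i; limits maps each block to k_i.
    A set is independent if it contains at most k_i elements of every block.
    """
    def independent(S):
        counts = {}
        for e in S:
            b = block_of[e]
            counts[b] = counts.get(b, 0) + 1
            if counts[b] > limits[b]:
                return False
        return True
    return independent

def knapsack_oracle(costs, budget=1.0):
    """Return a test for a single knapsack: total cost of S must not exceed budget."""
    def within_budget(S):
        return sum(costs[e] for e in S) <= budget + 1e-12
    return within_budget

def feasible(S, independent, knapsacks):
    """S is feasible if it is independent and satisfies every knapsack."""
    return independent(S) and all(check(S) for check in knapsacks)

# Hypothetical example: six frames in two segments, at most 1 frame from
# segment "a" and at most 2 from segment "b", plus one knapsack of budget 1.
SEGMENT = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
independent = partition_matroid_oracle(SEGMENT, {"a": 1, "b": 2})
within_budget = knapsack_oracle({0: 0.5, 1: 0.2, 2: 0.9, 3: 0.3, 4: 0.3, 5: 0.1})
```

Both kinds of constraint are downward closed: removing an element from a feasible set can only lower block counts and total cost, which is the property the pruning step of the algorithm relies on.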
We assume that the ground set $V = \{e_1, \dots, e_n\}$ is received from the stream in some arbitrary order. At each point $t$ in time, the algorithm may maintain a memory $M_t \subset V$ of points, and must be ready to output a candidate feasible solution $S_t \subseteq M_t$, such that $S_t \in \zeta$. Upon receiving an element $e_t$ from the stream, the algorithm may elect to 1) insert it into its memory, 2) discard some elements in its memory and accept $e_t$ instead, or 3) discard $e_t$.

4 Video Summarization with DPPs

Suppose that we are receiving a stream of video frames, e.g., from a surveillance or a wearable camera, and we wish to select a subset of frames that concisely represents all the diversity in the video. The DPP is an appealing tool for modeling diversity in such applications. DPPs [19] are distributions over subsets with a preference for diversity, and have been successfully applied to video summarization [7], as well as problems like document summarization [6] and information retrieval [8].

Formally, a DPP $\mathcal{P}$ on a set of items $V = \{1, 2, \dots, N\}$ defines a discrete probability distribution on $2^V$ (the set of all subsets of $V$), such that the probability of observing a subset $S \subseteq V$ is

$$\mathcal{P}(Y = S) = \frac{\det(L_S)}{\det(I + L)}, \qquad (1)$$

where $L$ is a positive semidefinite kernel matrix, $L_S \equiv [L_{ij}]_{i,j \in S}$ is the restriction of $L$ to the entries indexed by elements of $S$, and $I$ is the $N \times N$ identity matrix. In order to find the most diverse and informative feasible subset, we need to solve the NP-hard problem of finding $\arg\max_{S \in \mathcal{I}} \det(L_S)$ [20], where $\mathcal{I} \subset 2^V$ is a given family of feasible solutions. However, the logarithm $f(S) = \log\det(L_S)$ is a (non-monotone) submodular function [6], and we can apply submodular maximization techniques.

Various constraints can be imposed while maximizing the above non-monotone submodular utility function. In its simplest form, we can partition the video into $T$ segments, and define a diversity-reinforcing partition matroid to select at most $k$ frames from each segment.
In another example, various content-based constraints can be applied, e.g., we can use object recognition to select at most $k_i$ frames showing person $i$ in the video, or to find a summary that is focused on a particular person or object. Finally, to improve the quality of the produced summaries, the cost of a frame can be chosen as a function of its quality, such as resolution, contrast, luminance, or the probability that the given frame contains an object.
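The DPP utility of this section can be sketched in a few lines. The following is a minimal, dependency-free illustration of $f(S) = \log\det(L_S)$ and of the probability in Eq. (1), assuming a small dense kernel given as a list of lists; the helper names and the 3-by-3 kernel `L_EXAMPLE` are hypothetical, and the Cholesky routine is a stand-in for whatever linear-algebra library one would use in practice.

```python
import math

def submatrix(L, S):
    """Restriction L_S of the kernel L to the rows and columns indexed by S."""
    idx = sorted(S)
    return [[L[i][j] for j in idx] for i in idx]

def log_det_psd(A):
    """log det of a symmetric positive definite matrix via Cholesky.

    Returns -inf when the matrix is (numerically) singular; the empty
    matrix has determinant 1, so its log-determinant is 0.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    return float("-inf")
                C[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(C[i][i])
            else:
                C[i][j] = s / C[j][j]
    return log_det

def dpp_utility(L, S):
    """f(S) = log det(L_S)."""
    return log_det_psd(submatrix(L, S))

def dpp_probability(L, S):
    """P(Y = S) = det(L_S) / det(I + L), as in Eq. (1)."""
    n = len(L)
    I_plus_L = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return math.exp(dpp_utility(L, S) - log_det_psd(I_plus_L))

# Hypothetical kernel: items 0 and 1 are nearly identical, item 2 is unrelated.
L_EXAMPLE = [[2.0, 1.9, 0.0],
             [1.9, 2.0, 0.0],
             [0.0, 0.0, 3.0]]
```

With this kernel, adding the near-duplicate item 1 to the set {0} lowers the utility, which is exactly the non-monotone behaviour that rules out streaming methods designed for monotone functions.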

Algorithm 1 STREAMING LOCAL SEARCH for independence systems
Input: $f : 2^E \to \mathbb{R}_+$, a membership oracle for the independence system $\mathcal{I} \subset 2^E$, and a streaming algorithm INDSTREAM for independence systems with $\alpha$-approximation guarantee.
Output: A set $S \subseteq E$ satisfying $S \in \mathcal{I}$.
 1: for $t = 1$ to $T$ do
 2:     $D_0 \leftarrow \{e_t\}$
 3:     for $i = 1$ to $q$ do                                (LOCAL SEARCH iterations)
 4:         $[D_i, S_i] = \text{INDSTREAM}_i(D_{i-1})$       ($D_i$ is the set discarded by $\text{INDSTREAM}_i$)
 5:         $S'_i = \text{UNCONSTRAINED-MAX}(S_i)$
 6:     end for
 7:     $S^t = \arg\max_i \{f(S_i), f(S'_i)\}$
 8: end for
 9: Return $S^t$

5 Streaming algorithm for constrained submodular maximization

In this section, we describe our streaming algorithm for maximizing a non-monotone submodular function subject to the intersection of a $p$-system and $d$ knapsack constraints. Our approach builds on local search, which is a powerful and widely used technique for maximizing non-monotone submodular functions. It starts from a candidate solution $S$ and iteratively increases the value of the solution by either including a new element in $S$ or discarding one of the elements of $S$ [21]. Gupta et al. [22] showed that similar results can be obtained with much lower complexity by using algorithms for monotone submodular maximization, which, however, are run multiple times. Despite their effectiveness, these algorithms need multiple passes over the input and do not directly extend to the streaming setting, where we are only allowed to make a single pass over the data. In the sequel, we show how local search can be implemented in a single pass in the streaming setting.

5.1 STREAMING LOCAL SEARCH for independence systems

The simple yet crucial observation underlying the approach of Gupta et al. [22] is the following. The solutions obtained by approximation algorithms for monotone submodular functions often satisfy $f(S) \ge \alpha f(S \cup C^*)$, where $\alpha > 0$ and $C^*$ is the optimal solution. In the monotone case $f(S \cup C^*) \ge f(C^*)$, and we get the desired result. However, this does not hold for non-monotone functions.
But if $f(S \cap C^*)$ provides a good fraction of the optimal solution, then we can find a near-optimal solution by pruning elements in $S$ using unconstrained maximization. This still retains a feasible set, since the constraints are downward closed. Otherwise, if $f(S \cap C^*) \le \epsilon\,\mathrm{OPT}$, then running another round of the algorithm on the remainder of the ground set will lead to a good solution.

Backed by the above intuition, we build multiple disjoint solutions simultaneously within a single pass over the data. Let INDSTREAM be a single-pass streaming algorithm for monotone submodular maximization under independence systems, with approximation factor $\alpha$. Upon receiving a new element from the stream, INDSTREAM can choose (1) to insert it into its memory, (2) to replace one or a subset of the elements in its memory with it, or otherwise (3) the element gets discarded and cannot be used later by the algorithm. The key insight for our approach is that it is possible to build other solutions from the elements discarded by INDSTREAM. Consider a chain of $q$ instances of our streaming algorithm, i.e., $\{\text{INDSTREAM}_1, \dots, \text{INDSTREAM}_q\}$. Any element $e$ received from the stream is first passed to $\text{INDSTREAM}_1$. If $\text{INDSTREAM}_1$ discards $e$, or adds $e$ to its solution and instead discards a set of elements from its memory, then we pass the set $D_1$ of discarded elements on to be processed by $\text{INDSTREAM}_2$. Similarly, if a set of elements $D_2$ is discarded by $\text{INDSTREAM}_2$, we pass them to $\text{INDSTREAM}_3$, and so on. The elements discarded by the last instance $\text{INDSTREAM}_q$ are discarded forever.

Theorem 5.1. Let INDSTREAM be a streaming algorithm for monotone submodular maximization under a $p$-system constraint with approximation guarantee $\alpha$. Algorithm 1 returns a set $S \in \mathcal{I}$ with

$$f(S) \ge \frac{1-\alpha}{2/\alpha - 1}\,\mathrm{OPT}.$$

We make Theorem 5.1 concrete by an example: Chekuri et al. [18] proposed a $1/4p$-approximation algorithm for maximizing a monotone submodular function under a $p$-matchoid constraint in the streaming setting. Using this algorithm as INDSTREAM in our STREAMING LOCAL SEARCH, we obtain the following result:

Corollary 5.2. With STREAMING GREEDY of [18] as INDSTREAM, STREAMING LOCAL SEARCH yields a solution $S \in \mathcal{I}$ with approximation guarantee $(4p-1)/\bigl(4p(8p-1)\bigr)$, using $O(pk\log^2(k)/\epsilon^2)$ memory and $O(pk^2\log^2(k)/\epsilon^2)$ update time per element, where $\mathcal{I}$ are the independent sets of the $p$-matchoid constraint, and $k$ is the size of the largest feasible solution.

5.2 STREAMING LOCAL SEARCH for independence systems and d-knapsacks

To respect multiple knapsack constraints in addition to the $p$-system, we integrate the idea of a density threshold [23, 24] into our local search algorithm. We use a (fixed) density threshold $\rho$ to restrict the INDSTREAM algorithm to only pick elements if the function value per unit size of the selected elements is above the given threshold. We call this new algorithm INDSTREAMDENSITY. The threshold should be carefully chosen to be below the value/size ratio of the optimal solution. To do so, we need to know (a good approximation to) the value of the optimal solution OPT.

Algorithm 2 STREAMING LOCAL SEARCH for independence systems and d-knapsacks
Input: $f : 2^E \to \mathbb{R}_+$, a membership oracle for the independence system $\mathcal{I} \subset 2^E$, $d$ knapsack-cost functions $c_j : E \to [0, 1]$, and an upper bound $k$ on the cardinality of the largest feasible solution.
Output: A set $S \subseteq E$ satisfying $S \in \mathcal{I}$ and $c_j(S) \le 1$ for all $j$.
 1: for $t = 1$ to $T$ do
 2:     $D_0 \leftarrow \{e_t\}$
 3:     $m = \max(m, f(e_t))$, $e_m = \arg\max_t f(e_t)$
 4:     $\gamma = \frac{2(1-\alpha)m}{2/\alpha + 2d - 1}$, $R_t = \{\gamma, (1+\epsilon)\gamma, (1+\epsilon)^2\gamma, (1+\epsilon)^3\gamma, \dots, \gamma k\}$
 5:     for $\rho \in R_t$ in parallel do
 6:         for $i = 1$ to $q$ do                                        (LOCAL SEARCH)
 7:             $[D_i, S_i] = \text{INDSTREAMDENSITY}_i(D_{i-1}, \rho)$   (picks elements only if $f_{S_i}(e) / \sum_{j=1}^{d} c_j(e) \ge \rho$)
 8:             $S'_i = \text{UNCONSTRAINED-MAX}(S_i)$
 9:         end for
10:         $S_\rho = \arg\max_i \{f(S_i), f(S'_i), f(\{e_m\})\}$
11:     end for
12:     $S^t = \arg\max_{\rho \in R_t} f(S_\rho)$
13: end for
14: Return $S^t$
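The chain construction of Section 5.1 (Algorithm 1, without the density thresholds of Algorithm 2) might be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: `GreedyStream` is a hypothetical, guarantee-free stand-in for INDSTREAM that only accepts or rejects elements (it never swaps), `double_greedy` is the deterministic double-greedy routine of Buchbinder et al. used here as one possible UNCONSTRAINED-MAX, and the 5-cycle cut utility is a toy example.

```python
class GreedyStream:
    """Hypothetical stand-in for INDSTREAM (no approximation guarantee claimed).

    Keeps an arriving element if the solution stays independent and the
    marginal gain is positive; otherwise the element is discarded.
    """
    def __init__(self, f, independent):
        self.f = f
        self.independent = independent
        self.solution = []  # kept in arrival order

    def offer(self, e):
        """Process e and return the list of elements this instance discards."""
        current = set(self.solution)
        grown = current | {e}
        if self.independent(grown) and self.f(grown) > self.f(current):
            self.solution.append(e)
            return []
        return [e]

def double_greedy(f, items):
    """Deterministic double greedy for unconstrained maximization over `items`."""
    X, Y = set(), set(items)
    for e in items:
        gain_add = f(X | {e}) - f(X)      # value of adding e to the growing set
        gain_remove = f(Y - {e}) - f(Y)   # value of dropping e from the shrinking set
        if gain_add >= gain_remove:
            X.add(e)
        else:
            Y.discard(e)
    return X

def streaming_local_search(stream, f, independent, q):
    """Single pass over `stream` with a chain of q stand-in INDSTREAM instances."""
    chain = [GreedyStream(f, independent) for _ in range(q)]
    best = set()
    for e in stream:
        discarded = [e]                       # D_0 = {e_t}
        for inst in chain:                    # D_{i-1} feeds instance i
            passed_on = []
            for x in discarded:
                passed_on.extend(inst.offer(x))
            discarded = passed_on             # D_i; lost after the last instance
        candidates = []
        for inst in chain:
            candidates.append(set(inst.solution))                 # S_i
            candidates.append(double_greedy(f, inst.solution))    # S'_i (pruned S_i)
        best = max(candidates, key=f)         # S^t
    return best

# Hypothetical toy utility: cut function of a 5-cycle (submodular, non-monotone).
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

def cut(S):
    return sum((u in S) != (v in S) for u, v in CYCLE)
```

For example, `streaming_local_search(range(5), cut, lambda S: len(S) <= 2, q=2)` processes the five vertices once under a cardinality constraint of 2. Pruning with `double_greedy` only removes elements, so every candidate stays feasible for a downward-closed constraint.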
To obtain a rough estimate of OPT, it suffices to know the maximum value $m = \max_{e \in V} f(e)$ of any singleton element: submodularity implies that $m \le \mathrm{OPT} \le km$, where $k$ is an upper bound on the cardinality of the largest feasible solution satisfying all constraints. We update the value of the maximum singleton element on the fly [16], and lazily instantiate $\log(k)/\epsilon$ different threshold values, for which we run the local search in parallel. We show that for at least one of the discretized density thresholds we obtain a good enough solution.

Theorem 5.3. STREAMING LOCAL SEARCH (outlined in Alg. 2) has an approximation guarantee

$$f(S) \ge \frac{(1-\alpha)(1-\epsilon)}{2/\alpha + 2d - 1}\,\mathrm{OPT},$$

with update time $O(T\log(k)/(\alpha\epsilon))$ per element, where $k$ is an upper bound on the size of the largest feasible solution, and $T$ is the update time of the INDSTREAM algorithm.

Corollary 5.4. By using STREAMING GREEDY of [18], we get that STREAMING LOCAL SEARCH has an approximation ratio $(1+\epsilon)(4p)(8p+2d-1)/(4p-1)$ with $O(pk\log^2(k)/\epsilon^2)$ memory and update time $O(pk^2\log^2(k)/\epsilon^2)$ per element.

Table 1: Performance of various video summarization methods on YouTube and OVP.

                  seqDPP               FANTOM               STREAMING LS
                  Linear    N. Nets    Linear    N. Nets    Linear    N. Nets
YouTube   F       57.8±.5   6.3±       ±.5       6.3±       ±         ±.5
          P       54.2±     ±.6        54.±.5    59.±       ±         ±.6
          R       69.8±     ±.5        7.±       ±.5        7.±       ±.5
OVP       F       75.5±     ±          ±.3       78.±       ±         ±.5
          P       77.5±.5   75.±       ±.3       75.±       ±.2       7.8±.7
          R       78.4±     ±          ±         ±          ±         ±.2

6 Experiments

In this section, we apply STREAMING LOCAL SEARCH to video summarization in the streaming setting. The main goal of this section is to validate our theoretical results and demonstrate the effectiveness of STREAMING LOCAL SEARCH in practical scenarios where existing algorithms are incapable of providing desirable solutions. We compare the performance of our streaming algorithm with that of [7], and with the centralized method, FANTOM, for maximizing non-monotone submodular functions under a $p$-system and $d$ knapsack constraints [15].

Dataset. For our experiments, we use the Open Video Project (OVP) and the YouTube dataset, with 50 and 39 videos, respectively [25]. We use the pruned video frames as described in [7], where one frame is uniformly sampled per second, and uninformative frames are removed. Each video frame is then associated with a feature vector that consists of Fisher vectors [26] computed from SIFT features [27], contextual features, and features computed from the frame saliency map [28]. The sizes of the feature vectors $v_i$ are 86 and 58 for the OVP and YouTube datasets, respectively. The DPP kernel $L$ (c.f. Section 4) can be parametrized and learned via maximum likelihood estimation [7].
To compare the performance of our algorithm with the method of [7], we use both a linear transformation, i.e., $L_{ij} = v_i^T W^T W v_j$, and a non-linear transformation using a one-hidden-layer neural network, i.e., $L_{ij} = z_i^T W^T W z_j$ where $z_i = \tanh(U v_i)$, and $\tanh(\cdot)$ stands for the hyperbolic tangent transfer function. The parameters $W$, or $U$ and $W$, are learned on 80% of the videos, selected uniformly at random. Following [7] for evaluation, we treat each of the 5 human-created summaries per video as ground truth for that video.

Sequential DPP. To capture the sequential structure in video data, [7] proposed the sequential DPP. Here, a long video sequence is partitioned into $T$ disjoint yet consecutive short segments, and at time $t \in \{1, \dots, T\}$, a DPP is imposed over two neighboring segments. The conditional distribution of the selected subset from segment $t$ is thus given by

$$\mathcal{P}(S_t \mid S_{t-1}) = \frac{\det(L_{S_t \cup S_{t-1}})}{\det(I_t + L_{S_{t-1} \cup V_t})},$$

where $V_t$ is the set of all video frames in segment $t$, and $I_t$ is a diagonal matrix in which the elements corresponding to $S_{t-1}$ are zeros and the elements corresponding to $S_t$ are 1. Intuitively, the sequential DPP only captures the diversity between the frames in segment $t$ and the selected subset $S_{t-1}$ from the immediate past segment $t-1$. MAP inference for the sequential DPP is as hard as for the standard DPP, but submodular optimization techniques can be used to find approximate solutions. In our experiments, we use the sequential DPP as the utility function in all the algorithms.

Results. Figures 1a and 1g show the ratio of the F-score obtained by STREAMING LOCAL SEARCH and FANTOM vs. the F-score obtained by the method of [7] for varying segment size, using linear embeddings on the YouTube and OVP datasets. It can be observed that our streaming method is able to obtain the same quality of solution as the centralized baselines. Figures 1b and 1h show the speedup of STREAMING LOCAL SEARCH and FANTOM over the method of [7], for varying segment size.
We note that both FANTOM and STREAMING LOCAL SEARCH show an exponential speedup as the segment size increases. Interestingly, STREAMING LOCAL SEARCH is able to obtain a similar quality of summary, but more than 1700 times faster than [7], and more than 20 times faster than FANTOM for larger segment sizes. This makes our streaming method an appealing solution for extracting real-time summaries. Note that in real-world scenarios, video frames are received at a fast pace, and thus we need to use a larger segment size in practice. Moreover, unlike the centralized baselines, which need to first buffer an entire segment and then produce summaries, our method generates real-time summaries after receiving each video frame. This capability is of significant importance in privacy-sensitive applications.

Figure 1: Performance of STREAMING LOCAL SEARCH compared to the other benchmarks. Panels: (a)-(c) YouTube Linear, (d)-(f) YouTube N. Nets, (g)-(i) OVP Linear, (j)-(l) OVP N. Nets. (a), (g) show the ratio of the F-score obtained by STREAMING LOCAL SEARCH and FANTOM vs. the F-score obtained by the method of [7], using linear embeddings on the YouTube and OVP datasets. (d), (j) show similar quantities using nonlinear features from a one-hidden-layer neural network. (b), (e), (h), (k) show the speedup of STREAMING LOCAL SEARCH and FANTOM over the method of [7]. (c), (f), (i), (l) show the utility vs. running time for STREAMING LOCAL SEARCH vs. FANTOM and random selection.
Figures 1d and 1j show similar behaviour when using a nonlinear hidden representation, where a one-hidden-layer neural network is used to infer a hidden representation for each frame. It can be seen that while using non-linear representations generally improves the quality of the solution, a similar exponential speedup is achieved by our streaming method (Fig. 1e, 1k). We also compared the ratio of the utility and running time of our algorithm to FANTOM using the original DPP (c.f. Section 4) for producing summaries of length 5% of the video length. Again, our method achieved competitive performance with much less running time (Figures 1c, 1f, 1i, 1l).

Finally, we compared the performance of our algorithm with that of [7] by reporting the F-score, Precision, and Recall in Table 1, for segment size = 10. We see that STREAMING LOCAL SEARCH is able to produce summaries competitive with the exhaustive search [7] and the centralized baselines.

Figure 2: Summary focused on judges, and singer for YouTube video 6.

Figure 3: Summary produced by method of [7] (top row), vs. STREAMING LOCAL SEARCH (middle row), and a user selected summary (bottom row), for YouTube video 5.

Using constraints to generate customized summaries. In our second experiment, we show how constraints can be applied to generate customized summaries. We apply STREAMING LOCAL SEARCH to YouTube video 6, which is a part of the America's Got Talent series. It features a singer and three judges in the judging panel. Here, we produced two sets of summaries using different constraints. The top row in Fig. 2 shows a summary focused on the judges. Here we considered 3 uniform matroid constraints to limit the number of frames chosen containing each of the judges. The limits for all the matroid constraints are set to 3. To produce real-time summaries while receiving the video, we used the Viola-Jones algorithm [29] to detect faces in each video frame, and trained a multiclass support vector machine using histograms of oriented gradients (HOG) to recognize different faces. The bottom row in Fig. 2 shows another summary focused on the singer using one matroid constraint. To further enhance the quality of the summaries, we assigned different weights to the frames based on the probability for each frame to contain objects, using selective search [30].
Assigning a higher cost to frames with a low probability of containing objects lets us filter out uninformative and blurry frames, and produce a summary closer to the human-created summaries, as shown in Figure 3.

7 Conclusion

We have developed the first streaming algorithm, STREAMING LOCAL SEARCH, for maximizing non-monotone submodular functions subject to a $p$-system and $d$ knapsack constraints. We have shown its application to video summarization for producing online summaries in the streaming setting. Our experimental results showed that our method is able to speed up the summarization task more than 1700 times, while achieving performance similar to the baselines. This makes it an appealing approach for real-time summarization tasks. We note that our method is applicable to any summarization task with a non-monotone submodular utility function. Given the importance of submodular optimization to numerous data mining and machine learning applications, we believe our result is an important step towards providing real-time summaries and prediction.

References

[1] Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, and Jeff A Bilmes. Learning mixtures of submodular functions for image collection summarization. In Advances in Neural Information Processing Systems, pages 1413-1421, 2014.
[2] Ian Simon, Noah Snavely, and Steven M Seitz. Scene summarization for online image collections. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1-8. IEEE, 2007.
[3] Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, 2011.
[4] Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems, 2013.
[5] Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In Proceedings of the International Conference on Machine Learning (ICML), 2015.
[6] Alex Kulesza, Ben Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123-286, 2012.
[7] Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha. Diverse sequential subset selection for supervised video summarization. In Advances in Neural Information Processing Systems, 2014.
[8] Jennifer Gillenwater, Alex Kulesza, and Ben Taskar. Discovering diverse and salient threads in document collections. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012.
[9] Chong-Wah Ngo, Yu-Fei Ma, and Hong-Jiang Zhang. Automatic video summarization by graph modeling. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pages 104-109. IEEE, 2003.
[10] Tiecheng Liu and John Kender. Optimization algorithms for the selection of key frame sequences of variable length. Computer Vision - ECCV 2002, 2006.
[11] Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman. Discovering important people and objects for egocentric video summarization. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[12] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1), 1978.
[13] Moran Feldman, Christopher Harshaw, and Amin Karbasi. Greed is good: Near-optimal submodular maximization via greedy optimization. arXiv preprint arXiv:1704.01652, 2017.
[14] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. ACM, 2009.
[15] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
[16] Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.
[17] Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: Matchings, matroids, and more. Mathematical Programming, 154(1-2), 2015.
[18] Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In International Colloquium on Automata, Languages, and Programming. Springer, 2015.
[19] Odile Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1):83-122, 1975.
[20] Chun-Wa Ko, Jon Lee, and Maurice Queyranne. An exact algorithm for maximum entropy sampling. Operations Research, 43(4):684-691, 1995.
[21] Uriel Feige, Vahab S Mirrokni, and Jan Vondrak. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133-1153, 2011.
[22] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In International Workshop on Internet and Network Economics. Springer, 2010.
[23] Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2014.
[24] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41-43, 2004.
[25] Sandra Eliza Fontes De Avila, Ana Paula Brandão Lopes, Antonio da Luz, and Arnaldo de Albuquerque Araújo. VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method. Pattern Recognition Letters, 32(1):56-68, 2011.
[26] Florent Perronnin and Christopher Dance. Fisher kernels on visual vocabularies for image categorization. In Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, pages 1-8. IEEE, 2007.
[27] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[28] Esa Rahtu, Juho Kannala, Mikko Salo, and Janne Heikkilä. Segmenting salient objects from images and videos. Computer Vision - ECCV 2010, 2010.
[29] Paul Viola and Michael J Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137-154, 2004.
[30] Jasper RR Uijlings, Koen EA Van De Sande, Theo Gevers, and Arnold WM Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154-171, 2013.
[31] Niv Buchbinder, Moran Feldman, Joseph Seffi, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM Journal on Computing, 44(5):1384-1402, 2015.

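Supplementary Materials

A Analysis of STREAMING LOCAL SEARCH

Proof of Theorem 5.1

Proof. Let $C^*$ be an optimal solution and write $r = 1/\alpha$ for the number of local-search iterations. For each $i \in [1, r]$, by assumption we have

$$f(S_i) \ge \alpha\, f(S_i \cup C_i), \tag{2}$$

where $C_i = C^* \cap X_i$ for all $i \in [1, r]$, and $X_i$ is the subset of elements processed by INDSTREAM$_i$. Therefore, $C_1 = C^*$. Also, for each $i$, let $S'_i$ be the output of the tight 1/2-approximation algorithm for unconstrained maximization from [31] applied to $S_i$. Then

$$f(S'_i) \ge \tfrac{1}{2}\, f(S_i \cap C_i). \tag{3}$$

Now, via an argument similar to the one used in [15], we show by induction that for all $t \in [2, r]$,

$$\sum_{i=1}^{t} \big[ f(S_i \cup C_i) + (t-i)\, f(S_i \cap C_i) \big] \ge (t-1)\, f(C^*) + f\Big(\bigcup_{i=1}^{t} S_i \cup C^*\Big). \tag{4}$$

Using submodularity, for the base case $t = 2$ we get

$$\begin{aligned}
f(S_1 \cup C_1) + f(S_2 \cup C_2) + f(S_1 \cap C_1)
&\ge f(S_1 \cup S_2 \cup C_1) + f(C_2) + f(S_1 \cap C_1) \\
&\ge f(S_1 \cup S_2 \cup C_1) + f(C_1).
\end{aligned}$$

Now we prove the inductive case:

$$\begin{aligned}
\sum_{i=1}^{t} \big[ f(S_i \cup C_i) + (t-i)\, f(S_i \cap C_i) \big]
&= \sum_{i=1}^{t-1} \big[ f(S_i \cup C_i) + (t-1-i)\, f(S_i \cap C_i) \big] + f(S_t \cup C_t) + \sum_{i=1}^{t-1} f(S_i \cap C_i) \\
&\ge (t-2)\, f(C^*) + f\Big(\bigcup_{i=1}^{t-1} S_i \cup C^*\Big) + f(S_t \cup C_t) + \sum_{i=1}^{t-1} f(S_i \cap C_i) \\
&\ge (t-2)\, f(C^*) + f\Big(\bigcup_{i=1}^{t} S_i \cup C^*\Big) + f(C_t) + \sum_{i=1}^{t-1} f(S_i \cap C_i) \\
&\ge (t-2)\, f(C^*) + f\Big(\bigcup_{i=1}^{t} S_i \cup C^*\Big) + f(C^*) \\
&= (t-1)\, f(C^*) + f\Big(\bigcup_{i=1}^{t} S_i \cup C^*\Big).
\end{aligned} \tag{5}$$

The first inequality follows from the induction hypothesis (Eq. 4 for $t-1$), and the last two inequalities follow from submodularity. Multiplying Eq. 2 by $r$ and Eq. 3 by $2(r-i)$, summing over $i$, and using Eq. 5 with $t = r$ together with the non-negativity of $f$, we get

$$\sum_{j=1}^{r} r\, f(S_j) + \sum_{i=1}^{r-1} 2(r-i)\, f(S'_i) \ge \sum_{i=1}^{r} f(S_i \cup C_i) + \sum_{i=1}^{r-1} (r-i)\, f(S_i \cap C_i) \ge (r-1)\, f(C^*).$$

Taking a max over the left-hand side, we get the following inequality:

$$\Big[ r^2 + 2 \sum_{i=1}^{r-1} (r-i) \Big] \max_i \max\big(f(S_i), f(S'_i)\big) \ge (r-1)\, f(C^*).$$

Since $\sum_{i=1}^{r-1}(r-i) = r(r-1)/2$, the bracket equals $r(2r-1)$. Hence, substituting $r = 1/\alpha$,

$$\max_i \max\big(f(S_i), f(S'_i)\big) \ge \frac{r-1}{r(2r-1)}\, f(C^*) = \frac{\alpha(1-\alpha)}{2-\alpha}\, f(C^*).$$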
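The inductive inequality (Eq. 4) can be sanity-checked numerically. The sketch below is not part of the paper: it assumes a weighted graph-cut function as a stand-in non-negative, non-monotone submodular $f$, draws the sets $S_i$ at random instead of running INDSTREAM, and assumes each stage processes only the elements not selected by earlier stages (so that $X_{i+1} = X_i \setminus S_i$). The names `cut_value` and `check_eq4` are hypothetical.

```python
import random


def cut_value(edges, subset):
    """Total weight of edges with exactly one endpoint in `subset`.

    A graph cut is non-negative, submodular and non-monotone,
    which is all that Eq. 4 requires of f.
    """
    return sum(w for u, v, w in edges if (u in subset) != (v in subset))


def check_eq4(n=8, r=4, trials=200, seed=0):
    """Return True if Eq. 4 holds for every t in [2, r] on random instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Random weighted graph on n vertices.
        edges = [(u, v, rng.random())
                 for u in range(n) for v in range(u + 1, n)
                 if rng.random() < 0.5]
        f = lambda A: cut_value(edges, A)
        # Arbitrary reference set playing the role of C* (Eq. 4 holds for any set).
        C = {v for v in range(n) if rng.random() < 0.5}
        X = set(range(n))          # X_1: elements seen by the first stage
        S, Cs = [], []
        for _ in range(r):
            S_i = {v for v in X if rng.random() < 0.3}   # random stand-in for INDSTREAM_i
            S.append(S_i)
            Cs.append(C & X)       # C_i = C* ∩ X_i
            X = X - S_i            # assumed: the next stage sees only unselected elements
        for t in range(2, r + 1):
            # Index i is 0-based here, so the weight (t - i) becomes t - (i + 1).
            lhs = sum(f(S[i] | Cs[i]) + (t - (i + 1)) * f(S[i] & Cs[i])
                      for i in range(t))
            rhs = (t - 1) * f(C) + f(set().union(*S[:t]) | C)
            if lhs < rhs - 1e-9:   # small tolerance for floating-point sums
                return False
    return True


if __name__ == "__main__":
    print(check_eq4())
```

Because the check never uses how the $S_i$ were chosen, it illustrates that Eq. 4 is a purely set-theoretic consequence of submodularity and non-negativity; the quality of INDSTREAM enters only through Eq. 2.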

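Proof of Theorem 5.3

Proof. Consider an optimal solution $C^*$ and set

$$\rho^* = \frac{2(1-\alpha)}{2/\alpha + 2d - 1}\, f(C^*).$$

By submodularity we know that $m \le f(C^*) \le mk$, where $k$ is an upper bound on the cardinality of the largest feasible solution and $m$ is the maximum value of any singleton element. Hence

$$\frac{2m(1-\alpha)}{2/\alpha + 2d - 1} \le \rho^* \le \frac{2mk(1-\alpha)}{2/\alpha + 2d - 1}. \tag{6}$$

Thus there is a run of the algorithm with density threshold $\rho \in R$ such that $\rho \le \rho^* \le (1+\epsilon)\rho$. For the run of the algorithm corresponding to $\rho$, we call the solution of the first instance of INDSTREAMDENSITY $S_\rho$.

If INDSTREAMDENSITY terminates by exceeding some knapsack capacity, we know that for one of the knapsacks $i \in [d]$ we have $c_i(S_\rho) > 1$, and hence also $\sum_{i=1}^{d} c_i(S_\rho) > 1$. On the other hand, the extra density threshold we used for selecting the elements tells us that for any $j \in S_\rho$ we have

$$\frac{f_{S_\rho}(j)}{\sum_{i=1}^{d} c_{ij}} \ge \rho,$$

i.e., the marginal gain of every element $j$ added to the solution $S_\rho$ was greater than or equal to $\rho \sum_{i=1}^{d} c_{ij}$. Therefore, we get

$$f(S_\rho) \ge \sum_{j \in S_\rho} \Big( \rho \sum_{i=1}^{d} c_{ij} \Big) > \rho.$$

Note that $S_\rho$ is not a feasible solution, as it exceeds the $i$th knapsack capacity. However, the solution before adding the last element $j$ to $S_\rho$, i.e. $T_\rho = S_\rho \setminus \{j\}$, and the last element itself are both feasible solutions, and by submodularity the better of the two has value at least

$$\max\{f(T_\rho), f(\{j\})\} \ge \frac{\rho}{2} \ge \frac{\rho^*}{2(1+\epsilon)} = \frac{1-\alpha}{(1+\epsilon)(2/\alpha + 2d - 1)}\, f(C^*).$$

On the other hand, suppose INDSTREAMDENSITY terminates without exceeding any knapsack capacity. We divide $C^*$ into two sets. Let $C^*_{<\rho}$ be the set of elements from $C^*$ which cannot be added because their density is below the threshold, i.e., $f_{S_\rho}(j) / \sum_{i=1}^{d} c_{ij} < \rho$, and let $C^*_{\ge\rho}$ be the set of elements from $C^*$ which cannot be added due to independence system constraints. Since $C^*$ is feasible for each of the $d$ knapsacks,

$$f_{S_\rho}(C^*_{<\rho}) \le \sum_{e \in C^*_{<\rho}} \rho \sum_{i=1}^{d} c_{ie} = \rho \sum_{i=1}^{d} \sum_{e \in C^*_{<\rho}} c_{ie} \le d\rho \le d\rho^* = \frac{2d(1-\alpha)}{2/\alpha + 2d - 1}\, f(C^*). \tag{7}$$

On the other hand, since $S_\rho$ is a feasible solution for INDSTREAM without any density threshold, from Eq. 2 we know that $f(S_\rho) \ge \alpha f(S_\rho \cup C^*_{\ge\rho})$, and thus we obtain

$$f_{S_\rho}(C^*_{\ge\rho}) = f(S_\rho \cup C^*_{\ge\rho}) - f(S_\rho) \le \Big(\frac{1}{\alpha} - 1\Big) f(S_\rho) = \frac{1-\alpha}{\alpha}\, f(S_\rho). \tag{8}$$

Adding Eq. 7 and Eq. 8, and using submodularity, we get

$$f(S_\rho \cup C^*) - f(S_\rho) \le f_{S_\rho}(C^*_{<\rho}) + f_{S_\rho}(C^*_{\ge\rho}) \le \frac{1-\alpha}{\alpha}\, f(S_\rho) + d\rho.$$

Therefore,

$$f(S_\rho) \ge \alpha f(S_\rho \cup C^*) - \alpha d \rho.$$

Using an induction similar to the one in the proof of Theorem 5.1, we show that the solution of one of the INDSTREAM$_i$ has the desired approximation guarantee. As before, write $r = 1/\alpha$ for the number of local-search iterations. We multiply the preceding inequality, applied to each $S_i$, by $r$ and Eq. 3 by $2(r-i)$ to get

$$\sum_{j=1}^{r} r\, f(S_j) + \sum_{i=1}^{r-1} 2(r-i)\, f(S'_i) \ge \sum_{i=1}^{r} \big[ f(S_i \cup C_i) - d\rho \big] + \sum_{i=1}^{r-1} (r-i)\, f(S_i \cap C_i) \ge (r-1)\, f(C^*) - d\rho/\alpha.$$

Taking a max over the left-hand side, we get the following inequality:

$$\Big[ r^2 + 2 \sum_{i=1}^{r-1} (r-i) \Big] \max_i \max\big(f(S_i), f(S'_i)\big) \ge (r-1)\, f(C^*) - d\rho/\alpha.$$

Hence,

$$\max_i \max\big(f(S_i), f(S'_i)\big) \ge \frac{\alpha(1-\alpha)}{2-\alpha}\, f(C^*) - \frac{\alpha d \rho}{2-\alpha}.$$

Bounding $\rho \le \rho^*$ and replacing the corresponding value for $\rho^*$ from Eq. 6, we get the desired result:

$$\max_i \max\big(f(S_i), f(S'_i)\big) \ge \frac{\alpha(1-\alpha)}{2-\alpha}\, f(C^*) - \frac{2\alpha d(1-\alpha)}{(2-\alpha)(2/\alpha + 2d - 1)}\, f(C^*) = \frac{1-\alpha}{2/\alpha + 2d - 1}\, f(C^*) \ge \frac{1-\alpha}{(1+\epsilon)(2/\alpha + 2d - 1)}\, f(C^*).$$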
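The closing algebra of this proof can be checked with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the choice $\alpha = 1/(4p)$ is an assumption made here because it reproduces the ratio $(4p-1)/(4p(8p+2d-1))$ quoted in the abstract for a $p$-system with $d$ knapsacks (ignoring the $(1+\epsilon)$ factor).

```python
from fractions import Fraction


def knapsack_ratio(alpha, d):
    """Ratio (1 - alpha) / (2/alpha + 2d - 1) reached at the end of the proof."""
    return (1 - alpha) / (2 / alpha + 2 * d - 1)


def case2_bound(alpha, d):
    """Second-case bound before simplification, as a multiple of f(C*).

    Uses rho* = 2(1 - alpha) / (2/alpha + 2d - 1), also as a multiple of f(C*).
    """
    rho_star = 2 * (1 - alpha) / (2 / alpha + 2 * d - 1)
    return alpha * (1 - alpha) / (2 - alpha) - alpha * d * rho_star / (2 - alpha)


def abstract_ratio(p, d):
    """Ratio stated in the abstract: (4p - 1) / (4p (8p + 2d - 1))."""
    return Fraction(4 * p - 1, 4 * p * (8 * p + 2 * d - 1))


if __name__ == "__main__":
    for p in range(1, 6):
        for d in range(0, 5):
            alpha = Fraction(1, 4 * p)   # assumed guarantee of the monotone subroutine
            # The two-term bound simplifies to the single fraction.
            assert case2_bound(alpha, d) == knapsack_ratio(alpha, d)
            # With alpha = 1/(4p) it matches the ratio from the abstract.
            assert knapsack_ratio(alpha, d) == abstract_ratio(p, d)
    print("all identities hold")
```

With $d = 0$ the same expression reduces to $\alpha(1-\alpha)/(2-\alpha)$, the bound obtained without knapsack constraints, so the two proofs agree at that boundary.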


Streaming Non-monotone Submodular Maximization: Personalized Video Summarization on the Fly Baharan Mirzasoleiman ETH Zurich, Switzerland baharanm@ethz.ch Stefanie Jegelka MIT, United States stefje@mit.edu