Results 1–10 of 11
Fast Multi-Stage Submodular Maximization
Abstract

Cited by 4 (2 self)
Motivated by extremely large-scale machine learning problems, we introduce a new multi-stage algorithmic framework for submodular maximization (called MULTGREED), where at each stage we apply an approximate greedy procedure to maximize surrogate submodular functions. The surrogates serve as proxies for a target submodular function but require less memory and are easy to evaluate. We theoretically analyze the performance guarantee of the multi-stage framework and give examples of how to design instances of MULTGREED for a broad range of natural submodular functions. We show that MULTGREED performs very closely to the standard greedy algorithm given appropriate surrogate functions and argue how our framework can easily be integrated with distributed algorithms for further optimization. We complement our theory by empirically evaluating on several real-world problems, including data subset selection on millions of speech samples, where MULTGREED yields at least a thousand-fold speedup and superior results over state-of-the-art selection methods.
Monotone Submodular Maximization over a Matroid via Non-Oblivious Local Search
, 2013
Abstract

Cited by 2 (1 self)
We present an optimal, combinatorial (1 − 1/e)-approximation algorithm for monotone submodular optimization over a matroid constraint. Compared to the continuous greedy algorithm (Calinescu, Chekuri, Pál and Vondrák, 2008), our algorithm is extremely simple and requires no rounding. It consists of the greedy algorithm followed by local search. Both phases are run not on the actual objective function, but on a related auxiliary potential function, which is also monotone and submodular. In our previous work on maximum coverage (Filmus and Ward, 2012), the potential function gives more weight to elements covered multiple times. We generalize this approach from coverage functions to arbitrary monotone submodular functions. When the objective function is a coverage function, both definitions of the potential function coincide. Our approach generalizes to the case where the monotone submodular function has restricted curvature. For any curvature c, we adapt our algorithm to produce a (1 − e^(−c))/c approximation. This matches results of Vondrák (2008), who has shown that the continuous greedy algorithm produces a (1 − e^(−c))/c approximation when the objective function has curvature c with respect to the optimum, and proved that achieving any better approximation ratio is impossible in the value oracle model.
Budgeted influence maximization for multiple products
 CoRR
Abstract

Cited by 2 (1 self)
The typical algorithmic problem in viral marketing aims to identify a set of influential users in a social network who, when convinced to adopt a product, shall influence other users in the network and trigger a large cascade of adoptions. However, the host (the owner of an online social platform) often faces more constraints than the idealized setting of a single product, endless user attention, unlimited budget, and unbounded time; in reality, multiple products need to be advertised, each user can tolerate only a small number of recommendations, influencing a user has a cost, advertisers have only limited budgets, and the adoptions need to be maximized within a short time window. Given these myriad user, monetary, and timing constraints, it is extremely challenging for the host to design principled and efficient viral marketing algorithms with provable guarantees. In this paper, we provide a novel solution by formulating the problem as submodular maximization in a continuous-time diffusion model under the intersection of a matroid and multiple knapsack constraints. We also propose an adaptive threshold greedy algorithm which can be faster than the traditional greedy algorithm with lazy evaluation, and scalable to networks with millions of nodes. Furthermore, our mathematical formulation allows us to prove that the algorithm can achieve an approximation factor of ka/(2 + 2k) when ka out of the k knapsack constraints are active, which also improves over previous guarantees from the combinatorial optimization literature. In the case when influencing each user has uniform cost, the approximation factor improves to 1/3. Extensive synthetic and real-world experiments demonstrate that our budgeted influence maximization algorithm achieves the state of the art in terms of both effectiveness and scalability, often beating the next best by significant margins.
Streaming Submodular Maximization: Massive Data Summarization on the Fly
, 2014
Abstract

Cited by 2 (2 self)
How can one summarize a massive data set “on the fly”, i.e., without even having seen it in its entirety? In this paper, we address the problem of extracting representative elements from a large stream of data; i.e., we would like to select a subset of, say, k data points from the stream that are most representative according to some objective function. Many natural notions of “representativeness” satisfy submodularity, an intuitive notion of diminishing returns. Thus, such problems can be reduced to maximizing a submodular set function subject to a cardinality constraint. Classical approaches to submodular maximization require full access to the data set. We develop the first efficient streaming algorithm with a constant-factor (1/2 − ε) approximation guarantee to the optimum solution, requiring only a single pass through the data and memory independent of the data size. In our experiments, we extensively evaluate the effectiveness of our approach on several applications, including training large-scale kernel methods and exemplar-based clustering, on millions of data points. We observe that our streaming method, while achieving practically the same utility value, runs about 100 times faster than previous work.
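The single-pass, threshold-based idea behind such streaming algorithms can be sketched as follows. This is a simplified one-threshold sketch under an assumed guess `v` of the optimal value; the actual algorithm maintains O(log k / ε) such "sieves" with geometrically spaced guesses in parallel and returns the best.

```python
def sieve_threshold(stream, f, k, v):
    """One sieve of a threshold-based streaming algorithm: given a guess
    v of the optimal value, keep an element only if its marginal gain
    clears the adaptive threshold (v/2 - f(S)) / (k - |S|), so that a
    full set of k elements is guaranteed value at least v/2."""
    S = set()
    for e in stream:
        if len(S) >= k:
            break
        gain = f(S | {e}) - f(S)
        if gain >= (v / 2 - f(S)) / (k - len(S)):
            S.add(e)
    return S

# Usage with a toy coverage function and guess v = 5:
cover = {'a': {1, 2}, 'b': {2}, 'c': {3, 4}, 'd': {4}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
print(sieve_threshold(['a', 'b', 'c', 'd'], f, 2, 5))
```

Note that memory is bounded by the k kept elements per sieve, independent of the stream length, matching the memory claim in the abstract.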
Parallel Task Routing for Crowdsourcing
Abstract

Cited by 1 (1 self)
An ideal crowdsourcing or citizen-science system would route tasks to the most appropriate workers, but the best assignment is unclear because workers have varying skill, tasks have varying difficulty, and assigning several workers to a single task may significantly improve output quality. This paper defines a space of task routing problems, proves that even the simplest is NP-hard, and develops several approximation algorithms for parallel routing problems. We show that an intuitive class of requesters’ utility functions is submodular, which lets us provide iterative methods for dynamically allocating batches of tasks that make near-optimal use of available workers in each round. Experiments with live oDesk workers show that our task routing algorithm uses only 48% of the human labor compared to the commonly used round-robin strategy. Further, we provide versions of our task routing algorithm which enable it to scale to large numbers of workers and questions and to handle workers with variable response times while still providing significant benefit over common baselines.
Fast Multi-Stage Submodular Maximization: Extended version
Abstract

Cited by 1 (1 self)
Motivated by extremely large-scale machine learning problems, we introduce a new multi-stage algorithmic framework for submodular maximization (called MULTGREED), where at each stage we apply an approximate greedy procedure to maximize surrogate submodular functions. The surrogates serve as proxies for a target submodular function but require less memory and are easy to evaluate. We theoretically analyze the performance guarantee of the multi-stage framework and give examples of how to design instances of MULTGREED for a broad range of natural submodular functions. We show that MULTGREED performs very closely to the standard greedy algorithm given appropriate surrogate functions and argue how our framework can easily be integrated with distributed algorithms for further optimization. We complement our theory by empirically evaluating on several real-world problems, including data subset selection on millions of speech samples, where MULTGREED yields at least a thousand-fold speedup and superior results over state-of-the-art selection methods.
Lazier Than Lazy Greedy
Abstract

Cited by 1 (0 self)
Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTICGREEDY, can achieve a (1 − 1/e − ε) approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTICGREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTICGREEDY does not even evaluate every data point once and still achieves results indistinguishable from lazy greedy.
Research Statement
Abstract
My research interest is in the design and analysis of algorithms for optimization. I am strongly motivated by applications, particularly in machine learning and the design of electronic markets. As a theoretician, I believe in formulating problems that are fundamental to these applications and yet sufficiently general to be applicable to a wide variety of domains. This has led me to focus on two areas, sequential decision making and discrete nonlinear optimization, introducing broad new problem formulations and solving them by novel algorithmic techniques. We are surrounded by problems where we need to make decisions without having some or all of the relevant information. However, we can learn from the results of our past actions. Examples of such problems are learning click-through rates of advertisements or learning the effectiveness of drugs during testing. My research focuses on this theme of sequential decision making and its applications to machine learning and algorithmic economics. Discrete optimization is at the center stage of algorithms and has applications across computer science. A burst of activity in applying it to real-world problems has happened recently because of models which deal with nonlinearity. An example is the sensor placement problem, where the total area covered by the sensors depends on their locations in a nonlinear manner. My research
Parallel Double Greedy Submodular Maximization
Abstract
Many machine learning problems can be reduced to the maximization of submodular functions. Although well understood in the serial setting, the parallel maximization of submodular functions remains an open area of research, with recent results [1] only addressing monotone functions. The optimal algorithm for maximizing the more general class of non-monotone submodular functions was introduced by Buchbinder et al. [2] and follows a strongly serial double-greedy logic. In this work, we propose two methods to parallelize the double-greedy algorithm. The first, coordination-free approach emphasizes speed at the cost of a weaker approximation guarantee. The second, concurrency-control approach guarantees a tight 1/2-approximation, at the quantifiable cost of additional coordination and reduced parallelism. As a consequence we explore the tradeoff space between guaranteed performance and objective optimality. We implement and evaluate both algorithms on multicore hardware and billion-edge graphs, demonstrating both the scalability and tradeoffs of each approach.
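The serial double-greedy algorithm of Buchbinder et al. that this paper parallelizes can be sketched as follows (the deterministic 1/3-guarantee variant; the randomized variant, which chooses between the two branches with probability proportional to the positive gains, achieves 1/2). The graph-cut objective in the usage note is an illustrative choice of non-monotone submodular function, not taken from the paper.

```python
def double_greedy(f, ground_set):
    """Deterministic double greedy for unconstrained non-negative
    submodular maximization: scan elements in a fixed order, maintaining
    a growing set X and a shrinking set Y (X ⊆ Y throughout); for each
    element, compare the gain of adding it to X against the gain of
    removing it from Y, and take the better of the two moves."""
    X, Y = set(), set(ground_set)
    for e in ground_set:
        a = f(X | {e}) - f(X)      # gain of adding e to X
        b = f(Y - {e}) - f(Y)      # gain of removing e from Y
        if a >= b:
            X.add(e)
        else:
            Y.discard(e)
    return X  # X == Y after the scan

# Usage with a toy cut function on the path graph 1-2-3-4
# (cut functions are non-monotone submodular):
edges = [(1, 2), (2, 3), (3, 4)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(double_greedy(cut, [1, 2, 3, 4]))
```

The per-element decision depends on the current X and Y, which is exactly the serial dependency chain the coordination-free and concurrency-control schemes above relax in different ways.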
Streaming Algorithms for Submodular Function Maximization
, 2015
Abstract
We consider the problem of maximizing a non-negative submodular set function f : 2^N → R_+ subject to a p-matchoid constraint in the single-pass streaming setting. Previous work in this context has considered streaming algorithms for modular functions and monotone submodular functions. The main result is for submodular functions that are non-monotone. We describe deterministic and randomized algorithms that obtain an Ω(1/p)-approximation using O(k log k) space, where k is an upper bound on the cardinality of the desired set. The model assumes value oracle access to f and membership oracles for the matroids defining the p-matchoid constraint.