Finding representative set from massive data
- IEEE International Conference on Data Mining, 2005
- Cited by 6 (1 self)
In the information age, data is pervasive. In some applications, data explosion is a significant phenomenon. The massive data volume poses challenges to both human users and computers. In this project, we propose a new model for identifying a representative set from a large database. A representative set is a special subset of the original dataset with three main characteristics: (1) it is significantly smaller than the original dataset; (2) it captures the most information from the original dataset among subsets of the same size; (3) it has low redundancy among the representatives it contains. We use information-theoretic measures such as mutual information and relative entropy to measure the representativeness of the representative set. We first design a greedy algorithm and then present a heuristic algorithm that delivers much better performance. We run experiments on two real datasets and evaluate the effectiveness of our representative set in terms of coverage and accuracy. The experiments show that our representative set attains the expected characteristics and captures information more efficiently.
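The abstract above names relative entropy as one of its representativeness measures but does not give the formula it uses. As a minimal sketch, assuming the standard Kullback-Leibler definition over discrete distributions, the quantity can be computed as follows; the function name `relative_entropy` and the dictionary representation are illustrative choices, not taken from the paper.

```python
import math


def relative_entropy(p, q):
    """Return D(p || q) in bits for discrete distributions given as dicts.

    Assumption: p and q map outcomes to probabilities that each sum to 1.
    This is the textbook definition, not necessarily the paper's variant.
    """
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_x * math.log2(p_x / q_x)
    return total


if __name__ == "__main__":
    full = {"a": 0.5, "b": 0.5}
    print(relative_entropy(full, full))
    print(relative_entropy({"a": 1.0}, full))
```

A lower divergence between the distribution induced by a candidate subset and that of the full dataset would indicate a more representative subset under this reading.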
Towards Efficient Large-Scale VPN Monitoring and Diagnosis under Operational Constraints
- Cited by 6 (0 self)
Continuous monitoring and diagnosis of network performance are of crucial importance for Internet access service and virtual private network (VPN) service providers. Various operational constraints, which are crucial in practice, are largely ignored in previous monitoring system designs, or are simply replaced with load balancing problems which do not work for real heterogeneous networks. Given these real-world challenges, in this paper we design the VScope monitoring system with the following contributions. First, we design a greedy-assisted linear programming algorithm to select as few monitors as possible that can monitor the whole network under the operational constraints. Second, VScope takes a multi-round measurement approach to further reduce monitor deployment and management cost, by scheduling the path measurements in different rounds under the operational constraints. Evaluations based on several real VPN topologies from a tier-1 ISP as well as some other synthetic topologies demonstrate that VScope is a promising solution to the aforementioned challenges.
Maximizing visibility in nonconvex polygons: nonsmooth analysis and gradient algorithm design
- Cited by 6 (0 self)
This paper presents a motion control algorithm for a planar mobile observer such as a mobile robot equipped with an omni-directional camera. We propose a nonsmooth gradient algorithm for the problem of maximizing the area of the region visible to the observer in a simple nonconvex polygon. First, we show that the visible area is almost everywhere a locally Lipschitz function of the observer location. Second, we provide a novel version of the LaSalle Invariance Principle for discontinuous vector fields and Lyapunov functions with a finite number of discontinuities. Finally, we establish the asymptotic convergence properties of the nonsmooth gradient algorithm and illustrate its performance numerically.
Partial Information Spreading with Application to Distributed Maximum Coverage
- Cited by 3 (0 self)
This paper addresses partial information spreading among n nodes of a network. As opposed to traditional information spreading, where each node has a message that must be received by all nodes, we propose a relaxed requirement, where only n/c nodes need to receive each message, and every node should receive n/c messages, for some c ≥ 1. As a key tool in our study we introduce the novel concept of weak conductance, a generalization of classic graph conductance which allows us to analyze the time required for partial information spreading. We show the power of weak conductance as a measure of how well-knit the components of a graph are, by giving an example of a graph family for which the conductance is O(n^{-2}), while the weak conductance is as large as 1/2. For such graphs, weak conductance can be used to show that partial information spreading requires time complexity of O(log n). Finally, we demonstrate the usefulness of partial information spreading in solving the maximum coverage problem, which naturally arises in circuit layout, job scheduling and facility location, as well as in distributed resource allocation with a global budget constraint. Our algorithm yields a constant approximation factor and a constant deviation from the given budget. For graphs with a constant weak conductance, this implies a scalable time complexity for solving a problem with a global constraint.
Graph-based Seed Selection for Web-scale Crawlers
- Cited by 2 (1 self)
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in web-scale incremental crawlers. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a repository with more “good” and fewer “bad” pages. We propose a graph-based framework for crawler seed selection, and present several algorithms within this framework. Evaluation on real web data showed significant improvements over heuristic seed selection approaches.
Structured Learning of Two-Level Dynamic Rankings
- Cited by 1 (1 self)
For ambiguous queries, conventional retrieval systems are bound by two conflicting goals. On the one hand, they should diversify and strive to present results for as many query intents as possible. On the other hand, they should provide depth for each intent by displaying more than a single result. Since diversity and depth cannot both be achieved simultaneously in the conventional static retrieval model, we propose a new dynamic ranking approach. In particular, our proposed two-level dynamic ranking model allows users to adapt the ranking through interaction, thus overcoming the constraints of presenting a one-size-fits-all static ranking. In this model, a user’s interactions with the first-level ranking are used to infer this user’s intent, so that second-level rankings can be inserted to provide more results relevant to this intent. Unlike previous dynamic ranking models, we provide an algorithm to efficiently compute dynamic rankings with provable approximation guarantees. We also propose the first principled algorithm for learning dynamic ranking functions from training data. In addition to the theoretical results, we provide empirical evidence demonstrating the gains in retrieval quality over conventional approaches.
Covering Games: Approximation through Non-Cooperation
- Cited by 1 (0 self)
We propose approximation algorithms under game-theoretic considerations. We introduce and study the general covering problem, which is a natural generalization of the well-studied max-n-cover problem. In the general covering problem, we are given a universal set of weighted elements E and n collections of subsets of the elements. The task is to choose one subset from each collection such that the total weight of their union is as large as possible. In our game-theoretic setting, the choice in each collection is made by an independent player. For covering an element, the players receive a payoff defined by a nonincreasing utility sharing function. This function defines the fraction that each covering player receives from the weight of the elements. We show how to construct a utility sharing function such that every Nash equilibrium approximates the optimal solution by a factor of 1 − 1/e. We also prove that the length of any sequence of unilateral improving steps is polynomially bounded. This gives rise to a polynomial-time local search approximation algorithm whose approximation ratio is best possible.
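The setting described in this abstract can be illustrated with a small sketch of unilateral improving steps. The abstract does not specify the paper's constructed utility sharing function, so the code below assumes plain equal sharing (each of k covering players receives weight/k), which is nonincreasing but carries no 1 − 1/e guarantee here. All function names are hypothetical.

```python
def coverage_weight(choice, weights):
    """Total weight of the union of the chosen subsets (the objective)."""
    covered = set().union(*choice)
    return sum(weights[e] for e in covered)


def payoff(i, choice, weights, share):
    """Payoff of player i: its share of the weight of every element it covers."""
    total = 0.0
    for e in choice[i]:
        k = sum(1 for s in choice if e in s)  # number of players covering e
        total += weights[e] * share(k)
    return total


def improving_steps(collections, weights, share=lambda k: 1.0 / k):
    """Let players switch subsets while any unilateral strict improvement exists.

    Assumption: equal sharing is a stand-in, not the paper's sharing function.
    """
    choice = [options[0] for options in collections]
    improved = True
    while improved:
        improved = False
        for i, options in enumerate(collections):
            current = payoff(i, choice, weights, share)
            for s in options:
                trial = choice[:i] + [s] + choice[i + 1:]
                if payoff(i, trial, weights, share) > current + 1e-12:
                    choice = trial
                    improved = True
                    break
    return choice


if __name__ == "__main__":
    collections = [[{1, 2}, {3}], [{1, 2}, {4}]]
    weights = {1: 1, 2: 1, 3: 5, 4: 5}
    final = improving_steps(collections, weights)
    print(final, coverage_weight(final, weights))
```

With equal sharing the game has an exact potential function, so the loop terminates on finite inputs; the polynomial bound claimed in the abstract belongs to the paper's own construction.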
Feasibility-Preserving Crossover for Maximum k-Coverage Problem
The maximum k-coverage problem is a generalized version of covering problems. We introduce the problem formally and analyze its properties in relation to the operators of genetic algorithms. Based on the analysis, we propose a new crossover tailored to the maximum k-coverage problem. While traditional n-point crossovers require repair steps, the proposed crossover has the additional advantage of always producing feasible solutions. We give a comparative analysis of the proposed crossover through experiments.
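For orientation, the objective behind this abstract can be sketched under the common formulation: choose at most k subsets so that their union covers as many elements as possible. The code below is the classic greedy heuristic for that formulation, offered as a baseline illustration only; it is not the genetic algorithm or crossover proposed in the paper, and the name `greedy_max_k_coverage` is hypothetical.

```python
def greedy_max_k_coverage(subsets, k):
    """Pick up to k subsets greedily by marginal coverage gain.

    Assumption: unweighted elements; ties go to the lowest index.
    Returns the chosen indices (in pick order) and the covered elements.
    """
    covered = set()
    chosen = []
    remaining = set(range(len(subsets)))
    for _ in range(min(k, len(subsets))):
        # Largest number of newly covered elements wins; -i breaks ties.
        best = max(remaining, key=lambda i: (len(subsets[i] - covered), -i))
        if not subsets[best] - covered:
            break  # no remaining subset adds coverage
        chosen.append(best)
        covered |= subsets[best]
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    subsets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
    print(greedy_max_k_coverage(subsets, 2))
```

Any selection of at most k indices is feasible in this formulation, which is the property a feasibility-preserving crossover would need to maintain when recombining two parent selections.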

