Exploiting duality in summarization with deterministic guarantees (2007)
| Venue: | KDD |
| Citations: | 7 - 3 self |
BibTeX
@INPROCEEDINGS{Karras07exploitingduality,
author = {Panagiotis Karras and Dimitris Sacharidis},
title = {Exploiting duality in summarization with deterministic guarantees},
booktitle = {KDD},
year = {2007}
}
Abstract
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a $\frac{B \log^2 n}{\log \epsilon^*}$ factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^*} + \log n$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
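The duality idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, only a simplified instance under stated assumptions: integer data, a piecewise-constant histogram, and maximum absolute error as the metric. The function names `min_buckets` and `best_histogram` are hypothetical. The dual problem (fewest buckets for a given error bound) is solved by a single greedy scan, and the space-bounded problem is then answered by binary search over the error bound, so no table of space allocations is needed.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[int, int, float]  # (start index, end index inclusive, representative)


def min_buckets(data: Sequence[int], max_range: int) -> List[Bucket]:
    """Dual problem: fewest buckets whose (max - min) never exceeds max_range.

    Representing a bucket by the midpoint of its min and max yields a
    maximum absolute error of (max - min) / 2, so max_range is twice the
    allowed error. Assumes data is non-empty.
    """
    buckets: List[Bucket] = []
    start = 0
    lo = hi = data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        if new_hi - new_lo > max_range:
            # Adding data[i] would break the bound: close the current bucket.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, data[i], data[i]
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def best_histogram(data: Sequence[int], budget: int) -> Tuple[float, List[Bucket]]:
    """Space-bounded problem: smallest max error achievable with <= budget buckets.

    Binary-searches the error bound and calls the dual solver at each step.
    Assumes budget >= 1 and integer data, so candidate ranges are integers.
    """
    low, high = 0, max(data) - min(data)
    while low < high:
        mid = (low + high) // 2
        if len(min_buckets(data, mid)) <= budget:
            high = mid      # feasible: try a tighter error bound
        else:
            low = mid + 1   # infeasible: the bound must be looser
    return low / 2, min_buckets(data, low)


if __name__ == "__main__":
    error, buckets = best_histogram([1, 2, 3, 10, 11, 12, 30, 31], budget=3)
    print(error, buckets)
```

In this sketch each call to the dual solver costs one pass over the data, and the number of calls depends on the range of candidate errors rather than on B, which loosely mirrors why the abstract's complexity factors involve log ε* rather than the space budget.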







