Many online data sources are updated autonomously and independently. In this paper, we make the case for estimating the change frequency of data to improve web crawlers and web caches and to help data mining. We first identify various scenarios in which different applications have different requirements on the accuracy of the estimated frequency. We then develop several "frequency estimators" for the identified scenarios. In developing the estimators, we analytically show how precise and effective they are, and we show that the estimators we propose can significantly improve precision.
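The abstract contrasts frequency estimators without giving formulas. As a hedged illustration only, the sketch below assumes that a page changes according to a Poisson process and is polled `n` times at a fixed interval, and that each poll reveals only whether the page changed since the previous poll. It compares the naive estimate (detected changes divided by elapsed time) with the standard Poisson maximum-likelihood estimate. The function names are hypothetical, and this is not necessarily the estimator the authors propose.

```python
import math


def naive_rate(changes_detected: int, num_accesses: int, interval: float) -> float:
    """Naive estimate: detected changes divided by total observed time.

    Tends to underestimate the true rate, because several changes that
    occur between two consecutive accesses are seen as a single change.
    """
    return changes_detected / (num_accesses * interval)


def poisson_mle_rate(changes_detected: int, num_accesses: int, interval: float) -> float:
    """Maximum-likelihood rate, assuming changes form a Poisson process.

    With rate lam, the probability that at least one change happens
    between two accesses spaced `interval` apart is
    p = 1 - exp(-lam * interval). The MLE of p is X / n, so
    lam_hat = -ln(1 - X / n) / interval.
    """
    if not 0 <= changes_detected <= num_accesses:
        raise ValueError("changes_detected must be between 0 and num_accesses")
    if changes_detected == num_accesses:
        # Every access saw a change, so the MLE diverges to infinity.
        return math.inf
    p_hat = changes_detected / num_accesses
    return -math.log(1.0 - p_hat) / interval
```

For example, with ten daily polls that detect five changes, `naive_rate(5, 10, 1.0)` gives 0.5 changes per day, while `poisson_mle_rate(5, 10, 1.0)` gives ln 2, roughly 0.693, because it accounts for changes the naive count missed. The divergence when every poll detects a change illustrates why more careful estimators are needed in practice.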