Results 1  10
of
1,464,263
CURE: An Efficient Clustering Algorithm for Large Data sets
 Published in the Proceedings of the ACM SIGMOD Conference
, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Abstract

Cited by 722 (5 self)
 Add to MetaCart
Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new
Lag length selection and the construction of unit root tests with good size and power
 Econometrica
, 2001
"... It is widely known that when there are errors with a movingaverage root close to −1, a high order augmented autoregression is necessary for unit root tests to have good size, but that information criteria such as the AIC and the BIC tend to select a truncation lag (k) that is very small. We conside ..."
Abstract

Cited by 558 (14 self)
 Add to MetaCart
(1996). We also extend the M tests developed in Perron and Ng (1996) to allow for GLS detrending of the data. The MIC along with GLS detrended data yield a set of tests with desirable size and power properties.
The pyramid match kernel: Discriminative classification with sets of image features
 IN ICCV
, 2005
"... Discriminative learning is challenging when examples are sets of features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernelbased classification methods can learn complex decision boundaries, but a kernel over unordered set inputs must somehow solve for correspondenc ..."
Abstract

Cited by 544 (29 self)
 Add to MetaCart
for correspondences – generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in this space. This “pyramid match” computation is linear
The SPLASH2 programs: Characterization and methodological considerations
 INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1995
"... The SPLASH2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed sharedaddressspace multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH2 programs in terms of fundamental propertie ..."
Abstract

Cited by 1420 (12 self)
 Add to MetaCart
properties and architectural interactions that are important to understand them well. The properties we study include the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties
Space/Time Tradeoffs in Hash Coding with Allowable Errors
 Communications of the ACM
, 1970
"... this paper tradeoffs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages onebyone for membership in a given set of messages. Two new hash coding methods are examined and compared with a particular conventional h ..."
Abstract

Cited by 2097 (0 self)
 Add to MetaCart
hashcoding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency
Optimal Brain Damage
, 1990
"... We have used informationtheoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved sp ..."
Abstract

Cited by 510 (5 self)
 Add to MetaCart
We have used informationtheoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved
On the Resemblance and Containment of Documents
 In Compression and Complexity of Sequences (SEQUENCES’97
, 1997
"... Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection probl ..."
Abstract

Cited by 506 (6 self)
 Add to MetaCart
Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection
Data Streams: Algorithms and Applications
, 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Abstract

Cited by 533 (22 self)
 Add to MetaCart
In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has
Graphbased algorithms for Boolean function manipulation
 IEEE TRANSACTIONS ON COMPUTERS
, 1986
"... In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] and Akers [2], but with further restrictions on th ..."
Abstract

Cited by 3526 (46 self)
 Add to MetaCart
In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] and Akers [2], but with further restrictions
Generating Representative Web Workloads for Network and Server Performance Evaluation
, 1997
"... One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web worklo ..."
Abstract

Cited by 944 (11 self)
 Add to MetaCart
workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator) generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file
Results 1  10
of
1,464,263