Results 1  10
of
909,218
CURE: An Efficient Clustering Algorithm for Large Data sets
 Published in the Proceedings of the ACM SIGMOD Conference
, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Abstract

Cited by 722 (5 self)
 Add to MetaCart
of random sampling and partitioning. A random sample drawn from the data set is first partitioned and each partition is partially clustered. The partial clusters are then clustered in a second pass to yield the desired clusters. Our experimental results confirm that the quality of clusters produced by CURE
Estimating standard errors in finance panel data sets: comparing approaches.
 Review of Financial Studies
, 2009
"... Abstract In both corporate finance and asset pricing empirical work, researchers are often confronted with panel data. In these data sets, the residuals may be correlated across firms and across time, and OLS standard errors can be biased. Historically, the two literatures have used different solut ..."
Abstract

Cited by 890 (7 self)
 Add to MetaCart
Abstract In both corporate finance and asset pricing empirical work, researchers are often confronted with panel data. In these data sets, the residuals may be correlated across firms and across time, and OLS standard errors can be biased. Historically, the two literatures have used different
Lag length selection and the construction of unit root tests with good size and power
 Econometrica
, 2001
"... It is widely known that when there are errors with a movingaverage root close to −1, a high order augmented autoregression is necessary for unit root tests to have good size, but that information criteria such as the AIC and the BIC tend to select a truncation lag (k) that is very small. We conside ..."
Abstract

Cited by 558 (14 self)
 Add to MetaCart
(1996). We also extend the M tests developed in Perron and Ng (1996) to allow for GLS detrending of the data. The MIC along with GLS detrended data yield a set of tests with desirable size and power properties.
Learning with local and global consistency.
 In NIPS,
, 2003
"... Abstract We consider the general problem of learning from labeled and unlabeled data, which is often called semisupervised learning or transductive inference. A principled approach to semisupervised learning is to design a classifying function which is sufficiently smooth with respect to the intr ..."
Abstract

Cited by 673 (21 self)
 Add to MetaCart
to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
The design and implementation of FFTW3
 PROCEEDINGS OF THE IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our cu ..."
Abstract

Cited by 726 (3 self)
 Add to MetaCart
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with handoptimized libraries, and describes the software structure that makes our
Boosting and differential privacy
, 2010
"... Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacypreserving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates of the resp ..."
Abstract

Cited by 648 (14 self)
 Add to MetaCart
Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacypreserving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates
Localitysensitive hashing scheme based on pstable distributions
 In SCG ’04: Proceedings of the twentieth annual symposium on Computational geometry
, 2004
"... inÇÐÓ�Ò We present a novel LocalitySensitive Hashing scheme for the Approximate Nearest Neighbor Problem underÐÔnorm, based onÔstable distributions. Our scheme improves the running time of the earlier algorithm for the case of theÐnorm. It also yields the first known provably efficient approximate ..."
Abstract

Cited by 521 (8 self)
 Add to MetaCart
inÇÐÓ�Ò We present a novel LocalitySensitive Hashing scheme for the Approximate Nearest Neighbor Problem underÐÔnorm, based onÔstable distributions. Our scheme improves the running time of the earlier algorithm for the case of theÐnorm. It also yields the first known provably efficient approximate
Risks for the long run: A potential resolution of asset pricing puzzles
 JOURNAL OF FINANCE
, 1994
"... We model consumption and dividend growth rates as containing (i) a small longrun predictable component and (ii) fluctuating economic uncertainty (consumption volatility). These dynamics, for which we provide empirical support, in conjunction with Epstein and Zin’s (1989) preferences, can explain ke ..."
Abstract

Cited by 761 (63 self)
 Add to MetaCart
. As in the data, dividend yields predict returns and the volatility of returns is timevarying.
Distributional Clustering Of English Words
 In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
, 1993
"... We describe and evaluate experimentally a method for clustering words according to their dis tribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the si ..."
Abstract

Cited by 629 (27 self)
 Add to MetaCart
distortion sets of clusters: as the an nealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchi cal "soft" clustering of the data. Clusters are used as the basis for class models of word coocurrence, and the models evaluated with respect to heldout test
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
 ADVANCES IN LARGE MARGIN CLASSIFIERS
, 1999
"... The output of a classifier should be a calibrated posterior probability to enable postprocessing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. Howev ..."
Abstract

Cited by 1051 (0 self)
 Add to MetaCart
sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three dataminingstyle data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness
Results 1  10
of
909,218