Results 1 - 10 of 235,213
Estimating the number of clusters in a dataset via the Gap statistic
, 2000
"... We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or hierarchical), comparing the change in within-cluster dispersion to that expected under an appropriate reference ..."
Cited by 502 (1 self)
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Cited by 865 (3 self)
BIRCH: an efficient data clustering method for very large databases
 In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD
, 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multidimensional dataset. Prior work does not adequately address the problem of ..."
Cited by 576 (2 self)
SMOTE: Synthetic Minority Over-sampling Technique
 Journal of Artificial Intelligence Research
, 2002
"... An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentag ..."
Cited by 634 (27 self)
Estimating the Support of a High-Dimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified value between 0 and 1. We propo ..."
Cited by 783 (29 self)
Group formation in large social networks: membership, growth, and evolution
 In KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2006
"... The processes by which communities come together, attract new members, and develop over time is a central research issue in the social sciences — political movements, professional organizations, and religious denominations all provide fundamental examples of such communities. In the digital domain, ..."
Cited by 496 (19 self)
... friendship links and community membership on LiveJournal, and coauthorship and conference publications in DBLP. Both of these datasets provide explicit user-defined communities, where conferences serve as proxies for communities in DBLP. We study how the evolution of these communities relates to properties
The Digital Michelangelo Project: 3D Scanning of Large Statues
, 2000
"... We describe a hardware and software system for digitizing the shape and color of large fragile objects under non-laboratory conditions. Our system employs laser triangulation rangefinders, laser time-of-flight rangefinders, digital still cameras, and a suite of software for acquiring, aligning, merg ..."
Cited by 488 (8 self)
Toward optimal feature selection
 In 13th International Conference on Machine Learning
, 1995
"... In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for feature subset selection is presented. We show that our goal should be to eliminate a feature if it g ..."
Cited by 480 (9 self)
... The conditions under which the approximate algorithm is successful are examined. Empirical results are given on a number of data sets, showing that the algorithm effectively handles datasets with a very large number of features.
Paradox lost? Firm-level evidence on the returns to information systems.
 Manage Sci
, 1996
"... The "productivity paradox" of information systems (IS) is that, despite enormous improvements in the underlying technology, the benefits of IS spending have not been found in aggregate output statistics. One explanation is that IS spending may lead to increases in product quality or varie ..."
Cited by 465 (23 self)
Clustering Gene Expression Patterns
, 1999
"... Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the ana ..."
Cited by 451 (11 self)
expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is O(n^2 (log(n))^c). We also present a practical