Results 1 - 10 of 81
Data Clustering: A Review
- ACM COMPUTING SURVEYS
, 1999
Cited by 912 (9 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling
, 1997
Cited by 106 (2 self)
Computing (ratios of) normalizing constants of probability models is a fundamental computational problem for many statistical and scientific studies. Monte Carlo simulation is an effective technique, especially with complex and high-dimensional models. This paper aims to bring to the attention of general statistical audiences some effective methods originating from theoretical physics and at the same time to explore these methods from a more statistical perspective, through establishing theoretical connections and illustrating their uses with statistical problems. We show that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences. The former generalizes importance sampling through the use of a single "bridge" density and is thus a case of bridge sampling in the sense of Meng and Wong (1996). Thermodynamic integration, which is also known in the numerical analysis literature as Oga...
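As a rough illustration of the importance-sampling baseline that the abstract above takes as its starting point, the sketch below estimates a ratio of normalizing constants c1/c2 by averaging q1(x)/q2(x) over draws from the second (normalized) density. The two Gaussian kernels, the function names, and the sample size are assumptions chosen only so that the true ratio (0.5) is known; this is not the paper's bridge or path sampling estimator.

```python
import math
import random


def q1(x):
    # Unnormalized N(0, 1) kernel; its normalizing constant is sqrt(2*pi).
    return math.exp(-0.5 * x * x)


def q2(x):
    # Unnormalized N(0, 2^2) kernel; its normalizing constant is 2*sqrt(2*pi).
    return math.exp(-x * x / 8.0)


def importance_ratio(n, seed=0):
    """Estimate c1/c2 as the sample mean of q1(x)/q2(x) with x drawn from N(0, 2^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)  # draw from the normalized version of q2
        total += q1(x) / q2(x)
    return total / n


if __name__ == "__main__":
    # The exact ratio for these two kernels is 0.5.
    print(importance_ratio(20000))
```

The estimator is well behaved here because the sampling density has heavier tails than the target, so the weights q1/q2 stay bounded; swapping the roles of the two kernels would give unbounded weights.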
A Rigorous Framework for Optimization of Expensive Functions by Surrogates
, 1998
Cited by 98 (12 self)
The goal of the research reported here is to develop rigorous optimization algorithms to apply to some engineering design problems for which direct application of traditional optimization approaches is not practical. This paper presents and analyzes a framework for generating a sequence of approximations to the objective function and managing the use of these approximations as surrogates for optimization. The result is to obtain convergence to a minimizer of an expensive objective function subject to simple constraints. The approach is widely applicable because it does not require, or even explicitly approximate, derivatives of the objective. Numerical results are presented for a 31-variable helicopter rotor blade design example and for a standard optimization test example. Key Words: Approximation concepts, surrogate optimization, response surfaces, pattern search methods, derivative-free optimization, design and analysis of computer experiments (DACE), computational engineering.
Detecting Features in Spatial Point Processes with . . .
, 1995
Cited by 68 (32 self)
We consider the problem of detecting features in spatial point processes in the presence of substantial clutter. One example is the detection of minefields using reconnaissance aircraft images that erroneously identify many objects that are not mines. Another is the detection of seismic faults on the basis of earthquake catalogs: earthquakes tend to be clustered close to the faults, but there are many that are farther away. Our solution uses model-based clustering based on a mixture model for the process, in which features are assumed to generate points according to highly linear multivariate normal densities, and the clutter arises according to a spatial Poisson process. Very nonlinear features are represented by several highly linear multivariate normal densities, giving a piecewise linear representation. The model is estimated in two stages. In the first stage, hierarchical model-based clustering is used to provide a first estimate of the features. In the second stage, this clustering is refined using the EM algorithm. The number of features is found using an approximation to the posterior probability of each number of features. For the minefield
On Conditional and Intrinsic Autoregressions
, 1995
Cited by 58 (6 self)
This paper discusses standard and intrinsic autoregressions and describes how the problems that arise can be alleviated using Dempster's (1972) algorithm or an appropriate modification. The approach partly represents a synthesis of standard geostatistical and Gaussian Markov random field formulations. Some non-spatial applications are also mentioned. Some key words: Agricultural experiments; Bayesian image analysis; Conditional autoregressions; Dempster's algorithm; Geographical epidemiology; Geostatistics; Intrinsic autoregressions; Multi-way tables; Prior distributions; Spatial statistics; Surface reconstruction; Texture analysis.
Neighborhood-Based Models for Social Networks
- Sociological Methodology
, 2002
Cited by 42 (4 self)
We argue that social networks can be modeled as the outcome of processes that occur in overlapping local regions of the network, termed local social neighborhoods. Each neighborhood is conceived as a possible site of interaction and corresponds to a subset of possible network ties. In this paper, we discuss hypotheses about the form of these neighborhoods, and we present two new and theoretically plausible ways in which neighborhood-based models for networks can be constructed. In the first, we introduce the notion of a setting structure, a directly hypothesized (or observed) set of exogenous constraints on possible neighborhood forms. In the second, we propose higher-order neighborhoods that are generated, in part, by the outcome of interactive network processes themselves. Applications of both approaches to model construction are presented, and the developments are considered within a general conceptual framework of locale for social networks. We show how assumptions about neighborhoods can be cast within a hierarchy of increasingly complex models; these models represent a progressively greater capacity for network processes to “reach” across a network through long cycles or semi-paths. We argue that this class of models holds new promise for the development of empirically plausible models for networks and network-based processes.
Nearest Neighbor Clutter Removal for Estimating Features in Spatial Point Processes
- Journal of the American Statistical Association
, 1996
Cited by 41 (15 self)
We consider the problem of detecting features in spatial point processes in the presence of substantial clutter. One example is the detection of minefields using reconnaissance aircraft images that identify many objects that are not mines. Our solution uses Kth nearest neighbor distances of points in the process to classify them as clutter or otherwise. The observed Kth nearest neighbor distances are modeled as a mixture distribution, the parameters of which are estimated by a simple EM algorithm. This method allows for detection of generally shaped features that need not be path connected. In the minefield example this method yields high detection and low false positive rates. Another application, to outlining seismic faults, is considered, with some success. The method works well in high dimensions. The method can also be used to produce very high breakdown-point robust estimators of a covariance matrix. KEY WORDS: Breakdown point; Edge effects; EM algorithm; Image ana...
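The classification step described in the abstract above can be sketched as follows, under assumptions the abstract does not state: the points are planar, each mixture component behaves like a homogeneous Poisson process of rate lam, and the Kth nearest neighbor distance x under such a process has density 2 (lam*pi)^K x^(2K-1) exp(-lam*pi*x^2) / (K-1)!. All function names, the starting values, and the simulated data are hypothetical, and edge effects are ignored.

```python
import math
import random


def kth_nn_distances(points, k):
    """Distance from each 2-D point to its k-th nearest neighbour (O(n^2) scan)."""
    out = []
    for i, (xi, yi) in enumerate(points):
        d = sorted(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out


def log_density(x, lam, k):
    """Log density of the k-th NN distance under a planar Poisson process (assumed model)."""
    return (math.log(2.0) + k * math.log(lam * math.pi)
            + (2 * k - 1) * math.log(x) - lam * math.pi * x * x - math.lgamma(k))


def em_clutter(dists, k, iters=50):
    """Fit a two-component mixture; return (p, lam_feature, lam_clutter, resp)."""
    srt = sorted(dists)
    half = len(srt) // 2
    # Crude start: treat the smaller half of the distances as feature points.
    lam_f = k * half / (math.pi * sum(x * x for x in srt[:half]))
    lam_c = k * (len(srt) - half) / (math.pi * sum(x * x for x in srt[half:]))
    p, resp = 0.5, [0.5] * len(dists)
    for _ in range(iters):
        # E-step: posterior probability that each point is a feature point.
        resp = []
        for x in dists:
            a = math.log(p) + log_density(x, lam_f, k)
            b = math.log(1.0 - p) + log_density(x, lam_c, k)
            m = max(a, b)  # subtract the max to avoid overflow in exp
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: weighted maximum-likelihood updates of the rates and mixing weight.
        s = sum(resp)
        p = min(max(s / len(dists), 1e-9), 1.0 - 1e-9)
        lam_f = k * s / (math.pi * sum(r * x * x for r, x in zip(resp, dists)))
        lam_c = k * (len(dists) - s) / (
            math.pi * sum((1.0 - r) * x * x for r, x in zip(resp, dists)))
    return p, lam_f, lam_c, resp


def demo_points(seed=1, n=200):
    """Hypothetical data: n feature points in a small square, then n uniform clutter points."""
    rng = random.Random(seed)
    feature = [(0.45 + 0.1 * rng.random(), 0.45 + 0.1 * rng.random()) for _ in range(n)]
    clutter = [(rng.random(), rng.random()) for _ in range(n)]
    return feature + clutter
```

A point is labelled as feature when its responsibility exceeds 0.5. The sketch assumes no duplicate points (a zero distance would break the logarithm) and uses a brute-force neighbour search, which is adequate only for a few hundred points.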
Objective Bayesian Analysis of Spatially Correlated Data
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
Cited by 38 (6 self)
Spatially varying phenomena are often modeled using Gaussian random fields, specified by their mean function and covariance function. The spatial correlation structure of these models is commonly specified to be of a certain form (e.g., spherical, power exponential, rational quadratic, or Matérn) with a small number of unknown parameters. We consider objective Bayesian analysis of such spatial models, when the mean function of the Gaussian random field is specified as in a linear model. It is thus necessary to determine an objective (or default) prior distribution for the unknown mean and covariance parameters of the random field. We first
Spatial Econometrics
- PALGRAVE HANDBOOK OF ECONOMETRICS: VOLUME 1, ECONOMETRIC THEORY
, 2001
Cited by 36 (5 self)
Spatial econometric methods deal with the incorporation of spatial interaction and spatial structure into regression analysis. The field has seen recent and rapid growth, spurred both by theoretical concerns and by the need to apply econometric models to emerging large geocoded databases. The review presented in this chapter outlines the basic terminology and discusses in some detail the specification of spatial effects, estimation of spatial regression models, and specification tests for spatial effects.
Practical maximum pseudolikelihood for spatial point patterns
- Australian and New Zealand Journal of Statistics
, 2000
Cited by 34 (7 self)
This paper describes a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman & Turner’s (1992) device for maximizing the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood is known explicitly, except for the computation of an integral over the sampling region. Approximation of this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximized using standard statistical software for generalized linear or additive models, provided the conditional intensity of the process takes an ‘exponential family’ form. Using this approach a wide variety of spatial point process models of Gibbs type can be fitted rapidly, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information. Key words: area-interaction process; Berman–Turner device; Dirichlet tessellation; edge effects; generalized additive models; generalized linear models; Gibbs point processes; GLIM; hard core process; inhomogeneous point process; marked point processes; Markov spatial point processes; Ord’s process; pairwise interaction; profile pseudolikelihood; spatial clustering; soft core process; spatial trend; S-PLUS; Strauss process; Widom–Rowlinson model.

