Computing LTS Regression for Large Data Sets
 Institute of Mathematical Statistics Bulletin
, 1999
Abstract

Cited by 93 (2 self)
Least trimmed squares (LTS) regression is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The LTS method was proposed by Rousseeuw (1984, p. 876) as a highly robust regression estimator, with breakdown value (n − h)/n. It turned out that the computation time of existing LTS algorithms grew too fast with the size of the data set, precluding their use for data mining. Therefore we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call `selective iteration' and `nested extensions'. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. Moreover, FAST-LTS runs faster than all programs for least median of squares (LMS). The new algorithm makes the LTS method available as a tool for robust regression in large data sets, e.g. in a data mining context.
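As a rough illustration of the concentration ("C-step") idea underlying FAST-LTS, here is a toy numpy sketch. All names and tuning constants are mine, and it omits the paper's selective iteration, nested extensions, and intercept adjustment; it is not the authors' implementation.

```python
import numpy as np

def c_step(X, y, subset, h):
    """One concentration step: least squares fit on the current subset,
    then keep the h cases with the smallest squared residuals."""
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    r2 = (y - X @ beta) ** 2
    return np.argsort(r2)[:h], beta

def fast_lts(X, y, h, n_starts=50, n_iter=20, seed=None):
    """Toy FAST-LTS: refine random p-subsets with C-steps and return the
    fit whose h smallest squared residuals have the lowest sum."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p, replace=False)
        for _ in range(n_iter):
            subset, beta = c_step(X, y, subset, h)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj
```

Each C-step can only decrease the trimmed objective, which is why repeated refinement from many random starts tends to locate a good h-subset.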
ROBPCA: a New Approach to Robust Principal Component Analysis
, 2003
Abstract

Cited by 79 (14 self)
In this paper we introduce a new method for robust principal component analysis. Classical PCA is based on the empirical covariance matrix of the data and hence it is highly sensitive to outlying observations. In the past, two robust approaches have been developed. The first is based on the eigenvectors of a robust scatter matrix such as the MCD or an S-estimator, and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data. Here, we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. It yields more accurate estimates on non-contaminated data sets and more robust estimates on contaminated data. ROBPCA can be computed fast, and is able to detect exact fit situations. As a by-product, ROBPCA produces a diagnostic plot which displays and classifies the outliers. The algorithm is applied to several data sets from chemometrics and engineering.
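The combination this abstract describes — projection-pursuit outlyingness followed by estimation on the least outlying points — can be caricatured in a few lines of numpy. This is a loose sketch under my own simplifications (Stahel–Donoho-style outlyingness over random directions, then classical PCA on the retained points), not the actual ROBPCA algorithm, which has further stages.

```python
import numpy as np

def robpca_sketch(X, alpha=0.75, n_dirs=250, seed=None):
    """ROBPCA-flavoured sketch: score each point by its worst
    median/MAD-standardized projection over random directions, keep the
    alpha-fraction least outlying points, then run classical PCA."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    dirs = rng.standard_normal((n_dirs, p))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = X @ dirs.T                       # (n, n_dirs) projections
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0) + 1e-12
    outl = np.max(np.abs(proj - med) / mad, axis=1)
    keep = np.argsort(outl)[:int(alpha * n)]
    Xh = X[keep]
    center = Xh.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xh - center, full_matrices=False)
    return center, Vt, (s ** 2) / (len(keep) - 1)
```

Because the outliers never enter the final SVD, the leading direction follows the bulk of the data rather than the contamination.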
Principal Component Analysis based on Robust Estimators of the Covariance or Correlation Matrix: Influence Functions and Efficiencies
 BIOMETRIKA
, 2000
Abstract

Cited by 78 (16 self)
A robust principal component analysis can be easily performed by computing the eigenvalues and eigenvectors of a robust estimator of the covariance or correlation matrix. In this paper we derive the influence functions and the corresponding asymptotic variances for these robust estimators of eigenvalues and eigenvectors. The behavior of several of these estimators is investigated by a simulation study. Finally, the use of empirical influence functions is illustrated by a real data example.
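The plug-in recipe in the first sentence — eigenvalues and eigenvectors of a robust covariance estimate — might look like the following toy sketch, with a crude C-step MCD standing in for a production estimator. Names and constants are mine, not from the paper.

```python
import numpy as np

def mcd_sketch(X, h, n_starts=30, n_iter=15, seed=None):
    """Toy raw MCD: C-steps from random (p+1)-subsets; keeps the
    h-subset whose covariance has the smallest determinant."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_iter):
            mu = X[idx].mean(axis=0)
            S = np.cov(X[idx].T) + 1e-9 * np.eye(p)  # ridge guards singular starts
            diff = X - mu
            d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
            idx = np.argsort(d2)[:h]
        det = np.linalg.det(np.cov(X[idx].T))
        if det < best_det:
            best_det, best_idx = det, idx
    return X[best_idx].mean(axis=0), np.cov(X[best_idx].T)

def robust_pca(X, h, seed=None):
    """Robust PCA: spectral decomposition of the MCD scatter estimate."""
    mu, S = mcd_sketch(X, h, seed=seed)
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    return mu, evals[order], evecs[:, order]
```

Note that the raw MCD eigenvalues are biased downward without a consistency factor; the eigenvectors (the principal directions) are the quantity of interest here.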
Outlier-preserving focus+context visualization in parallel coordinates
 IEEE Transactions on Visualization and Computer Graphics
, 2006
Abstract

Cited by 69 (4 self)
Figure 1: Outlier-preserving focus+context visualization of a CFD simulation dataset (the mixture of two fluids). The outlier-preserving context visualization shows that with respect to flow directions (the three axes on the left), most data items cluster around the zero values in the v and w components of the flow vector, whereas quite different values of the u component show up. We can also see that a number of visualization outliers with respect to flow velocities, pressure values, temperatures, etc., significantly contribute to the visualization. Finally, the focus visualization (red polylines) reveals more multivariate details for a data subset which is characterized by low temperature and relatively low spatial y values. Focus+context visualization offers convenient solutions for specifically steering the investment of graphical resources in visualization so as to emphasize selected subsets of the data while at the same time also preserving a good overview through context visualization. In focus+context visualization, the context often is represented in a reduced and/or compressed form, which can cause problems for small-scale features in the context such as data outliers. In this paper we present an approach to focus+context visualization in parallel coordinates which is truthful to outliers in the
Patterns of activity in the categorical representations of objects
 J. Cogn. Neurosci
, 2002
Abstract

Cited by 68 (1 self)
Object perception has been a subject of extensive fMRI studies in recent years. Yet, the nature of the cortical representation of objects in the human brain remains controversial. Analyses of fMRI data have traditionally focused on the activation of individual voxels associated with presentation of various stimuli. The current analysis approaches functional imaging data as collective information about the stimulus. Linking activity in the brain to a stimulus is treated as a pattern-classification problem. Linear discriminant analysis was used to reanalyze a set of data originally published by Ishai et al. (2000), available from fMRIDC (accession no. 220001113D). Results of the new analysis reveal that the patterns of activity that distinguish one category of objects from other categories are largely independent of one another, both in terms of the activity and spatial overlap. The information used to detect objects from phase-scrambled control stimuli is not essential in distinguishing one object category from another. Furthermore, performing an object-matching task during the scan significantly improved the ability to predict objects from controls, but had minimal effect on object classification, suggesting that the task-based attentional benefit was nonspecific to object categories.
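For readers unfamiliar with the classifier used in this reanalysis, a generic linear discriminant analysis (class means plus pooled covariance, equal priors) fits in a few lines of numpy. This is the textbook method only — not the authors' pipeline, features, or data.

```python
import numpy as np

def lda_fit(X, y):
    """Fit LDA: per-class means and the inverse pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    pooled = sum(np.cov(X[y == c].T) * (np.sum(y == c) - 1)
                 for c in classes) / (len(X) - len(classes))
    return classes, means, np.linalg.inv(pooled)

def lda_predict(model, X):
    """Assign each row to the class with the highest linear score."""
    classes, means, P = model
    # score_k(x) = mu_k' P x - 0.5 mu_k' P mu_k   (equal priors)
    scores = X @ P @ means.T - 0.5 * np.einsum('ij,jk,ik->i', means, P, means)
    return classes[np.argmax(scores, axis=1)]
```

Treating each scan's voxel pattern as a feature vector and the stimulus category as the label turns "linking activity to a stimulus" into exactly this kind of classification problem.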
Influence Function and Efficiency of the Minimum Covariance Determinant Scatter Matrix Estimator
 Journal of Multivariate Analysis
, 1998
Abstract

Cited by 56 (18 self)
The Minimum Covariance Determinant (MCD) scatter estimator is a highly robust estimator for the dispersion matrix of a multivariate, elliptically symmetric distribution. It is relatively fast to compute and intuitively appealing. In this note we derive its influence function and compute the asymptotic variances of its elements. A comparison with the one-step reweighted MCD and with S-estimators is made. Finite-sample results are also reported.
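The influence function derived analytically in the paper has a simple finite-sample analogue: the scaled change in the estimate when a single observation is added. The numpy sketch below uses a crude distance-trimmed covariance as a stand-in for the MCD (my simplification, not the paper's estimator) just to make the robustness contrast visible.

```python
import numpy as np

def trimmed_cov(X, h):
    """Crude robust scatter: covariance of the h points closest (in
    Euclidean distance) to the coordinatewise median."""
    d = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    return np.cov(X[np.argsort(d)[:h]].T)

def empirical_influence(X, z, estimator):
    """Finite-sample influence of one added point z:
    n * (T(X with z appended) - T(X))."""
    n = len(X)
    return n * (estimator(np.vstack([X, z])) - estimator(X))
```

A gross outlier blows up the empirical influence of the sample covariance, while a high-breakdown estimator barely reacts — the bounded-influence behaviour the paper quantifies exactly.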
Fast and robust parameter estimation for statistical partial volume models in brain MRI
 NEUROIMAGE
, 2004
Abstract

Cited by 54 (11 self)
Due to the finite spatial resolution of imaging devices, a single voxel in a medical image may be composed of a mixture of tissue types, an effect known as the partial volume effect (PVE). Partial volume estimation, that is, the estimation of the amount of each tissue type within each voxel, has received considerable interest in recent years. Much of this work has been focused on the mixel model, a statistical model of the PVE. We propose a novel trimmed minimum covariance determinant (TMCD) method for the estimation of the parameters of the mixel PVE model. In this method, each voxel is first labeled according to the most dominant tissue type. Voxels that are prone to the PVE are removed from this labeled set, following which robust location estimators with high breakdown points are used to estimate the mean and the covariance of each tissue class. Comparisons between different methods for parameter estimation based on classified images as well as an expectation–maximization-like (EM-like) procedure for simultaneous parameter and
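A 1-D, two-class toy version of the trimming idea — label each voxel by dominant tissue, drop the boundary (PVE-prone) voxels, then re-estimate each class robustly — can be sketched as follows, assuming median/MAD as the high-breakdown estimators. The paper's actual mixel model and estimators are more involved; all names here are mine.

```python
import numpy as np

def tmcd_sketch(intensity, means_init, trim_q=0.2):
    """TMCD-flavoured toy for 1-D intensities and exactly two classes:
    label by nearest initial class mean, drop the trim_q fraction of
    voxels closest to the decision boundary, then return (median,
    MAD-based scale) per class."""
    means_init = np.asarray(means_init, float)   # assumes 2 entries
    labels = np.argmin(np.abs(intensity[:, None] - means_init), axis=1)
    boundary = means_init.mean()                 # midpoint of the two means
    dist = np.abs(intensity - boundary)
    keep = dist >= np.quantile(dist, trim_q)     # discard PVE-prone voxels
    est = []
    for c in range(len(means_init)):
        v = intensity[keep & (labels == c)]
        med = np.median(v)
        est.append((med, 1.4826 * np.median(np.abs(v - med))))
    return est
```

Trimming removes most mixed voxels before estimation, so the per-class statistics are driven by pure-tissue voxels — the key point of the TMCD construction.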
Robust factor analysis
, 2003
Abstract

Cited by 39 (9 self)
Our aim is to construct a factor analysis method that can resist the effect of outliers. For this we start with a highly robust initial covariance estimator, after which the factors can be obtained from maximum likelihood or from principal factor analysis (PFA). We find that PFA based on the minimum covariance determinant scatter matrix works well. We also derive the influence function of the PFA method based on either the classical scatter matrix or a robust matrix. These results are applied to the construction of a new type of empirical influence function (EIF), which is very effective for detecting influential data. To facilitate the interpretation, we compute a cutoff value for this EIF. Our findings are illustrated with several real data examples.
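A minimal sketch of principal factor analysis run on a robust correlation matrix, using Spearman rank correlation as a crude robust stand-in for the MCD-based scatter the paper recommends (names and the stand-in choice are mine):

```python
import numpy as np

def rank_correlation(X):
    """Spearman correlation: Pearson correlation of the column ranks.
    A crude robust stand-in for an MCD-based correlation matrix."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks.T)

def principal_factor_loadings(R, k):
    """One pass of principal factor analysis: replace the diagonal of
    the correlation matrix with initial communalities (squared multiple
    correlations), then take the top-k eigenpairs as loadings."""
    R = R.copy()
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # squared multiple correlations
    np.fill_diagonal(R, smc)
    evals, evecs = np.linalg.eigh(R)
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))
```

Swapping the classical correlation matrix for a robust one is the whole trick: the PFA step itself is unchanged, which is what makes the influence-function comparison in the paper possible.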
Bayesian Statistics
 in WWW', Computing Science and Statistics
, 1989
Abstract

Cited by 33 (1 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection, and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion that minimizes the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model
Fast and robust discriminant analysis
, 2003
Abstract

Cited by 33 (6 self)
The goal of discriminant analysis is to obtain rules that describe the separation between groups of observations. Moreover, it allows us to classify new observations into one of the known groups. In the classical approach, discriminant rules are often based on the empirical mean and covariance matrix of the data, or of parts of the data. But because these estimates are highly influenced by outlying observations, they become inappropriate on contaminated data sets. Robust discriminant rules are obtained by inserting robust estimates of location and scatter into generalized maximum likelihood rules at normal distributions. This approach allows us to discriminate between several populations, with equal or unequal covariance structures, and with equal or unequal membership probabilities. In particular, the highly robust MCD estimator is used, as it can be computed very fast for large data sets. The probability of misclassification is also estimated in a robust way. The performance of the new method is investigated through several simulations and by applying it to some real data sets.
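The recipe in this abstract — robust location/scatter per class plugged into the Gaussian maximum likelihood rule with unequal covariances — can be sketched as follows, with a crude distance-trimmed estimator standing in for the MCD (a simplification of mine, not the paper's estimator):

```python
import numpy as np

def robust_class_fit(X, frac=0.75):
    """Robust location/scatter for one class: coordinatewise median,
    then mean/covariance of the frac closest points (crude MCD stand-in)."""
    mu = np.median(X, axis=0)
    d = np.linalg.norm(X - mu, axis=1)
    keep = np.argsort(d)[:int(frac * len(X))]
    return X[keep].mean(axis=0), np.cov(X[keep].T)

def qda_predict(models, priors, X):
    """Gaussian ML rule with unequal covariances (quadratic rule)."""
    scores = []
    for (mu, S), pi in zip(models, priors):
        Sinv = np.linalg.inv(S)
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)
        scores.append(np.log(pi) - 0.5 * np.log(np.linalg.det(S)) - 0.5 * d2)
    return np.argmax(np.stack(scores, axis=1), axis=1)
```

Because the per-class estimates ignore the most distant points, gross outliers in the training sample of one group do not drag its mean or inflate its covariance, so the discriminant boundary stays close to where the clean data put it.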