Results 1–10 of 145
Diagnostics for multivariate imputations
 Journal of the Royal Statistical Society: Series C (Applied Statistics)
Cited by 25 (2 self)
Abstract: We consider three sorts of diagnostics for random imputations: (a) displays of the completed data, intended to reveal unusual patterns that might suggest problems with the imputations; (b) comparisons of the distributions of observed and imputed data values; and (c) checks of the fit of observed data to the model used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation.
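Diagnostic (b) above, comparing the distributions of observed and imputed values of a variable, can be sketched in a few lines of Python. The helper name and the choice of a two-sample Kolmogorov–Smirnov statistic are illustrative, not the paper's own procedure:

```python
import numpy as np

def ks_observed_vs_imputed(y_completed, miss):
    """Two-sample KS distance between the observed and imputed
    values of one completed variable (hypothetical diagnostic
    helper; values near 0 suggest the imputations blend in)."""
    obs = np.sort(y_completed[~miss])
    imp = np.sort(y_completed[miss])
    # evaluate both empirical CDFs on the union of the two samples
    grid = np.concatenate([obs, imp])
    f_obs = np.searchsorted(obs, grid, side="right") / len(obs)
    f_imp = np.searchsorted(imp, grid, side="right") / len(imp)
    return np.abs(f_obs - f_imp).max()
```

A large statistic for some variable flags imputed values whose distribution departs sharply from the observed ones, which is exactly the kind of "unusual pattern" these diagnostics are meant to surface.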
Variable Selection for Multiply-Imputed Data with Application to Dioxin Exposure Study
Cited by 5 (0 self)
SUMMARY: Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. An important and long-standing statistical problem is how to conduct variable selection on multiply-imputed data. If a variable selection method is applied to each imputed dataset separately, it may select different variables in different imputed datasets, making it difficult to interpret the final model or draw scientific conclusions. In this paper, we propose two novel variable selection methods for multiply-imputed data. Both methods jointly fit models on multiple imputed datasets and yield a consistent selection of variables across all imputed datasets. The first, the MI-stepwise method, extends stepwise selection: it first obtains combined p-values using Rubin's rules for MI inference (Rubin 1987; Little and Rubin 2002) and then selects variables based on the combined p-values in each step of selection. The second, the MI-lasso method, extends the lasso (Tibshirani 1996): it treats the estimated regression coefficients of the same variable across all imputed datasets as a group and applies the group lasso penalty (Yuan and Lin 2006) to yield a consistent variable selection across multiple imputed datasets. The proposed methods are demonstrated using simulation studies. We also apply the two methods to the University of Michigan Dioxin Exposure Study (UMDES) to identify important environmental exposure factors that are associated with serum dioxin concentrations. KEY WORDS: exposure assessment; group lasso penalty; regularization; Rubin's rules; stepwise selection; variable selection.
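Rubin's rules for combining estimates across imputed datasets, which the MI-stepwise method builds on, can be sketched as follows. This is a minimal illustration with a hypothetical function name and toy numbers; the paper itself combines p-values rather than pooling raw estimates:

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool a scalar coefficient estimated on m imputed datasets
    using Rubin's rules (Rubin 1987): average the estimates and
    combine within- and between-imputation variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()               # pooled point estimate
    w = variances.mean()                  # within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b           # total variance
    # degrees of freedom of the reference t distribution
    r = (1.0 + 1.0 / m) * b / w
    df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return qbar, t, df

# toy usage: one coefficient estimated on m = 5 imputed datasets
qbar, t, df = rubin_pool([0.52, 0.48, 0.55, 0.50, 0.45],
                         [0.04, 0.05, 0.04, 0.05, 0.04])
```

The pooled estimate and total variance are what a combined Wald-type p-value per selection step would be built from.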
On the Stationary Distribution of Iterative Imputations
 Submitted to The Annals of Statistics, 2010
Cited by 4 (0 self)
Iterative imputation, in which variables are imputed one at a time, each given a model predicting from all the others, is a popular technique that can be convenient and flexible, as it replaces a potentially difficult multivariate modeling problem with relatively simple univariate regressions. In this paper, we begin to characterize the stationary distributions of iterative imputations and their statistical properties. More precisely, when the conditional models are compatible (defined in the text), we give a set of sufficient conditions under which the imputation distribution converges in total variation to the posterior distribution of a Bayesian model. When the conditional models are incompatible but each is valid, we show that the combined imputation estimator is consistent.
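The iterative scheme described above, where each variable with missing entries is regressed on all the others and its missing values are redrawn from the fitted univariate model, might be sketched like this. It is a simplified Gaussian-linear illustration with hypothetical names, not the authors' implementation:

```python
import numpy as np

def iterative_impute(X, n_iter=20, seed=0):
    """One chain of iterative imputation: each column with missing
    values is regressed (OLS) on all other columns, and its missing
    entries are redrawn from the fitted Gaussian conditional model."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    # initialize missing entries with column means
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # regress column j on all other columns (plus intercept)
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            sigma = (X[obs, j] - A[obs] @ beta).std()
            # redraw the missing entries from the fitted model
            X[miss[:, j], j] = (A[miss[:, j]] @ beta
                                + rng.normal(0.0, sigma, miss[:, j].sum()))
    return X
```

The paper's question is precisely what distribution chains like this one converge to, and whether it matches any coherent joint Bayesian model.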
Improving the mapping of condition-specific health-related quality of life onto the SF-6D score
 Quality of Life Research
, 2014
Calibrated Imputation of Numerical Data under Linear Edit Restrictions
Cited by 2 (2 self)
Abstract: A common problem faced by statistical institutes is that data may be missing from collected datasets. The typical way to overcome this problem is to impute the missing data. The problem of imputing missing data is complicated by the fact that statistical data often have to satisfy certain edit rules and that values of variables across units sometimes have to sum up to known totals. For numerical data, edit rules are most often formulated as linear restrictions on the variables. For example, for data on enterprises, edit rules could be that the profit and costs of an enterprise should sum up to its turnover and that the turnover should be at least zero. The totals of some variables across units may already be known from administrative data (e.g., turnover from a tax register) or estimated from other sources. Standard imputation methods for numerical data as described in the literature generally do not take such edit rules and totals into account. In this article we describe algorithms for imputing missing numerical data that take edit restrictions into account and ensure that sums are calibrated to known totals. These algorithms are based on a sequential regression approach that uses regression predictions to impute the variables one by one. To assess the performance of the imputation methods, we carry out a simulation study as well as an evaluation study based on a real dataset.
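The core idea, regression prediction followed by calibration to a known total, can be illustrated with a minimal sketch. The helper below is hypothetical; the paper's actual algorithms also enforce linear edit rules such as nonnegativity, which this sketch omits:

```python
import numpy as np

def calibrated_impute(y, X, known_total):
    """Impute the missing entries of y by OLS regression on the
    covariates X, then spread the remaining discrepancy evenly over
    the imputed cells so that y sums to a known (register) total."""
    y = y.astype(float).copy()
    miss = np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[~miss], y[~miss], rcond=None)
    y[miss] = A[miss] @ beta            # regression predictions
    gap = known_total - y.sum()         # calibration step
    y[miss] += gap / miss.sum()
    return y
```

Observed values are never touched; only the imputed cells absorb the calibration adjustment.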
Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches
 Political Analysis, forthcoming
, 2014
Cited by 2 (0 self)
We consider the relative performance of two common approaches to multiple imputation (MI): joint multivariate normal (MVN) MI, in which the data are modeled as a sample from a joint MVN distribution; and conditional MI, in which each variable is modeled conditionally on all the others. In order to use the multivariate normal distribution, implementations of joint MVN MI typically assume that categories of discrete variables are probabilistically constructed from continuous values. We use simulations to examine the implications of these assumptions. For each approach, we assess (1) the accuracy of the imputed values and (2) the accuracy of coefficients and fitted values from a model fit to completed data sets. These simulations consider continuous, binary, ordinal, and unordered-categorical variables. One set of simulations uses multivariate normal data, and one set uses data from the 2008 American National Election Studies. We implement a less restrictive approach than is typical when evaluating methods using simulations in the missing data literature: in each case, missing values are generated by carefully following the conditions necessary for missingness to be "missing at random" (MAR). We find that in these situations conditional MI is more accurate than joint MVN MI whenever the data include categorical variables.
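The MAR missingness generation the abstract describes, where the probability that a value goes missing depends only on fully observed covariates and never on the value itself, can be sketched as follows (hypothetical helper with an illustrative logistic missingness model):

```python
import numpy as np

def make_mar(y, x, intercept=-1.0, slope=1.0, seed=1):
    """Delete entries of y with probability depending only on the
    fully observed covariate x, so missingness is MAR by
    construction (it never depends on the unobserved y)."""
    rng = np.random.default_rng(seed)
    p_miss = 1.0 / (1.0 + np.exp(-(intercept + slope * x)))
    y = y.astype(float).copy()
    y[rng.random(len(y)) < p_miss] = np.nan
    return y
```

Because the deletion probability is a function of x alone, any valid MI method conditioning on x should recover unbiased estimates from data generated this way.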
Bayesian model averaging for propensity score analysis
 Multivariate Behavioral Research 49(6): 505–517
, 2014
Cited by 1 (0 self)
This article considers Bayesian model averaging as a means of addressing uncertainty in the selection of variables in the propensity score equation. We investigate an approximate Bayesian model averaging approach based on the model-averaged propensity score estimates produced by the R package BMA but that ignores uncertainty in the propensity score. We also provide a fully Bayesian model averaging approach via Markov chain Monte Carlo (MCMC) sampling to account for uncertainty in both parameters and models. A detailed study of our approach examines the differences in the causal estimate when incorporating noninformative versus informative priors in the model averaging stage. We examine these approaches under common methods of propensity score implementation. In addition, we evaluate the impact of changing the size of Occam's window used to narrow down the range of possible models. We also assess the predictive performance of both Bayesian model averaging propensity score approaches and compare it with the case without Bayesian model averaging. Overall, results show that both Bayesian model averaging propensity score approaches recover the treatment effect estimates well and generally provide larger uncertainty estimates, as expected. Both Bayesian model averaging approaches offer slightly better prediction of the propensity score compared with the Bayesian approach with a single propensity score equation. Covariate balance checks for the case study show that both Bayesian model averaging approaches offer good balance. The fully Bayesian model averaging approach also provides posterior probability intervals of the balance indices. The distinctive feature that separates Bayesian statistical inference from its frequentist counterpart is its focus on describing and modeling all forms of uncertainty. The primary focus of uncertainty within Bayesian inference concerns prior knowledge about model parameters.
In the Bayesian framework, all unknown parameters are assumed to be random and are described by probability distributions. Bayesian inference encodes background knowledge about the unknown parameters in the form of the prior distribution. Within the Bayesian framework, parameters are not the only unknown elements. In fact, the Bayesian framework recognizes that models themselves possess uncertainty insofar as a particular model is typically chosen based on prior knowledge of the problem at hand and the variables that have been used in previously specified models. This form of uncertainty is usually ignored in practice: "Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are." (p. 382)
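The approximate model averaging the abstract investigates can be sketched roughly as follows. This is an editorial illustration of the general BIC-weighting idea (weight each candidate propensity model by exp(-BIC/2), the usual approximation to its posterior model probability, and average the fitted scores); it is not the BMA package's algorithm, which also applies Occam's window:

```python
import numpy as np
from itertools import combinations

def fit_logit(A, t, n_iter=25):
    """Newton-Raphson logistic regression; returns coefficients
    and the maximized log-likelihood."""
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = np.clip(A @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-z))
        grad = A.T @ (t - p)
        hess = A.T @ (A * (p * (1 - p))[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    z = np.clip(A @ beta, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    ll = np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return beta, ll

def bma_propensity(X, t):
    """Average fitted propensity scores over all non-empty covariate
    subsets, weighting each model by exp(-BIC / 2)."""
    n, k = X.shape
    scores, log_w = [], []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            beta, ll = fit_logit(A, t)
            bic = -2.0 * ll + A.shape[1] * np.log(n)
            scores.append(1.0 / (1.0 + np.exp(-np.clip(A @ beta, -30, 30))))
            log_w.append(-0.5 * bic)
    w = np.exp(np.array(log_w) - max(log_w))   # stabilized weights
    w /= w.sum()
    return np.sum(w[:, None] * np.array(scores), axis=0)
```

The resulting averaged score can then be fed into stratification, weighting, or matching just like a single-model propensity score.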
Covariate Balance in Bayesian Propensity Score Approaches for Observational Studies
, 2015
Cited by 1 (1 self)
Abstract: Bayesian alternatives to frequentist propensity score approaches have recently been proposed. However, few studies have investigated their covariate balancing properties. This article compares a recently developed two-step Bayesian propensity score approach to the frequentist approach with respect to covariate balance. The effects of different priors on covariate balance are evaluated and the differences between frequentist and Bayesian covariate balance are discussed. Results of the case study reveal that both the Bayesian and frequentist propensity score approaches achieve good covariate balance. The frequentist propensity score approach performs slightly better on covariate balance for stratification and weighting methods, whereas the two-step Bayesian approach offers slightly better covariate balance in the optimal full matching method. Results of a comprehensive simulation study reveal that accuracy and precision of prior information on propensity score model parameters do not greatly influence balance performance. Results of the simulation study also show that overall, the optimal full matching method provides the best covariate balance and treatment effect estimates compared to the stratification and weighting methods. A unique feature of covariate balance within Bayesian propensity score analysis is that we can obtain a distribution of balance indices in addition to the point estimates so that the variation in balance indices can be naturally captured to assist in covariate balance checking.
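A common balance index behind the covariate balance checks described above is the standardized mean difference; under a Bayesian propensity score, one would compute it for each posterior draw to obtain the distribution of balance indices the abstract mentions. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def standardized_mean_diff(x, t):
    """Standardized mean difference of covariate x between treated
    (t == 1) and control (t == 0) units; values near 0 indicate
    good balance."""
    x1, x0 = x[t == 1], x[t == 0]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2.0)
    return (x1.mean() - x0.mean()) / pooled_sd
```

In a frequentist analysis this yields one point estimate per covariate; repeating it over posterior propensity score draws yields the interval summaries of balance the fully Bayesian approach provides.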