Invariant minimal Markov basis for sampling contingency tables with fixed marginals
, 2003
Cited by 9 (7 self)
In this paper we define an invariant Markov basis for a connected Markov chain over the set of contingency tables with fixed marginals and derive some characterizations of minimality of the invariant basis. We also give a necessary and sufficient condition for uniqueness of the invariant minimal Markov basis. The invariance here refers to permutation of indices of each axis of the contingency tables. If the categories of each axis do not have any order relations among them, it is natural to consider the action of the symmetric group on each axis of the contingency table. A general algebraic algorithm for obtaining a Markov basis was given by Diaconis and Sturmfels (1998). Their algorithm is based on computing a Gröbner basis of a well-specified polynomial ideal. However, the reduced Gröbner basis depends on the particular term order and is not symmetric. Therefore it is of interest to consider properties of invariant Markov bases. We study minimality of invariant Markov bases using techniques of Takemura and Aoki (2003).
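As a rough illustration of what a Markov basis move looks like in the simplest setting, the sketch below runs a random walk over two-way tables with fixed row and column sums using the basic +1/-1 moves on 2x2 subtables. The function names and the example table are hypothetical, and the sketch does not implement the invariant minimal bases studied in the paper; because it only accepts or rejects symmetric proposals, it targets the uniform distribution over tables, and sampling from another distribution (such as the hypergeometric) would need an extra Metropolis acceptance step.

```python
import random


def basic_move_step(table, rng):
    """Try one basic move on a two-way table (list of lists), in place.

    The move adds 1 at (i1, j1) and (i2, j2) and subtracts 1 at
    (i1, j2) and (i2, j1), so every row and column sum is unchanged.
    The move is skipped if it would make an entry negative.
    """
    n_rows, n_cols = len(table), len(table[0])
    i1, i2 = rng.sample(range(n_rows), 2)
    j1, j2 = rng.sample(range(n_cols), 2)
    if table[i1][j2] > 0 and table[i2][j1] > 0:
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1
    return table


def margins(table):
    """Return (row sums, column sums) of a two-way table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols


# Hypothetical 3x3 table used only for illustration.
rng = random.Random(0)
table = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
before = margins(table)
for _ in range(1000):
    basic_move_step(table, rng)
after = margins(table)
print(before == after)  # margins are preserved by construction
```

Drawing the two row indices and the two column indices in random order means that a move and its negative are proposed with equal probability, which is what makes the walk symmetric.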
A Simulation-Intensive Approach for Checking Hierarchical Models
- Test
, 1998
Cited by 6 (0 self)
Recent computational advances have made it feasible to fit hierarchical models in a wide range of serious applications. If one entertains a collection of such models for a given data set, the problems of model adequacy and model choice arise. We focus on the former. While model checking usually addresses the entire model specification, model failures can occur at each hierarchical stage. Such failures include outliers, mean structure errors, dispersion misspecification, and inappropriate exchangeabilities. We propose another approach which is entirely simulation based. It only requires the model specification and that, for a given data set, one be able to simulate draws from the posterior under the model. By replicating a posterior of interest using data obtained under the model we can "see" the extent of variability in such a posterior. Then, we can compare the posterior obtained under the observed data with this medley of posterior replicates to ascertain whether the former is in agr...
Randomization Techniques for Graphs
Cited by 5 (1 self)
Mining graph data is an active research area. Several data mining methods and algorithms have been proposed to identify structures from graphs; still, the evaluation of those results is lacking. Within the framework of statistical hypothesis testing, we focus in this paper on randomization techniques for unweighted undirected graphs. Randomization is an important approach to assess the statistical significance of data mining results. Given an input graph, our randomization method will sample data from the class of graphs that share certain structural properties with the input graph. Here we describe three alternative algorithms based on local edge swapping and Metropolis sampling. We test our framework with various graph data sets and mining algorithms for two applications, namely graph clustering and frequent subgraph mining.
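A minimal sketch of local edge swapping follows, under the assumption that the structural property to preserve is the degree sequence; the abstract does not say which properties the paper's three algorithms keep fixed, so this is one common choice and not the authors' method. The function names and the small example graph are hypothetical.

```python
import random


def degree_preserving_swaps(edges, attempts, rng):
    """Randomize a simple undirected graph by local edge swaps.

    Each attempt picks edges (u, v) and (x, y) and tries to replace
    them with (u, y) and (x, v). Swaps that would create a self-loop
    or a duplicate edge are rejected, so node degrees never change.
    """
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        u, v = edge_list[i]
        x, y = edge_list[j]
        if rng.random() < 0.5:
            x, y = y, x  # consider both orientations of the second edge
        if len({u, v, x, y}) < 4:
            continue  # shared endpoint: swap would create a self-loop
        new_a = tuple(sorted((u, y)))
        new_b = tuple(sorted((x, v)))
        if new_a in edge_set or new_b in edge_set:
            continue  # swap would create a duplicate edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_a, new_b}
        edge_list[i], edge_list[j] = new_a, new_b
    return edge_set


def degrees(edges):
    """Return a dict mapping each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Hypothetical example: a 6-cycle with one chord.
original = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)}
randomized = degree_preserving_swaps(original, 200, random.Random(1))
print(degrees(original) == degrees(randomized))
```

A mining result computed on the input graph can then be compared with the same result on many such randomized graphs to judge whether it is explained by the degree sequence alone.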
Assessing the Order of Dependence for Partially Exchangeable Binary Data
, 1998
Cited by 4 (3 self)
The problem we consider is how to assess the order of serial dependence within partially exchangeable binary sequences. We obtain exact conditional tests comparing any two orders by finding the conditional distribution of data given certain transition counts. These tests are facilitated with a new Monte Carlo scheme. Asymptotic tests are also discussed. In particular, we show that the likelihood ratio tests have an asymptotic χ² distribution, thus generalizing the results of Billingsley (1961) for the particular case of Markov chains. We apply these methods to several data sets, and perform a simulation to study their properties.
Keywords: conditional simulation, Markov chains, model selection, nonparametric mixtures, multiple binary sequences.
1 INTRODUCTION
This paper is concerned with the nonparametric statistical analysis of multiple binary sequences, a commonly occurring data structure. One example we consider comes from dairy science, where each of a number of cows is tested...
Randomization of real-valued matrices for assessing the significance of data mining results
Tell Me Something I Don’t Know: Randomization Strategies for Iterative Data Mining
Cited by 2 (0 self)
There is a wide variety of data mining methods available, and it is generally useful in exploratory data analysis to use many different methods for the same dataset. This, however, leads to the problem of whether the results found by one method are a reflection of the phenomenon shown by the results of another method, or whether the results depict in some sense unrelated properties of the data. For example, clustering can give an indication of a clear cluster structure, and computing correlations between variables can show that there are many significant correlations in the data. However, it can be the case that the correlations are actually determined by the cluster structure. In this paper, we consider the problem of randomizing data so that previously discovered patterns or models are taken into account. The randomization methods can be used in iterative data mining. At each step in the data mining process, the randomization produces random samples from the set of data matrices satisfying the already discovered patterns or models. That is, given a data set and some statistics (e.g., cluster centers or co-occurrence counts) of the data, the randomization methods sample data sets having values of the given statistics similar to those of the original data set. We use Metropolis sampling based on local swaps to achieve this. We describe experiments on real data that demonstrate the usefulness of our approach. Our results indicate that in many cases, the results of, e.g., clustering actually imply the results of, say, frequent pattern discovery.
Markov Chains, Quotient Ideals, and Connectivity with Positive Margins
Cited by 2 (1 self)
We present algebraic methods for studying connectivity of Markov moves with margin positivity. The purpose is to develop Markov sampling methods for exact conditional inference in statistical models where a Markov basis is hard to compute. In some cases positive margins are shown to allow a set of Markov connecting moves that are much simpler than the full Markov basis. Advances in algebra have impacted in a fundamental way the study of exponential families of probability distributions. In the 1990s, computational methods of commutative algebra were brought into statistics to solve both classical and new problems in the framework of exponential family models.
Connecting Tables with Zero-One Entries by a Subset of a Markov Basis
- MATHEMATICAL ENGINEERING
, 2009
Permutation Models for Relational Data
, 2005
Cited by 1 (1 self)
We here propose an exponential family of permutation models that is suitable for inferring the direction and strength of association among dyadic relational structures. A linear-time algorithm is shown for MCMC simulation of model draws, as is the use of simulated draws for maximum likelihood estimation (MCMC-MLE) and/or estimation of Monte Carlo standard errors. We also provide an easily performed maximum pseudo-likelihood estimation procedure for the permutation model family, which provides a reasonable means of generating seed models for the MCMC-MLE procedure. Use of the modeling framework is demonstrated via an application involving relationships among managers in a high-tech firm.
Evaluating Query Result Significance in Databases via Randomizations
Cited by 1 (0 self)
Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using relational queries. As an example, in the Internet Movie Database (IMDb) a query can be used to check whether the average rank of action movies is higher than the average rank of drama movies. We consider the problem of assessing whether the results returned by such a query are statistically significant or just a random artifact of the structure in the data. Our approach is based on randomizing the tables occurring in the queries and repeating the original query on the randomized tables. It turns out that there is no unique way of randomizing in multi-relational data. We propose several randomization techniques, study their properties, and show how to find out which queries or hypotheses about our data result in statistically significant information and which tables in the database convey most of the structure in the query. We give results on real and generated data and show how the significance of some queries varies between different randomizations.
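The idea of repeating a query on randomized tables can be sketched on a single toy table, assuming the simplest possible randomization: shuffling the genre column against the rating column. The data, the function names, and the choice of randomization are all hypothetical; the paper considers several randomizations over multi-relational data, which this single-table sketch does not cover.

```python
import random


def mean(values):
    return sum(values) / len(values)


def query(rows):
    """Difference between the mean rating of action and drama movies."""
    action = [rating for genre, rating in rows if genre == "action"]
    drama = [rating for genre, rating in rows if genre == "drama"]
    return mean(action) - mean(drama)


def randomization_p_value(rows, n_rand, rng):
    """Empirical one-sided p-value of the query under genre shuffling."""
    observed = query(rows)
    genres = [genre for genre, _ in rows]
    ratings = [rating for _, rating in rows]
    hits = 0
    for _ in range(n_rand):
        shuffled = genres[:]
        rng.shuffle(shuffled)  # break the genre/rating association
        if query(list(zip(shuffled, ratings))) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (hits + 1) / (n_rand + 1)


# Made-up ratings, for illustration only.
movies = [
    ("action", 7.1), ("action", 6.8), ("action", 7.4), ("action", 6.9),
    ("drama", 6.5), ("drama", 7.0), ("drama", 6.2), ("drama", 6.6),
]
observed, p_value = randomization_p_value(movies, 999, random.Random(0))
print(observed, p_value)
```

A small p-value suggests the observed difference is unlikely under this particular randomization; a different randomization encodes a different null hypothesis and may give a different answer.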

