A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects
- Journal of Neurophysiology, 2005
Cited by 43 (2 self)
Multiple factors simultaneously affect the spiking activity of individual neurons. Determining the effects and relative importance of these factors is a challenging problem in neurophysiology. We propose a statistical framework based on the point process likelihood function to relate a neuron’s spiking probability to three typical covariates: the neuron’s own spiking history, concurrent ensemble activity and extrinsic covariates such as stimuli or behavior. The framework uses parametric models of the conditional intensity function to define a neuron’s spiking probability in terms of the covariates. The discrete time likelihood function for point processes is used to carry out model fitting and model analysis. We show that, by modeling the logarithm of the conditional intensity function as a linear combination of functions of the covariates, the discrete time point process likelihood function is readily analyzed in the generalized linear model (GLM) framework. We illustrate our approach for both GLM and non-GLM likelihood functions using simulated data and multivariate single unit
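The log-linear model described in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes a single extrinsic covariate (a hypothetical sinusoidal stimulus), Poisson spike counts in bins of width `dt`, and a plain Newton-Raphson fit. The names `log_likelihood`, `fit_newton`, `beta_true`, and all numeric values are illustrative assumptions. Spiking-history or ensemble terms would enter as further columns of `X`.

```python
import numpy as np

def log_likelihood(beta, X, dN, dt):
    """Discrete-time point process log-likelihood, up to an additive constant."""
    log_lam = X @ beta  # log conditional intensity in each bin
    return float(np.sum(dN * (log_lam + np.log(dt)) - np.exp(log_lam) * dt))

def fit_newton(X, dN, dt, n_iter=25):
    """Maximize the log-likelihood by Newton-Raphson (Poisson GLM, log link)."""
    beta = np.zeros(X.shape[1])
    # Start the intercept at the log mean rate; assumes column 0 of X is all ones.
    beta[0] = np.log(max(dN.mean(), 1e-12) / dt)
    for _ in range(n_iter):
        mu = np.exp(X @ beta) * dt          # expected spike count per bin
        grad = X.T @ (dN - mu)              # score vector
        hess = X.T @ (X * mu[:, None])      # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated data under assumed (hypothetical) parameters.
rng = np.random.default_rng(0)
dt = 0.001                                   # 1 ms bins
t = np.arange(100_000) * dt                  # 100 s of data
X = np.column_stack([np.ones_like(t), np.sin(2 * np.pi * t)])
beta_true = np.array([np.log(20.0), 1.0])    # 20 Hz baseline, stimulus gain 1
dN = rng.poisson(np.exp(X @ beta_true) * dt) # spike counts per bin
beta_hat = fit_newton(X, dN, dt)
print(beta_hat)
```

Because the log-likelihood is concave in `beta` under the log link, the Newton iteration here coincides with the iteratively reweighted least squares procedure used by standard GLM software.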
Bayesian model averaging
- Statistical Science, 1999
Cited by 29 (0 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
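The core idea of BMA, weighting each model's prediction by its posterior model probability, can be sketched in a few lines. This is only an illustration, not one of the implementation methods the paper surveys: the function names, the uniform default prior, and the numeric log marginal likelihoods and predictions are all hypothetical.

```python
import numpy as np

def posterior_model_probs(log_marginal_liks, log_priors=None):
    """Posterior model probabilities from log marginal likelihoods and log priors."""
    log_ml = np.asarray(log_marginal_liks, dtype=float)
    if log_priors is None:
        # Assumption: uniform prior over the candidate models.
        log_priors = np.full(log_ml.shape, -np.log(log_ml.size))
    log_post = log_ml + np.asarray(log_priors, dtype=float)
    log_post = log_post - log_post.max()   # stabilize before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

def bma_prediction(predictions, weights):
    """Model-averaged prediction: sum over models of weight times prediction."""
    return float(np.asarray(weights) @ np.asarray(predictions, dtype=float))

# Hypothetical values for three candidate models.
log_ml = [-100.0, -101.0, -105.0]          # made-up log marginal likelihoods
preds = [2.0, 3.0, 10.0]                   # made-up point predictions
weights = posterior_model_probs(log_ml)
averaged = bma_prediction(preds, weights)
print(weights, averaged)
```

Selecting only the first model would discard the weight carried by the other two, which is the source of over-confidence the abstract describes.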
How Many Genes Are Needed for a Discriminant Microarray Data Analysis
- Proc. Critical Assessment of Techniques for Microarray Data Mining Workshop, 2000
Cited by 29 (2 self)
The analysis of the leukemia data from the Whitehead/MIT group is a discriminant analysis (also called supervised learning). Among thousands of genes whose expression levels are measured, not all are needed for discriminant analysis: a gene may either not contribute to the separation of two types of tissues/cancers, or it may be redundant because it is highly correlated with other genes. There are two theoretical frameworks in which variable selection (or gene selection in our case) can be addressed. The first is model selection, and the second is model averaging. We have carried out model selection using the Akaike information criterion and the Bayesian information criterion with logistic regression (discrimination, prediction, or classification) to determine the number of genes that provide the best model. These model selection criteria set upper limits of 22-25 and 12-13 genes for this data set with 38 samples, and the best model consists of only one (no. 4847, zyxin) or two genes. We have also carried out model averaging over the best single-gene logistic predictors using three different weights: maximized likelihood, prediction rate on training set, and equal weight. We have observed that the performance of most of these weighted predictors on the testing set is gradually reduced as more genes are included, but a clear cutoff that separates good and bad prediction performance is not found.
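For reference, the two criteria named in this abstract have simple closed forms, sketched below. The candidate models and their maximized log-likelihoods are invented for illustration and are not results from the paper; only the sample size of 38 is taken from the abstract. The parameter count `k` is assumed to be the number of genes plus one intercept.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 log L (lower is better)."""
    return k * math.log(n) - 2 * log_lik

n_samples = 38  # sample size quoted in the abstract

# Hypothetical logistic-regression fits: name -> (maximized log-likelihood, k).
candidates = {
    "1 gene": (-8.0, 2),
    "2 genes": (-5.5, 3),
    "5 genes": (-4.0, 6),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n_samples) for name, (ll, k) in candidates.items()}
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print(best_aic, best_bic)
```

With 38 samples, BIC charges ln(38), roughly 3.64, per parameter against 2 for AIC, so BIC tolerates fewer parameters than AIC for the same gain in likelihood.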
Sequential Model Selection for Word Sense Disambiguation
- 1997
Cited by 28 (13 self)
Statistical models of word-sense disambiguation are often based on a small number of contextual features or on a model that is assumed to characterize the interactions among a set of features. Model selection is presented as an alternative to these approaches, where a sequential search of possible models is conducted in order to find the model that best characterizes the interactions among features. This paper expands existing model selection methodology and presents the first comparative study of model selection search strategies and evaluation criteria when applied to the problem of building probabilistic classifiers for word-sense disambiguation.
Decomposable Modeling in Natural Language Processing
- 1999
Cited by 23 (6 self)
In this paper, we describe a framework for developing probabilistic classifiers in natural language processing. Our focus is on formulating models that capture the most important interdependencies among features, to avoid overfitting the data while also characterizing the data well. The class of probability models and the associated inference techniques described here were developed in mathematical statistics, and are widely used in artificial intelligence and applied statistics. Our goal is to make this model selection framework accessible to researchers in NLP, and provide pointers to available software and important references. In addition, we describe how the quality of the three determinants of classifier performance (the features, the form of the model, and the parameter estimates) can be separately evaluated. We also demonstrate the classification performance of these models in a large-scale experiment involving the disambiguation of 34 words taken from the HECTOR word sense corpus (Hanks 1996). In 10-fold cross-validations, the model search procedure performs significantly better than naive Bayes on 6 of the words without being significantly worse on any of them
Physical network models
- J. Comp. Biol., 2004
Cited by 20 (2 self)
We develop a new framework for inferring models of transcriptional regulation. The models, which we call physical network models, are annotated molecular interaction graphs. The attributes in the model correspond to verifiable properties of the underlying biological system such as the existence of protein–protein and protein–DNA interactions, the directionality of signal transduction in protein–protein interactions, as well as signs of the immediate effects of these interactions. Possible configurations of these variables are constrained by the available data sources. Some of the data sources, such as factor-binding data, involve measurements that are directly tied to the variables in the model. Other sources, such as gene knock-outs, are functional in nature and provide only indirect evidence about the variables. We associate each observed knock-out effect in the deletion mutant data with a set of causal paths (molecular cascades) that could in principle explain the effect, resulting in aggregate constraints about the physical variables in the model. The most likely settings of all the variables, specifying the most likely graph annotations, are found by a recursive application of the max-product algorithm. By testing our approach on datasets related to the pheromone response pathway in S. cerevisiae, we demonstrate that the resulting model is consistent with previous studies about the pathway. Moreover, we successfully predict gene knock-out effects with a high degree of accuracy in a cross-validation setting. When applying this approach genome-wide, we extract submodels consistent with previous studies. The approach can be readily extended to other data sources or to facilitate automated experimental design. Key words: physical network, gene regulation, data integration.
MCLUST: Software for Model-Based Clustering, Density Estimation and Discriminant Analysis
- Journal of Classification, 2002
Cited by 18 (6 self)
Contents: 1 Models; 2 Obtaining and Installing MCLUST; 2.1 Using MCLUST with S-PLUS 6 for UNIX/Linux; 2.2 Using MCLUST with S-PLUS 6 for Windows; 3 Hierarchical Clustering; 4 EM for Mixture Models; 4.1 Uncertainty; 4.2 Individual E and M Steps; 5 Bayesian Information Criterion; 6 Cluster Analysis; 6.1 Mclust; 6.2 EMclust; 6.3 Clustering with Noise and Outliers; 7 Simulation from Mixture Densities; 8 Density Estimation; 9 Displays
Unsupervised Mining of Statistical Temporal Structures
- Video Mining, Azriel Rosenfeld, David Doermann, Daniel DeMenthon, eds., 2003
Cited by 17 (8 self)
In this paper, we present algorithms for unsupervised mining of structures in video using multiscale statistical models. Video structures are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim at alleviating the burden of labelling and training, as well as providing a scalable solution for generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing unsupervised video structuring works primarily use clustering techniques, while the rich statistical characteristics in the temporal dimension at different granularity remain unexplored. Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multi-level statistical structures with hierarchical hidden Markov models based on a multi-level Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm. We have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact relevant feature sets using hybrid wrapper-filter methods. When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) The automatically selected feature set...
Generalized Model Selection For Unsupervised Learning In High Dimensions
- Proceedings of Neural Information Processing Systems, 1999
Cited by 14 (2 self)
In this paper we describe an approach to model selection in unsupervised learning. This approach determines both the feature set and the number of clusters. To this end we first derive an objective function that explicitly incorporates this generalization. We then evaluate two schemes for model selection: one using this objective function (a Bayesian estimation scheme that selects the best model structure using the marginal or integrated likelihood) and the second based on a technique using a cross-validated likelihood criterion. In the first scheme, for a particular application in document clustering, we derive a closed-form solution of the integrated likelihood by assuming an appropriate form of the likelihood function and prior. Extensive experiments are carried out to ascertain the validity of both approaches and all results are verified by comparison against ground truth. In our experiments the Bayesian scheme using our objective function gave better results than cross-validation...

