Results 1–10 of 374
A Maximum Entropy approach to Natural Language Processing
 COMPUTATIONAL LINGUISTICS
, 1996
Abstract

Cited by 1366 (5 self)
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
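As an illustrative aside (not the paper's own implementation, which uses improved iterative scaling), the maximum-likelihood construction the abstract describes can be sketched with plain gradient ascent on a toy constrained problem; the die example and all names below are invented:

```python
import math

def model_dist(w, outcomes, features):
    """Exponential-family model: p(x) proportional to exp(sum_i w_i * f_i(x))."""
    scores = [math.exp(sum(wi * f(x) for wi, f in zip(w, features)))
              for x in outcomes]
    z = sum(scores)
    return [s / z for s in scores]

def fit_maxent(outcomes, features, targets, steps=2000, lr=0.5):
    """Maximum-likelihood fit: the gradient for weight i is the target
    (empirical) expectation of f_i minus the model expectation of f_i."""
    w = [0.0] * len(features)
    for _ in range(steps):
        p = model_dist(w, outcomes, features)
        for i, f in enumerate(features):
            model_exp = sum(pi * f(x) for pi, x in zip(p, outcomes))
            w[i] += lr * (targets[i] - model_exp)
    return w

# Toy problem: a six-sided die whose observed mean is 4.5; the maxent model
# is the exponential tilt of the uniform distribution matching that mean.
outcomes = [1, 2, 3, 4, 5, 6]
features = [lambda x: float(x)]
w = fit_maxent(outcomes, features, targets=[4.5])
p = model_dist(w, outcomes, features)
mean = sum(pi * x for pi, x in zip(p, outcomes))
```

The fitted distribution tilts probability toward high faces just enough to match the constrained mean, while staying as close to uniform as possible.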
Inducing Features of Random Fields
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1997
Abstract

Cited by 670 (10 self)
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
Logistic Regression, AdaBoost and Bregman Distances
, 2000
Abstract

Cited by 259 (45 self)
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms can be divided into two types based on whether the parameters are iteratively updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms which interpolates smoothly between these two extremes. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with iterative scaling. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.
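The sequential-update style of algorithm discussed in the abstract, in its familiar AdaBoost form, can be sketched on a toy one-dimensional dataset; the data, thresholds, and helper names below are invented for illustration:

```python
import math

def adaboost(X, y, thresholds, rounds):
    """Sequential updates: each round picks the decision stump
    h(x) = pol * sign(x > t) with smallest weighted error, then
    exponentially reweights the training examples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []                                 # (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for pol in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if pol * (1 if x > t else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Mistakes gain weight, correct examples lose weight; then normalize.
        w = [wi * math.exp(-alpha * yi * pol * (1 if x > t else -1))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * pol * (1 if x > t else -1) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# A 1-D labeling no single stump can fit; three boosting rounds suffice.
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [-1, -1, 1, 1, -1, 1]
ens = adaboost(X, y, thresholds=[0.5, 1.5, 2.5, 3.5, 4.5], rounds=3)
accuracy = sum(predict(ens, x) == yi for x, yi in zip(X, y)) / len(X)
```

Three rounds reach zero training error here because weighted sums of stumps at these thresholds can represent any labeling of the six points.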
Maximum Entropy Models for Natural Language Ambiguity Resolution
, 1998
Abstract

Cited by 234 (1 self)
The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope that I have kept the good ideas in this thesis, and left the bad ideas out! I would like to acknowledge the following people for their contribution to my education: I thank my advisor Mitch Marcus, who gave me the intellectual freedom to pursue what I believed to be the best way to approach natural language processing, and also gave me direction when necessary. I also thank Mitch for many fascinating conversations, both personal and professional, over the last four years at Penn. I thank all of my thesis committee members: John Lafferty from Carnegie Mellon University, Aravind Joshi, Lyle Ungar, and Mark Liberman, for their extremely valuable suggestions and comments about my thesis research. I thank Mike Collins, Jason Eisner, and Dan Melamed, with whom I've had many stimulating and impromptu discussions in the LINC lab. I owe them much gratitude for their valuable feedback on numerous rough drafts of papers and thesis chapters.
Network tomography: recent developments
 Statistical Science
, 2004
Abstract

Cited by 132 (4 self)
Today's Internet is a massive, distributed network which continues to explode in size as e-commerce and related activities grow. The heterogeneous and largely unregulated structure of the Internet renders tasks such as dynamic routing, optimized service provision, service-level verification and detection of anomalous/malicious behavior extremely challenging. The problem is compounded by the fact that one cannot rely on the cooperation of individual servers and routers to aid in the collection of network traffic measurements vital for these tasks. In many ways, network monitoring and inference problems bear a strong resemblance to other "inverse problems" in which key aspects of a system are not directly observable. Familiar signal processing or statistical problems such as tomographic image reconstruction and phylogenetic tree identification have interesting connections to those arising in networking. This article introduces network tomography, a new field which we believe will benefit greatly from the wealth of statistical tools and algorithms. It focuses especially on recent developments in the field, including the application of pseudo-likelihood methods and tree estimation formulations. Key words: Network tomography, pseudo-likelihood, topology identification, tree estimation. 1 Introduction No network is an island, entire of itself; every network is a piece of an internetwork, a part of the main. Although administrators of small-scale networks can monitor local traffic conditions and identify congestion points and performance bottlenecks, very few networks are completely ... Rui Castro and Robert Nowak are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX; Mark Coates is with the Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada; Gang Liang and Bin Yu are with the Department of Statistics,...
Tree-Based Reparameterization Framework for Analysis of Belief Propagation and Related Algorithms
, 2001
Abstract

Cited by 122 (20 self)
We present a tree-based reparameterization framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation or sum-product algorithm [39, 36], as well as a rich set of variations and extensions of belief propagation. Algorithms in this class can be formulated as a sequence of reparameterization updates, each of which entails refactorizing a portion of the distribution corresponding to an acyclic subgraph (i.e., a tree). The ultimate goal is to obtain an alternative but equivalent factorization using functions that represent (exact or approximate) marginal distributions on cliques of the graph. Our framework highlights an important property of BP and the entire class of reparameterization algorithms: the distribution on the full graph is not changed. The perspective of tree-based updates gives rise to a simple and intuitive characterization of the fixed points in terms of tree consistency. We develop interpretations of these results in terms of information geometry. The invariance of the distribution, in conjunction with the fixed point characterization, enables us to derive an exact relation between the exact marginals on an arbitrary graph with cycles, and the approximations provided by belief propagation, and more broadly, any algorithm that minimizes the Bethe free energy. We also develop bounds on this approximation error, which illuminate the conditions that govern their accuracy. Finally, we show how the reparameterization perspective extends naturally to more structured approximations (e.g., Kikuchi and variants [52, 37]) that operate over higher order cliques.
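Since the sum-product algorithm is exact on trees, the kind of marginal computation the abstract discusses is easy to sanity-check on a three-node chain; the potentials below are invented for illustration:

```python
import itertools

# Pairwise model on the chain x0 - x1 - x2, binary variables:
# p(x0, x1, x2) proportional to psi01[x0][x1] * psi12[x1][x2].
psi01 = [[1.0, 0.5], [0.5, 2.0]]
psi12 = [[2.0, 1.0], [1.0, 3.0]]

# Sum-product messages into the middle node x1 from each leaf.
msg0 = [sum(psi01[a][b] for a in (0, 1)) for b in (0, 1)]   # x0 -> x1
msg2 = [sum(psi12[b][c] for c in (0, 1)) for b in (0, 1)]   # x2 -> x1

# Belief at x1: product of incoming messages, normalized.
belief = [msg0[b] * msg2[b] for b in (0, 1)]
z = sum(belief)
bp_marginal = [v / z for v in belief]

# Brute-force marginal for comparison; on a tree the two must agree.
weights = {(a, b, c): psi01[a][b] * psi12[b][c]
           for a, b, c in itertools.product((0, 1), repeat=3)}
total = sum(weights.values())
exact_marginal = [sum(v for (a, b, c), v in weights.items() if b == x) / total
                  for x in (0, 1)]
```

On graphs with cycles the same message updates are no longer exact, which is precisely the gap the paper's reparameterization analysis characterizes.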
Information Geometry of the EM and em Algorithms for Neural Networks
 Neural Networks
, 1995
Abstract

Cited by 120 (9 self)
In order to realize an input-output relation given by noise-contaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the observed or specified input-output data based on the stochastic model. Two algorithms, the EM- and em-algorithms, have so far been proposed for this purpose. The EM-algorithm is an iterative statistical technique of using the conditional expectation, and the em-algorithm is a geometrical one given by information geometry. The em-algorithm minimizes iteratively the Kullback-Leibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information geometrical framework for studying stochastic models of neural networks, by focussing on the EM and em algorithms, and proves a condition which guarantees their equ...
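A minimal sketch of the EM-algorithm in the spirit described above, here for a two-component one-dimensional Gaussian mixture with unit variances and equal mixing weights (all numbers and names invented for illustration):

```python
import math
import random

def em_gmm_means(data, steps=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances and
    equal mixing weights; only the two means are estimated."""
    mu = [min(data), max(data)]        # crude but effective initialization
    for _ in range(steps):
        # E-step: responsibility of each component for each point,
        # i.e. the conditional expectation of the hidden label.
        resp = []
        for x in data:
            w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: each mean becomes the responsibility-weighted average.
        mu = [sum(r[k] * x for r, x in zip(resp, data)) /
              sum(r[k] for r in resp) for k in (0, 1)]
    return mu

random.seed(0)
data = ([random.gauss(-3.0, 1.0) for _ in range(200)] +
        [random.gauss(3.0, 1.0) for _ in range(200)])
mu = em_gmm_means(data)
```

Each iteration cannot decrease the data log-likelihood, which is the shared guarantee the EM and em viewpoints explain from statistical and geometric angles respectively.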
The structure of multi-neuron firing patterns in primate retina
 Petrusca D, Sher A, Litke AM & Chichilnisky EJ
, 2006
Abstract

Cited by 103 (8 self)
Synchronized firing among neurons has been proposed to constitute an elementary aspect of the neural code in sensory and motor systems. However, it remains unclear how synchronized firing affects the large-scale patterns of activity and redundancy of visual signals in a complete population of neurons. We recorded simultaneously from hundreds of retinal ganglion cells in primate retina, and examined synchronized firing in completely sampled populations of ~50–100 ON-parasol cells, which form a major projection to the magnocellular layers of the lateral geniculate nucleus. Synchronized firing in pairs of cells was a subset of a much larger pattern of activity that exhibited local, isotropic spatial properties. However, a simple model based solely on interactions between adjacent cells reproduced 99% of the spatial structure and scale of synchronized firing. No more than 20% of the variability in firing of an individual cell was predictable from the activity of its neighbors. These results held both for spontaneous firing and in the presence of independent visual modulation of the firing of each cell. In sum, large-scale synchronized firing in the entire population of ON-parasol cells appears to reflect simple neighbor interactions, rather than a unique visual signal or a highly redundant coding scheme.
Optimal investment in incomplete markets when wealth may become negative
 ANNALS OF APPLIED PROBABILITY
, 2001
Abstract

Cited by 87 (9 self)
This paper accompanies a previous one [KS99] by D. Kramkov and the present author. While in [KS99] we considered utility functions U: R+ → R satisfying the Inada conditions U′(0) = ∞ and U′(∞) = 0, in the present paper we consider utility functions U: R → R which are finitely valued, for all x ∈ R, and satisfy U′(−∞) = ∞ and U′(∞) = 0. A typical example of this situation is the exponential utility U(x) = −e^{−x}. In the setting of [KS99] the following crucial condition on the asymptotic elasticity of U, as x tends to +∞, was isolated: lim sup_{x→+∞} xU′(x)/U(x) < 1. This condition was found to be necessary and sufficient for the existence of the optimal investment as well as other key assertions of the related duality theory to hold true, if we allow for general semimartingales to model a (not necessarily complete) financial market. In the setting of the present paper this condition has to be accompanied by a similar condition on the asymptotic elasticity of U, as x tends to −∞, namely lim inf_{x→−∞} xU′(x)/U(x) > 1. If both conditions are satisfied — we then say that the utility function U has reasonable asymptotic elasticity — we prove an existence theorem for the optimal investment in a general locally bounded semimartingale model of a financial market and for a utility function U: R → R, which is finitely valued on all of R; this theorem is parallel to the main result of [KS99]. We also give examples showing that the reasonable asymptotic elasticity of U also is a necessary condition for several key assertions of the theory to hold true.