Results 11-20 of 753
Improving word representations via global context and multiple word prototypes
In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), 2012
Cited by 111 (14 self)
Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
Trust region Newton method for large-scale logistic regression
In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007
Cited by 96 (22 self)
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi-Newton approach for logistic regression. We also compare it with existing linear SVM implementations.
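As a hedged illustration of the kind of iteration this abstract names (a generic textbook formulation, not taken from the paper itself), a trust region Newton step for logistic regression can be written as below. The label convention y_i in {-1, +1}, the symbols x_i, w_k, and \Delta_k, and the omission of any regularization term are assumptions made for this sketch. Minimizing f is equivalent to maximizing the log-likelihood.

```latex
% Negative log-likelihood for labels y_i in {-1,+1} (regularization omitted; an assumption)
f(w) = \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i w^{\top} x_i}\bigr)

% Gradient and Hessian, with \sigma(z) = 1/(1+e^{-z}) and z_i = y_i w^{\top} x_i
\nabla f(w) = \sum_{i=1}^{n} \bigl(\sigma(z_i) - 1\bigr)\, y_i x_i,
\qquad
\nabla^2 f(w) = \sum_{i=1}^{n} \sigma(z_i)\bigl(1 - \sigma(z_i)\bigr)\, x_i x_i^{\top}

% Trust region subproblem at iterate w_k with radius \Delta_k
\min_{s}\; q_k(s) = \nabla f(w_k)^{\top} s + \tfrac{1}{2}\, s^{\top} \nabla^2 f(w_k)\, s
\quad \text{subject to} \quad \lVert s \rVert \le \Delta_k

% Ratio of actual to predicted reduction, used to accept the step and adjust \Delta_k
\rho_k = \frac{f(w_k + s_k) - f(w_k)}{q_k(s_k)}
```

In this generic scheme the step s_k is accepted when \rho_k exceeds a small threshold, and \Delta_k is enlarged or shrunk depending on how well q_k predicted the actual decrease. Solving the subproblem only approximately is one way to read the "approximate Newton steps" the abstract mentions.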
Painless Unsupervised Learning with Features
Cited by 95 (3 self)
We show how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods. In particular, each component multinomial of a generative model can be turned into a miniature logistic regression model if feature locality permits. The intuitive EM algorithm still applies, but with a gradient-based M-step familiar from discriminative training of logistic regression models. We apply this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation, incorporating a few linguistically-motivated features into the standard generative model for each task. These feature-enhanced models each outperform their basic counterparts by a substantial margin, and even compete with and surpass more complex state-of-the-art models.
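To make the phrase "each component multinomial ... can be turned into a miniature logistic regression model" concrete, here is a minimal sketch, assuming the multinomial is parameterized as a softmax over weighted features. The function name `feature_multinomial`, the feature names, and the weights are all hypothetical and chosen only for illustration; they are not from the paper.

```python
import math


def feature_multinomial(weights, features_by_outcome):
    """Return P(outcome | context) as a softmax over weighted feature scores.

    weights: dict mapping feature name -> weight.
    features_by_outcome: dict mapping outcome -> {feature name: value}.
    """
    # Linear score for each outcome: dot product of weights and features.
    scores = {
        outcome: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for outcome, feats in features_by_outcome.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exps = {outcome: math.exp(score - max_score) for outcome, score in scores.items()}
    total = sum(exps.values())
    return {outcome: value / total for outcome, value in exps.items()}


# Hypothetical emission features for the words one tag can generate.
features = {
    "running": {"suffix=ing": 1.0, "bias": 1.0},
    "walked": {"suffix=ed": 1.0, "bias": 1.0},
    "dog": {"bias": 1.0},
}
weights = {"suffix=ing": 1.5, "suffix=ed": 0.5, "bias": 0.0}

probs = feature_multinomial(weights, features)
uniform = feature_multinomial({}, features)
```

With all weights at zero the distribution is uniform, so a plain multinomial's starting point is a special case. In an EM setting, the M-step would adjust `weights` by gradient ascent on the expected log-likelihood instead of normalizing counts directly.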
Prototype-driven learning for sequence models
In Proceedings of HLT-NAACL, 2006
Cited by 93 (7 self)
We investigate prototype-driven learning for primarily unsupervised sequence modeling. Prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label. This sparse prototype information is then propagated across a corpus using distributional similarity features in a log-linear generative model. On part-of-speech induction in English and Chinese, as well as an information extraction task, prototype features provide substantial error rate reductions over competitive baselines and outperform previous work. For example, we can achieve an English part-of-speech tagging accuracy of 80.5% using only three examples of each tag and no dictionary constraints. We also compare to semi-supervised learning and discuss the system’s error trends.
Dynamical and microphysical retrieval from Doppler radar observations using a cloud model and its adjoint. Part I: Model development and simulated data experiments
J. Atmos. Sci., 1997
Cited by 92 (8 self)
The purpose of the research reported in this paper is to develop a variational data analysis system that can be used to assimilate data from one or more Doppler radars. In the first part of this two-part study, the technique used in this analysis system is described and tested using data from a simulated warm rain convective storm. The analysis system applies the 4D variational data assimilation technique to a cloud-scale model with a warm rain parameterization scheme. The 3D wind, thermodynamical, and microphysical fields are determined by minimizing a cost function, defined by the difference between both radar observed radial velocities and reflectivities (or rainwater mixing ratio) and their model predictions. The adjoint of the numerical model is used to provide the sensitivity of the cost function with respect to the control variables. Experiments using data from a simulated convective storm demonstrated that the variational analysis system is able to retrieve the detailed structure of wind, thermodynamics, and microphysics using either dual-Doppler or single-Doppler information. However, less accurate velocity fields are obtained when single-Doppler data were used. In both cases, retrieving the temperature field is more difficult than the retrieval of the other fields. Results also show that assimilating the rainwater mixing ratio obtained from the reflectivity data results in a better performance of the retrieval procedure than directly assimilating the reflectivity. It is also found that the system is robust to variations in the Z–qr relation, but the microphysical retrieval is quite sensitive to parameters in the warm rain scheme. The technique is robust to random errors in radial velocity and calibration errors in reflectivity.
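The cost function described in this abstract can be sketched in a generic form, shown here only as an illustration of the idea. The weights \eta_v and \eta_q, the superscript "obs", the control vector c, and the summation over observation points and times are notational assumptions for this sketch; the paper's exact formulation may include further terms.

```latex
% Misfit between model-predicted and radar-observed quantities, summed over
% observation locations and times (weights \eta_v, \eta_q are assumed)
J(c) = \sum_{\text{space, time}} \Bigl[
    \eta_v \bigl(v_r - v_r^{\text{obs}}\bigr)^2
  + \eta_q \bigl(q_r - q_r^{\text{obs}}\bigr)^2
\Bigr]

% The adjoint model supplies the gradient with respect to the control variables c,
% which a descent method then uses to reduce J
\nabla_{c} J
```

Here v_r is the radial velocity and q_r the rainwater mixing ratio. When reflectivity is assimilated directly, the second term would compare Z with its observed value instead.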
Improved part-of-speech tagging for online conversational text with word clusters
In Proceedings of NAACL, 2013
Cited by 87 (6 self)
We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (more than 3% absolute). Qualitative analysis of these word clusters yields insights about NLP and linguistic phenomena in this genre. Additionally, we contribute the first POS annotation guidelines for such text and release a new dataset of English language tweets annotated using these guidelines. Tagging software, annotation guidelines, and large-scale word clusters are available at:
Large-Scale Optimization of Eigenvalues
SIAM J. Optimization, 1991
Cited by 84 (3 self)
Optimization problems involving eigenvalues arise in many applications. Let x be a vector of real parameters and let A(x) be a continuously differentiable symmetric matrix function of x. We consider a particular problem which occurs frequently: the minimization of the maximum eigenvalue of A(x), subject to linear constraints and bounds on x. The eigenvalues of A(x) are not differentiable at points x where they coalesce, so the optimization problem is said to be nonsmooth. Furthermore, it is typically the case that the optimization objective tends to make eigenvalues coalesce at a solution point. There are three main purposes of the paper. The first is to present a clear and self-contained derivation of the Clarke generalized gradient of the max eigenvalue function in terms of a "dual matrix". The second purpose is to describe a new algorithm, based on the ideas of a previous paper by the author (SIAM J. Matrix Anal. Appl. 9 (1988) 256-268), which is suitable for solving l...
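A small worked example, chosen here for illustration and not taken from the paper, shows why the maximum eigenvalue is nonsmooth exactly where eigenvalues coalesce. The one-parameter diagonal matrix below is an assumption made to keep the algebra trivial.

```latex
% A one-parameter symmetric matrix function (illustrative choice)
A(x) = \begin{pmatrix} 1 + x & 0 \\ 0 & 1 - x \end{pmatrix},
\qquad
\lambda_1(x) = 1 + x, \quad \lambda_2(x) = 1 - x

% The maximum eigenvalue is a pointwise maximum of smooth functions
\lambda_{\max}\bigl(A(x)\bigr) = \max(1 + x,\; 1 - x) = 1 + \lvert x \rvert

% Its minimizer is x = 0, where both eigenvalues equal 1 and the
% function has a kink (left derivative -1, right derivative +1)
\min_{x}\; \lambda_{\max}\bigl(A(x)\bigr) = 1 \quad \text{at } x = 0
```

Each eigenvalue is smooth on its own, yet the minimum of the maximum sits precisely at the point where they meet, which matches the abstract's remark that the objective tends to make eigenvalues coalesce at a solution.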
Towards detecting influenza epidemics by analyzing Twitter messages
Cited by 83 (5 self)
Rapid response to a health epidemic is critical to reduce loss of life. Existing methods mostly rely on expensive surveys of hospitals across the country, typically with lag times of one to two weeks for influenza reporting, and even longer for less common diseases. In response, there have been several recently proposed solutions to estimate a population’s health from Internet activity, most notably Google’s Flu Trends service, which correlates search term frequency with influenza statistics reported by the Centers for Disease Control and Prevention (CDC). In this paper, we analyze messages posted on the microblogging site Twitter.com to determine if a similar correlation can be uncovered. We propose several methods to identify influenza-related messages and compare a number of regression models to correlate these messages with CDC statistics. Using over 500,000 messages spanning 10 weeks, we find that our best model achieves a correlation of .78 with CDC statistics by leveraging a document classifier to identify relevant messages.
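The kind of correlation this abstract reports can be computed as in the sketch below, assuming the measure is Pearson's r (the abstract does not say which coefficient is used). The function name `pearson` and the weekly numbers are hypothetical toy values made up for illustration; they are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy weekly values (hypothetical): fraction of messages flagged as
# influenza-related, and a reference illness rate for the same weeks.
flagged_fraction = [0.010, 0.012, 0.015, 0.021, 0.019, 0.024]
reference_rate = [1.1, 1.3, 1.4, 2.0, 2.2, 2.3]

r = pearson(flagged_fraction, reference_rate)
```

A value of `r` near 1 would mean the message-based signal rises and falls with the reference series. A classifier that filters out irrelevant messages changes `flagged_fraction` and therefore the resulting correlation.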
Geometric modeling in shape space
In Proc. SIGGRAPH, 2007
Cited by 77 (10 self)
Figure 1: Geodesic interpolation and extrapolation. The blue input poses of the elephant are geodesically interpolated in an as-isometric-as-possible fashion (shown in green), and the resulting path is geodesically continued (shown in purple) to naturally extend the sequence. No semantic information, segmentation, or knowledge of articulated components is used.
We present a novel framework to treat shapes in the setting of Riemannian geometry. Shapes – triangular meshes or more generally straight line graphs in Euclidean space – are treated as points in a shape space. We introduce useful Riemannian metrics in this space to aid the user in design and modeling tasks, especially to explore the space of (approximately) isometric deformations of a given shape. Much of the work relies on an efficient algorithm to compute geodesics in shape spaces; to this end, we present a multiresolution framework to solve the interpolation problem – which amounts to solving a boundary value problem – as well as the extrapolation problem – an initial value problem – in shape space. Based on these two operations, several classical concepts like parallel transport and the exponential map can be used in shape space to solve various geometric modeling and geometry processing tasks. Applications include shape morphing, shape deformation, deformation transfer, and intuitive shape exploration.
Conditional random fields for activity recognition
In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), 2007
Cited by 76 (0 self)
of any sponsoring institution, the U.S. government or any other entity.