8 citations found.
Della Pietra, S., Della Pietra, V., and Lafferty, J. Inducing features of random fields. Tech. Rep. CMU-CS-95-144, Carnegie Mellon University, 1995.

This paper is cited in the following contexts:
A Comparison of Word- and Sense-based Text.. - Kehagias.. (2001)   (3 citations)

....interchangeably. As with most pattern classification tasks, the initial step of data preprocessing is crucial for the quality of the final results (see [11, 15, 26]). In the case of text categorization, an essential aspect of preprocessing is the selection of appropriate document features [3, 6, 9, 17, 21, 38]. This is usually referred to as document representation. While a large number of document representations have been proposed, most of them use the same starting point, namely the words appearing in a document. In fact, a common choice is to represent a document as a bag of words [26, 30], i.e. ....

S.A. Della Pietra, V.J. Della Pietra, and J. Lafferty, "Inducing features of random fields". IEEE Trans. on Pattern Anal. and Mach. Intel., vol. 19, 1997.
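
The bag-of-words representation mentioned in the Kehagias et al. context can be sketched as a word-count multiset that ignores word order. This is a minimal sketch: the lowercasing and the regular-expression tokenizer are assumptions for illustration, not the preprocessing used in that paper.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Represent a document as an unordered multiset of lowercased word tokens."""
    # Hypothetical tokenizer: runs of lowercase letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

doc = "The cat sat on the mat. The mat was flat."
print(bag_of_words(doc).most_common(2))  # [('the', 3), ('mat', 2)]
```

Because only counts are kept, any two documents containing the same words with the same frequencies map to the same representation, whatever their word order.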


Learning from Labeled and Unlabeled Data using Graph Mincuts - Blum, Chawla (2001)   (21 citations)

....In the vision literature, the mincut approach and various extensions are motivated through a generative model known as a Markov Random Field. This model assumes the points (examples) are picked in advance, and then the labels are determined probabilistically according to a certain distribution (Pietra et al. 1997). This distribution is such that the probability of any given global labeling is the product of unary and pairwise terms: higher probability if nearby points are given the same labeling and lower probability when they have different labeling. When you take the log, you get the mincut objective ....

Pietra, S. D., Pietra, V. D., & Lafferty, J. (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 380-393.
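
The Blum and Chawla context says that taking the log of the MRF probability yields the mincut objective. A toy sketch of that step, under stated assumptions: each pairwise term is exp(-w_ij) when neighbours disagree and 1 when they agree, and the unary terms appear only as hard constraints that fix the labeled points (standing in for infinite unary costs). The graph EDGES and labeled set FIXED are hypothetical, and the brute-force search replaces the max-flow/min-cut algorithm a real implementation would use.

```python
import itertools
import math

# Hypothetical toy graph: undirected edge (i, j) -> similarity weight w_ij.
EDGES = {(0, 1): 3.0, (1, 2): 1.0, (2, 3): 3.0, (0, 2): 0.5}
# Labeled examples fixed in advance: node 0 positive (1), node 3 negative (0).
FIXED = {0: 1, 3: 0}

def unnormalized_prob(labels):
    """Product of pairwise terms: exp(-w_ij) if i and j disagree, 1 if they agree."""
    p = 1.0
    for (i, j), w in EDGES.items():
        if labels[i] != labels[j]:
            p *= math.exp(-w)
    return p

def cut_weight(labels):
    """Total weight of edges whose endpoints receive different labels."""
    return sum(w for (i, j), w in EDGES.items() if labels[i] != labels[j])

def map_labeling(num_nodes=4):
    """Brute-force the most probable labeling of the unlabeled nodes."""
    free = [v for v in range(num_nodes) if v not in FIXED]
    best = None
    for assignment in itertools.product([0, 1], repeat=len(free)):
        labels = dict(FIXED)
        labels.update(zip(free, assignment))
        if best is None or unnormalized_prob(labels) > unnormalized_prob(best):
            best = labels
    return best

best = map_labeling()
# Taking -log turns the product into a sum: -log p(labels) equals cut_weight(labels).
print(sorted(best.items()), cut_weight(best), -math.log(unnormalized_prob(best)))
```

Maximizing the product of pairwise terms is therefore the same as minimizing the total weight of disagreeing edges, which is the cut separating the positive from the negative labeled points.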


Stochastic Text Generation - Oberlander, Brew

....They may well use patterns of repetition or paraphrase to achieve this goal. While it may be difficult to give a precise account of the hidden causes of these regularities, there is well-established technology for constructing probability distributions which respect particular surface regularities (Della Pietra et al. 1997). This is the maximum entropy modelling framework, in particular its instantiation as a method for learning conditional exponential models of the form P(Y|X) (Berger et al. 1996, Della Pietra et al. 1997, Rosenfeld 1996). The next section begins with a brief high-level description of this ....

.... for constructing probability distributions which respect particular surface regularities (Della Pietra et al. 1997). This is the maximum entropy modelling framework, in particular its instantiation as a method for learning conditional exponential models of the form P(Y|X) (Berger et al. 1996, Della Pietra et al. 1997, Rosenfeld 1996). The next section begins with a brief high-level description of this framework, after which we move to the key question which arises in applying this framework to NLG (or any other modelling problem), namely the specification of the (large) class of features which will be ....

[Article contains additional citation context not shown here]

Della Pietra, S., Della Pietra, V. & Lafferty, J. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 1-13.
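
The conditional exponential models P(Y|X) referred to by Oberlander and Brew have the form P(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x), with Z(x) summing over the candidate outcomes y. A minimal sketch of evaluating such a model, assuming the weights are already known; the single "repetition" feature and its weight are hypothetical illustrations, not features from that paper.

```python
import math
from typing import Callable, Dict, Sequence

# A feature maps a (context x, outcome y) pair to a real value.
Feature = Callable[[str, str], float]

def conditional_maxent(x: str, labels: Sequence[str],
                       features: Sequence[Feature],
                       weights: Sequence[float]) -> Dict[str, float]:
    """P(y | x) = exp(sum_i weights[i] * f_i(x, y)) / Z(x), normalized over `labels`."""
    scores = {y: sum(w * f(x, y) for f, w in zip(features, weights)) for y in labels}
    m = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exp_scores.values())  # the normalizer Z(x), up to the exp(m) factor
    return {y: v / z for y, v in exp_scores.items()}

# Hypothetical "repetition" feature: fires when the candidate repeats the context word.
repeat = lambda x, y: 1.0 if y == x else 0.0
probs = conditional_maxent("rain", ["rain", "snow"], [repeat], [math.log(3.0)])
print(probs)  # "rain" gets three times the unnormalized weight of "snow"
```

Subtracting the maximum score cancels in the ratio, so it changes nothing mathematically; it only keeps math.exp from overflowing when scores are large.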


Parameter Estimation in Stochastic Logic Programs - Cussens (2000)   (12 citations)

....we have also missed out variable renaming in the input clauses.

Table VII. Empirical distribution of labelled clauses

Clause                                      C1    C2    C3    C4    C5    C6
Empirical frequency                          6     6     8     4     3     3
Predicate-dependent relative frequency     1/2   1/2   2/3   1/3   1/2   1/2

As proven in (Della Pietra et al. 1997), MLE estimates for log-linear parameters (for complete data) are given by $\lambda$ where $\forall i: p[\nu_i] = \tilde{p}[\nu_i]$ (8), and $p$ is the distribution with maximum entropy which meets these constraints. Recall that, since we have unambiguous data, $p$ defines a distribution over refutations $r$. ....

....(iv) set up the equations (11) and (v) solve for $\lambda$. In real problems such an approach will be infeasible; there will be too many refutations to enumerate. An even more basic difficulty is that we cannot solve for $\lambda_i$ pointwise. A feasible alternative is Improved Iterative Scaling (IIS) given in (Della Pietra et al. 1997), where instead of (attempting to) solve the equations given by Equation 8, we iteratively improve our parameter estimate. To explain this approach, we need to introduce some new notation. Let $\#(r) = \sum_{i=1}^{n} \nu_i(r)$ (12). So $\#(r)$ is the total number of labelled clauses used in a refutation ....

[Article contains additional citation context not shown here]

Della Pietra, S., V. Della Pietra, and J. Lafferty: 1997, 'Inducing features of random fields'. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 380-393.
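
Improved Iterative Scaling, cited here by Cussens (and later by Foster), can be sketched for a generic log-linear model p(x) proportional to exp(sum_i lambda_i f_i(x)) over a small finite outcome set. This is a sketch under assumptions, not Cussens' SLP-specific procedure: features are nonnegative, the outcomes are few enough to enumerate (exactly what the context notes is infeasible for refutations in real problems), the one-dimensional update equation is solved with Newton's method, and the toy outcomes and features are hypothetical. The per-outcome feature total fsharp(x) plays the role that #(r) plays for refutations. At convergence the model expectations of the features equal the empirical ones, the moment-matching condition of Equation 8.

```python
import math

def model(points, features, lam):
    """p_lam(x) = exp(sum_i lam_i f_i(x)) / Z over a finite set of outcomes."""
    scores = {x: math.exp(sum(w * f(x) for w, f in zip(lam, features))) for x in points}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}

def iis(points, features, empirical, iters=200, newton_steps=30):
    """Improved Iterative Scaling for nonnegative features on a finite outcome set.

    Each round solves, for every i,
        sum_x p(x) f_i(x) exp(d_i * fsharp(x)) = empirical expectation of f_i,
    where fsharp(x) = sum_i f_i(x), and then sets lam_i += d_i for all i at once.
    """
    lam = [0.0] * len(features)
    fsharp = {x: sum(f(x) for f in features) for x in points}
    target = [sum(empirical.get(x, 0.0) * f(x) for x in points) for f in features]
    for _ in range(iters):
        p = model(points, features, lam)  # current model, held fixed for this round
        deltas = []
        for i, f in enumerate(features):
            d = 0.0
            for _ in range(newton_steps):  # 1-D Newton's method on a convex, increasing g
                g = sum(p[x] * f(x) * math.exp(d * fsharp[x]) for x in points) - target[i]
                dg = sum(p[x] * f(x) * fsharp[x] * math.exp(d * fsharp[x]) for x in points)
                if dg == 0.0:
                    break
                d -= g / dg
            deltas.append(d)
        lam = [w + d for w, d in zip(lam, deltas)]
    return lam

# Hypothetical toy data: three outcomes, two overlapping binary features.
points = ["a", "b", "c"]
features = [lambda x: 1.0 if x in ("a", "b") else 0.0,
            lambda x: 1.0 if x == "a" else 0.0]
empirical = {"a": 0.5, "b": 0.3, "c": 0.2}
lam = iis(points, features, empirical)
p = model(points, features, lam)
print(lam, p)
```

With these toy data the model family can represent the empirical distribution exactly, so the estimates should approach lam = (log 1.5, log 5/3). Because fsharp(x) is not constant here (2, 1 and 0), the simpler Generalized Iterative Scaling closed-form update would not apply directly, which is the case IIS is designed to handle.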


Stochastic Text Generation - Oberlander, Brew (1999)

....This can be done by first defining a set of features which are sufficient to capture the salient features of the distribution, and then incorporating these features into maximum entropy models. Standard maximum entropy techniques for carrying out feature selection and weight estimation (Berger et al. 1996; Della Pietra et al. 1997; Rosenfeld 2000) are assumed. The merit of these techniques is exactly that they are standard, requiring from the user only the definition of an appropriate space of possible features. TTR (type-token ratio) is calculated over 25-word bins. The TTR will be less than 1 only when the bin contains repeated words. ....

Della Pietra, S., Della Pietra, V. & Lafferty, J. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 1-13.
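
The type-token ratio computed over 25-word bins, as described in the Oberlander and Brew context, is the number of distinct words in a bin divided by the bin's length. A minimal sketch; dropping a trailing partial bin is an assumption, since the context does not say how leftovers are handled.

```python
from typing import List, Sequence

def ttr_per_bin(tokens: Sequence[str], bin_size: int = 25) -> List[float]:
    """Type-token ratio (distinct words / total words) for each full bin of tokens.

    A trailing bin with fewer than bin_size tokens is dropped (an assumption).
    """
    ratios = []
    for start in range(0, len(tokens) - bin_size + 1, bin_size):
        window = tokens[start:start + bin_size]
        ratios.append(len(set(window)) / len(window))
    return ratios

tokens = [f"w{i}" for i in range(25)] + ["the"] * 25 + ["tail"] * 3
print(ttr_per_bin(tokens))  # [1.0, 0.04]
```

The first bin has no repeated words, so its TTR is exactly 1; the second repeats one word 25 times, giving 1/25.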


Text Prediction for Translators - Foster (2002)   Self-citation (La)

....practice, q is usually chosen on the basis of efficiency considerations (when the information it captures would be computationally expensive to represent as components of f) and f is established using heuristics such as described in the next section. Once q and f have been chosen, the IIS algorithm [43] can be used to find maximum likelihood parameter values. In the current context, since the aim was to compare equivalent linear and MEMD models, I used an interpolated trigram as the reference distribution and binary indicator functions over bilingual word pairs as features (i.e., components of f ....

S. Della Pietra, V. Della Pietra, and J. Lafferty. Inducing features of random fields. Technical Report CMU-CS-95-144, CMU, 1995.
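
In Foster's MEMD setting the model rescales a reference distribution q by exponentiated feature weights, p(w) proportional to q(w) exp(alpha . f(w)) for a fixed context. A minimal sketch of evaluating such a model for one context, assuming the weights alpha are already estimated (Foster uses IIS for that step); the reference probabilities, source words and the single bilingual-pair indicator are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

def memd(q: Dict[str, float],
         features: Sequence[Callable[[str], float]],
         alpha: Sequence[float]) -> Dict[str, float]:
    """p(w) = q(w) * exp(alpha . f(w)) / Z for one fixed context."""
    unnorm = {w: qw * math.exp(sum(a * f(w) for a, f in zip(alpha, features)))
              for w, qw in q.items()}
    z = sum(unnorm.values())  # normalizer over the candidate words
    return {w: u / z for w, u in unnorm.items()}

# Hypothetical reference probabilities q(w | h), e.g. from a trigram, for one history h.
q = {"cat": 0.2, "dog": 0.5, "hat": 0.3}
source_words = {"chat"}  # hypothetical words of the source sentence
# Hypothetical binary indicator for the bilingual word pair (chat, cat).
pair_chat_cat = lambda w: 1.0 if ("chat" in source_words and w == "cat") else 0.0
p = memd(q, [pair_chat_cat], [math.log(2.0)])
print(p)  # "cat" is boosted; "dog" and "hat" keep their ratio under q
```

With all weights at zero the model reduces to q itself, which is why q can carry information that would be expensive to encode as features while f only adjusts it.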


A Maximum Entropy Approach to Named Entity Recognition - Borthwick (1999)   (11 citations)

No context found.

Della Pietra, S., Della Pietra, V., and Lafferty, J. Inducing features of random fields. Tech. Rep. CMU-CS-95-144, Carnegie Mellon University, 1995.


Lexicalized Stochastic Modeling of Constraint-Based.. - Riezler, Prescher.. (2000)   (8 citations)

No context found.

Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. 1997. Inducing features of random fields. IEEE PAMI, 19(4):380-393.

CiteSeer.IST - Copyright Penn State and NEC