
## On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing (1995)

Venue: Institute for Advanced Computer Studies, University of Maryland, College Park

Citations: 8 (3 self)

### Citations

6460 | C4.5: Programs for machine learning - Quinlan - 1993 |

5724 | Neural Networks: A Comprehensive Foundation - Haykin - 1998 |

> Citation context: "... recurrent network using the last two words as input to the model. 1. Weight initialization. Random weights are initialized with the goal of ensuring that the sigmoids do not start out in saturation (Haykin 1994). In addition, several sets of random weights are tested and the set which provides the best performance on the training data is chosen. 2. Learning rate schedule. Relatively high learning rates a..."
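The initialization heuristic reported in this citation context (start the sigmoids out of saturation) can be sketched as follows; the helper name and the 1/sqrt(fan_in) limit are our illustrative choices, not values from the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_weights(fan_in, fan_out, seed=None):
    """Hypothetical helper (not from the paper): draw uniform weights in
    [-1/sqrt(fan_in), +1/sqrt(fan_in)], so that for inputs of roughly unit
    scale the pre-activation sum stays small and each sigmoid starts in its
    near-linear region rather than a flat, saturated tail where gradients
    vanish."""
    rng = random.Random(seed)
    limit = 1.0 / math.sqrt(fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]
```

The snippet also mentions restarting from several random draws and keeping the best: that amounts to calling `init_weights` with different seeds and comparing training-set performance.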

2189 | Introduction to the theory of neural computation - Hertz, Krogh, et al. - 1991 |

2017 | Finding structure in time - Elman - 1990 |

> Citation context: "... l = 0, 1, ..., L (layer), and y^l_0 = 1 (bias). 4. Williams and Zipser. A fully recurrent network as described in (Williams & Zipser 1989). 5. Elman. A simple recurrent network as described in (Elman 1990, Elman 1991). Initially, partial success was only obtained with models employing a large temporal input window. We were unable to train the networks using a small temporal window although it is theor..."
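The "simple recurrent network" this context refers to can be sketched as a forward pass: the previous hidden state is fed back as a context input at each step. Layer sizes, weight scale, and the class name are illustrative assumptions, not details from Elman's paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ElmanSRN:
    """Minimal sketch of an Elman (1990) simple recurrent network: the
    hidden layer's previous activations are copied into context units and
    fed back in alongside the current input."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        w = lambda rows, cols: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                                for _ in range(rows)]
        self.W_in = w(n_hidden, n_in)       # input -> hidden
        self.W_ctx = w(n_hidden, n_hidden)  # context (previous hidden) -> hidden
        self.W_out = w(n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden     # h(t-1), the recurrent state

    def step(self, x):
        # Hidden state depends on the current input and the saved context.
        h = [sigmoid(sum(wi * xi for wi, xi in zip(row_in, x)) +
                     sum(wc * c for wc, c in zip(row_ctx, self.context)))
             for row_in, row_ctx in zip(self.W_in, self.W_ctx)]
        self.context = h  # the next step sees this hidden state as context
        return [sigmoid(sum(wo * hi for wo, hi in zip(row, h)))
                for row in self.W_out]
```

Feeding the same input twice produces different outputs, because the context units carry the history forward.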

1752 | Information theory and statistics - Kullback - 1951 |

1348 | Lectures on Government and Binding - Chomsky - 1981 |

1061 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

> Citation context: "...nsider how you would define the cost for deleting a noun without knowing the context in which it appears. 5 Decision Tree Methods. We tested the C4.5 decision tree induction algorithm by Ross Quinlan (Quinlan 1993). Decision tree methods construct a tree which partitions the data at each level in the tree based on a particular feature of the data. C4.5 only deals with strings of constant length and we used an ..."

767 | Identification and control of dynamical systems using neural networks - Narendra, Parthasarathy - 1990 |

595 | Stochastic complexity - Rissanen - 1987 |

524 | A learning algorithm for continually running fully recurrent neural networks - Williams, Zipser - 1989 |

497 | Three models for the description of language - Chomsky - 1956 |

393 | Distributed representations, simple recurrent networks, and grammatical structure - Elman - 1991 |

> Citation context: "...hen & Lee 1992). Do neural networks possess the power required for the task at hand? Yes, it has been shown that recurrent networks have the representational power required for hierarchical solutions (Elman 1991), and that they are Turing equivalent (Siegelmann & Sontag 1992). However, only recently has any work been successful with moderately large grammars. Recurrent neural networks have been used for seve..."

372 | Introduction to Formal Language Theory - Harrison - 1978 |

301 | Inside-outside reestimation from partially bracketed corpora - Pereira, Schabes - 1992 |

> Citation context: "...uage models have been based on finite-state descriptions such as n-grams or hidden Markov models. However, finite-state models cannot represent hierarchical structures as found in natural language (Pereira 1992). In the past few years several recurrent neural network architectures have emerged which have been used for grammatical inference (Cleeremans, Servan-Schreiber & McClelland 1989, Giles, Sun, Chen, L..."

254 | Syntactic pattern recognition and applications - Fu - 1982 |

225 | Very fast simulated re-annealing - Ingber - 1989 |

> Citation context: "...same model as those models successfully trained to 100% correct training set classification using backpropagation through time. We have used the adaptive simulated annealing package by Lester Ingber (Ingber 1989, Ingber 1993). We have obtained no significant results from simulated annealing trials. Currently, the best simulated annealing trial has obtained an NMSE of 1.2 after two days of execution on a Sili..."
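For readers unfamiliar with the method the snippet reports on, a generic simulated-annealing loop looks like the following. This is the textbook form with geometric cooling, not Ingber's adaptive re-annealing schedule, and every parameter value is an illustrative assumption.

```python
import math
import random

def simulated_annealing(cost, x0, neighbor, t0=1.0, cooling=0.995,
                        steps=5000, seed=0):
    """Generic simulated annealing sketch: accept a worse candidate with
    probability exp(-delta / T), where T follows a geometric cooling
    schedule, and remember the best state seen."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)           # propose a random nearby candidate
        fy = cost(y)
        # Always accept improvements; accept worse moves with prob e^(-d/T).
        if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                   # cool the temperature geometrically
    return best, fbest
```

For example, minimizing the one-dimensional cost (x - 3)^2 from x = 0 with a small uniform neighborhood converges close to x = 3.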

224 | The Induction of Dynamical Recognizers - Pollack - 1991 |

185 | Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks - Giles, Miller, et al. - 1992 |

172 | On the computational power of neural nets - Siegelmann, Sontag - 1995 |

166 | Finite state automata and simple recurrent networks - Cleeremans, Servan-Schreiber, et al. - 1989 |

156 | Tree-adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions, in D. Dowty, L - Joshi - 1985 |

152 | Experiments in induction - Hunt, Marin, et al. - 1966 |

148 | Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity - Williams, Zipser - 1995 |

137 | An efficient gradient-based algorithm for on-line training of recurrent network trajectories - Williams, Peng - 1990 |

136 | Learning and applying contextual constraints in sentence comprehension - St. John, McClelland - 1990 |

130 | Paths and Categories - Pesetsky - 1982 |

114 | An overview of sequence comparison - Kruskal - 1983 |

> Citation context: "...the distance between the two complete sequences. i and j range from 0 to the length of the respective sequences and the superscripts denote sequences of the corresponding length. For more details see (Kruskal 1983).
>
> d(a^i, b^j) = min of:
>   d(a^{i-1}, b^j) + w(a_i, 0) (deletion of a_i)
>   d(a^{i-1}, b^{j-1}) + w(a_i, b_j) (b_j replaces a_i)
>   d(a^i, b^{j-1}) + w(0, b_j) (insertion of b_j)
>
> 6 Cons..."
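The recurrence quoted in this context is the standard weighted edit distance. A direct dynamic-programming sketch, with unit costs assumed as defaults (the paper itself uses context-dependent costs, which this sketch does not model):

```python
def edit_distance(a, b, w_del=1, w_ins=1, w_sub=1):
    """Weighted sequence distance via the Kruskal (1983)-style recurrence:
    d(a^i, b^j) = min( d(a^{i-1}, b^j)     + w(a_i, 0),    # delete a_i
                       d(a^{i-1}, b^{j-1}) + w(a_i, b_j),  # b_j replaces a_i
                       d(a^i, b^{j-1})     + w(0, b_j) )   # insert b_j
    computed bottom-up over prefixes of a and b."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):           # deleting the first i symbols of a
        d[i][0] = d[i - 1][0] + w_del
    for j in range(1, m + 1):           # inserting the first j symbols of b
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,
                          d[i - 1][j - 1] + sub,
                          d[i][j - 1] + w_ins)
    return d[n][m]
```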

98 | Computation at the onset of chaos - Crutchfield, Young |

86 | Induction of Finite-State Languages Using Second-Order Recurrent - Watrous, Kuhn - 1992 |

84 | FIR and IIR synapses, a new neural network architecture for time series modeling - Back, Tsoi - 1991 |

72 | Language and Nature - Chomsky - 1995 |

> Citation context: "...nguage is: How do people unfailingly manage to acquire such a complex rule system? A system so complex that it has resisted the efforts of linguists to date to adequately describe in a formal system (Chomsky 1986)? Here, we will provide a couple of examples of the kind of knowledge native speakers often take for granted. For instance, any native speaker of English knows that the adjective eager obligatorily t..."

62 | Accelerated learning in layered neural networks - Solla, Levin, et al. - 1988 |

61 | Language learning: cues or rules - MacWhinney, Leinbach, et al. - 1989 |

54 | Supervised learning of probability distributions by neural networks - Baum, Wilczek - 1988 |

53 | Discovering rules from large collections of examples: a case study - Quinlan - 1979 |

52 | Note on learning rate schedules for stochastic optimization - Darken, Moody - 1990 |

50 | Learning Finite State Machines with Self-Clustering Recurrent Networks, Neural Computation - Zeng, Goodman, et al. - 1993 |

46 | Towards faster stochastic gradient search - Darken, Moody - 1992 |

45 | An experimental comparison of recurrent neural networks - Horne, Giles - 1995 |

44 | Extracting and Learning an Unknown Grammar with Recurrent Neural Networks - Giles, Miller, et al. - 1992 |

44 | Local feedback multilayered networks - Frasconi, Gori, et al. - 1992 |

41 | Learning algorithms and probability distributions in feedforward and feed-back - Hopfield - 1987 |

40 | Dynamic construction of finite-state automata from examples using hill-climbing - Tomita - 1982 |

38 | Higher order recurrent networks and grammatical inference - Giles, Sun, et al. - 1990 |

35 | Unified integration of explicit rules and learning by example in recurrent networks - Frasconi, Gori, et al. - 1995 |

30 | Structured representations and connectionist models - Elman - 1990 |

> Citation context: "...arge grammars. Recurrent neural networks have been used for several small natural language problems, e.g. papers using the Elman network for natural language tasks include: (Stolcke 1990, Allen 1983, Elman 1984, Harris & Elman 1984, John & McClelland 1990). 2 Data. Our primary data consists of 552 English positive and negative examples taken from an introductory GB-linguistics textbook by Lasnik and Uriagerek..."

30 | A Course in GB Syntax: Lectures on Binding and Empty Categories - Lasnik, Uriagereka - 1988 |

29 | Adaptive Simulated Annealing (ASA) - Ingber - 1995 |

> Citation context: "...those models successfully trained to 100% correct training set classification using backpropagation through time. We have used the adaptive simulated annealing package by Lester Ingber (Ingber 1989, Ingber 1993). We have obtained no significant results from simulated annealing trials. Currently, the best simulated annealing trial has obtained an NMSE of 1.2 after two days of execution on a Silicon Graphics..."

28 | The role of similarity in Hungarian vowel harmony: A connectionist account - Hare - 1990 |

23 | Learning Feature-based Semantics with Simple Recurrent Networks - Stolcke - 1990 |

> Citation context: "...ccessful with moderately large grammars. Recurrent neural networks have been used for several small natural language problems, e.g. papers using the Elman network for natural language tasks include: (Stolcke 1990, Allen 1983, Elman 1984, Harris & Elman 1984, John & McClelland 1990). 2 Data. Our primary data consists of 552 English positive and negative examples taken from an introductory GB-linguistics textbook..."

21 | Encoding input /output representations in connectionist cognitive systems - Miikkulainen, Dyer - 1989 |

20 | Induction of finite state languages using second-order recurrent networks - Watrous, Kuhn - 1992 |

18 | Finite State Automata and Simple Recurrent Networks - Cleeremans, Servan-Schreiber, et al. - 1989 |

16 | Generalized context-free grammars, head grammars, and natural language. Doctoral Dissertation - Pollard - 1984 |

14 | Second-order recurrent neural networks for grammatical inference - Giles, Chen, et al. - 1991 |

13 | Representing variable information with simple recurrent networks - Harris, Elman - 1989 |

13 | The complexity of language recognition by neural networks, in - Siegelmann, Sontag, et al. - 1992 |

11 | Sequential connectionist networks for answering simple questions about a microworld - Allen - 1988 |

> Citation context: "...moderately large grammars. Recurrent neural networks have been used for several small natural language problems, e.g. papers using the Elman network for natural language tasks include: (Stolcke 1990, Allen 1983, Elman 1984, Harris & Elman 1984, John & McClelland 1990). 2 Data. Our primary data consists of 552 English positive and negative examples taken from an introductory GB-linguistics textbook by Lasnik a..."

8 | Analysis of recurrent backpropagation - Simard, Ottaway, et al. - 1988 |

6 | New Techniques for Nonlinear System Identification: A Rapprochement Between Neural Networks and Linear Systems - Back - 1992 |

> Citation context: "... l, N_l is the number of neurons in layer l, w^l_{ki} is the weight connecting neuron k in layer l to neuron i in layer l-1, y^l_0 = 1 (bias), and f is commonly a sigmoid function. Definition 2 (Back 1992): An FIR MLP with L layers excluding the input layer (0, 1, ..., L), FIR filters of order n_b, and N_0, N_1, ..., N_L neurons per layer, is defined as: y^l_k(t) = f(x^l_k(t)) (3), x^l..."
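The FIR-MLP definition quoted here replaces each scalar synapse with a finite impulse response filter over the upstream neuron's recent outputs. A sketch of one layer's forward pass, with variable names of our own choosing:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fir_layer(W, y_hist):
    """Forward pass of one FIR-MLP layer. Every synapse is an FIR filter
    of order n_b over the upstream neuron's recent outputs:

        x_k(t) = sum_i sum_d W[k][i][d] * y_i(t - d),  d = 0 .. n_b
        y_k(t) = f(x_k(t))

    W[k][i] holds the n_b + 1 filter taps from neuron i in layer l-1 to
    neuron k in layer l; y_hist[i][d] is y_i(t - d), the upstream
    activation d steps in the past."""
    out = []
    for taps_k in W:  # one row of synaptic filters per output neuron k
        x_k = sum(w * y
                  for taps_ki, hist_i in zip(taps_k, y_hist)
                  for w, y in zip(taps_ki, hist_i))
        out.append(sigmoid(x_k))
    return out
```

Setting all filter taps beyond order 0 to zero recovers an ordinary MLP layer, which is one way to see the "rapprochement" in the title.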

6 | A Course - Lasnik - 1988 |

6 | Networks that learn phonology - Gasser, Lee - 1989 |

6 | A connectionist perspective on prosodic structure - Hare, Corina, et al. - 1990 |

6 | Towards a connectionist phonology: The "many maps" approach to sequence manipulation - Touretzky - 1989 |

3 | Rules and maps in connectionist symbol processing (Technical Report CMU-CS-89-158) - Touretzky - 1989 |

1 | Towards a connectionist phonology: The 'many maps' approach to sequence manipulation - Touretzky - 1989 |