Identifying hierarchical structure in sequences: A linear-time algorithm
, 1997
Cited by 131 (3 self)
SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar, and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method’s simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real world sequences.
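The core idea in this abstract, replacing repeated phrases with grammar rules and recursing, can be illustrated with a small sketch. This is not SEQUITUR itself: it is a simplified offline digram-replacement loop that rescans the whole sequence on every pass, so it is neither incremental nor linear-time. The function names (`count_digrams`, `infer_grammar`, `expand`) and the rule names `R1`, `R2`, ... are hypothetical, and input symbols are assumed not to collide with those rule names.

```python
def count_digrams(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts, last_pos = {}, {}
    for i in range(len(seq) - 1):
        digram = (seq[i], seq[i + 1])
        # Skip an occurrence that overlaps the previous counted one (e.g. "aaa").
        if i >= last_pos.get(digram, -2) + 2:
            counts[digram] = counts.get(digram, 0) + 1
            last_pos[digram] = i
    return counts


def infer_grammar(sequence):
    """Return (start_symbols, rules) after replacing repeated digrams with rules."""
    seq = list(sequence)
    rules = {}
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        digram = max(counts, key=counts.get)
        if counts[digram] < 2:
            break  # no pair repeats any more
        name = f"R{len(rules) + 1}"  # hypothetical naming scheme for new rules
        rules[name] = list(digram)
        # Rewrite the sequence left to right, replacing each occurrence.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbols, rules):
    """Expand rule names back into the original terminal symbols."""
    out = []
    for symbol in symbols:
        if symbol in rules:
            out.extend(expand(rules[symbol], rules))
        else:
            out.append(symbol)
    return out
```

For the input "abcabcabc" this sketch produces the start sequence R2 R2 R2 with R1 → a b and R2 → R1 c, and `expand` recovers the original string.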
Software Agents: Completing Patterns and Constructing User Interfaces
- Journal of Artificial Intelligence Research
, 1993
Cited by 50 (2 self)
To support the goal of allowing users to record and retrieve information, this paper describes an interactive note-taking system for pen-based computers with two distinctive features. First, it actively predicts what the user is going to write. Second, it automatically constructs a custom, button-box user interface on request. The system is an example of a learning-apprentice software-agent. A machine learning component characterizes the syntax and semantics of the user's information. A performance system uses this learned information to generate completion strings and construct a user interface.
1. Introduction and Motivation
People like to record information for later consultation. For many, the medium of choice is paper. It is easy to use, inexpensive, and durable. To its disadvantage, paper records do not scale well. As the amount of information grows, retrieval becomes inefficient, physical storage becomes excessive, and duplication and distribution become expensive. Digital medi...
Learning to Understand Information on the Internet: An Example-Based Approach
- Journal of Intelligent Information Systems
Cited by 48 (2 self)
The explosive growth of the Web has made intelligent software assistants increasingly necessary for ordinary computer users. Both traditional approaches -- search engines, hierarchical indices -- and intelligent software agents require significant amounts of human effort to keep up with the Web. As an alternative, we investigate the problem of automatically learning to interact with information sources on the Internet. We report on ShopBot and ILA, two implemented agents that learn to use such resources. ShopBot learns how to extract information from online vendors using only minimal knowledge about product domains. Given the home pages of several online stores, ShopBot autonomously learns how to shop at those vendors. After its learning is complete, ShopBot is able to speedily visit over a dozen software stores and CD vendors, extract product information, and summarize the results for the user. ILA learns to translate information from Internet sources into its own internal concept...
Adaptive Parsing: Self-Extending Natural Language Interfaces. Doctoral dissertation
, 1989
Cited by 21 (1 self)
should not be interpreted as representing the official policies, either expressed or implied,
Acquisition of a Lexicon from Semantic Representations of
Cited by 19 (5 self)
A system, WOLFIE, that acquires a mapping of words to their semantic representation is presented and a preliminary evaluation is performed. Tree least general generalizations (TLGGs) of the representations of input sentences are performed to assist in determining the representations of individual words in the sentences. The best guess for a meaning of a word is the TLGG which overlaps with the highest percentage of sentence representations in which that word appears. Some promising experimental results on a non-artificial data set are presented.
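To give a feel for what a least general generalization of two tree-shaped representations looks like, here is a minimal sketch of plain first-order anti-unification (Plotkin-style LGG). The abstract does not spell out how WOLFIE's TLGGs are computed, so this is a stand-in under stated assumptions, not the paper's procedure: terms are assumed to be nested tuples of the form (functor, arg1, arg2, ...), atoms are strings, and the function name `lgg` and the variable names `X0`, `X1`, ... are hypothetical.

```python
def lgg(t1, t2, table=None):
    """Least general generalization of two terms given as nested tuples."""
    if table is None:
        table = {}
    if t1 == t2:
        return t1  # identical subterms generalize to themselves
    same_shape = (
        isinstance(t1, tuple)
        and isinstance(t2, tuple)
        and len(t1) == len(t2)
        and t1[0] == t2[0]
    )
    if same_shape:
        # Same functor and arity: keep the functor, generalize the arguments.
        args = tuple(lgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
        return (t1[0],) + args
    # Differing subterms become a variable; the same pair reuses its variable.
    key = (t1, t2)
    if key not in table:
        table[key] = f"X{len(table)}"
    return table[key]
```

For example, generalizing ("f", "a", "b") and ("f", "c", "b") keeps the shared functor and second argument and replaces the differing first argument with a variable.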
Computational aspects of resilient data extraction from semistructured sources
- In Proc. PODS
, 2000
Cited by 16 (4 self)
Automatic data extraction from semistructured sources such as HTML pages is rapidly growing into a problem of significant importance, spurred by the growing popularity of the so-called "shopbots" that enable end users to compare prices of goods and other services at various web sites without having to manually browse and fill out forms at each one of these sites. The main problem one has to contend with when designing data extraction techniques is that the contents of a web page change frequently, either because its data is generated dynamically, in response to filling out a form, or because of changes to its presentation format. This makes the problem of data extraction particularly challenging, since a desirable requirement of any data extraction technique is that it be "resilient", i.e., using it we should always be able to locate the object of interest in a page (such as a form or an element in a table generated by a form fill-out) in spite of changes to the page’s content and layout. In this paper we propose a formal computation model for developing resilient data extraction techniques from semistructured sources. Specifically we formalize the problem of data extraction as one of generating unambiguous extraction expressions, which are regular expressions with some additional structure. The problem of resilience is then formalized as one of generating a maximal extraction expression of this kind. We present characterization theorems for maximal extraction expressions, complexity results for testing them, and algorithms for synthesizing them.
Semantic Lexicon Acquisition for Learning Natural Language Interfaces
- Department of Computer Sciences, University of Texas
, 1989
Cited by 13 (1 self)
This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with meaning representations. WOLFIE is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared to those acquired by a comparable system developed by Siskind (1996).
A Genetic Algorithm for the Induction of Nondeterministic Pushdown Automata
- University of Groningen
, 1995
Cited by 9 (0 self)
This paper presents a genetic algorithm used to infer pushdown automata from legal and illegal examples of a language. It gives an introduction to grammatical inference, and discusses related work in grammatical inference using genetic algorithms. The paper describes the type of automaton that is used, the evaluation of the fitness of automata with respect to a set of examples of a language, the representation of automata in the genetic algorithm, and the genetic operators that work on this representation. Results are reported on the inference of a test suite of 10 languages. Pushdown automata for the language of correctly balanced and nested parentheses expressions, the language of sentences containing an equal number of a's and b's, the two-symbol palindromes, a set of regular languages, and a small natural language subset were inferred. Furthermore, some possible improvements and extensions of the algorithm are discussed.
1 Introduction
Genetic algorithms, introduced by Holland (...
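The fitness evaluation mentioned in this abstract, scoring a candidate automaton against legal and illegal example strings, can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's method: the automaton here is deterministic (the paper concerns nondeterministic pushdown automata), fitness is simply the fraction of examples classified correctly, and the names `accepts`, `fitness`, and `BALANCED`, the state name `q0`, and the stack symbols `Z` and `A` are all hypothetical.

```python
def accepts(transitions, word, start="q0", accepting=("q0",), bottom="Z"):
    """Run a deterministic pushdown automaton; accept in an accepting state
    with only the bottom marker left on the stack."""
    state, stack = start, [bottom]
    for symbol in word:
        if not stack:
            return False
        key = (state, symbol, stack[-1])
        if key not in transitions:
            return False  # no move defined: reject
        state, push = transitions[key]
        stack.pop()
        # `push` lists stack symbols with the new top first.
        stack.extend(reversed(push))
    return state in accepting and stack == [bottom]


def fitness(transitions, positives, negatives):
    """Fraction of examples the candidate automaton classifies correctly."""
    correct = sum(accepts(transitions, w) for w in positives)
    correct += sum(not accepts(transitions, w) for w in negatives)
    return correct / (len(positives) + len(negatives))


# A hand-written candidate for balanced parentheses (assumed encoding).
BALANCED = {
    ("q0", "(", "Z"): ("q0", "AZ"),
    ("q0", "(", "A"): ("q0", "AA"),
    ("q0", ")", "A"): ("q0", ""),
}
```

In a genetic algorithm, a score of this kind would be computed for every candidate transition table in the population; the hand-written `BALANCED` table stands in for one such candidate.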
Improving Correctness of Finite-State Machine Synthesis from Multiple Partial Input/Output Sequences
- In Proceedings of the 1st NASA/DoD Workshop on Evolvable Hardware
, 1999
Cited by 8 (3 self)
Our previous work focused on the synthesis of sequential circuits based on a partial input/output sequence. As the behavioural description of the target circuit is not known, the correctness of the result cannot be verified. This paper proposes a method which increases the correctness percentage of the finite-state machine (FSM) synthesis using multiple partial input/output sequences. The synthesizer is based on a genetic algorithm. The experimental results show that the correctness percentage can be increased to 100% by increasing the number of input/output sequences.
On Polynomial-Time Learnability in the Limit of Strictly Deterministic Automata
, 1995
Cited by 8 (0 self)
This paper deals with the polynomial-time learnability of a language class in the limit from positive data, and discusses the learning problem of a subclass of deterministic finite automata (DFAs), called strictly deterministic automata (SDAs), in the framework of learning in the limit from positive data. We first discuss the difficulty of Pitt's definition in the framework of learning in the limit from positive data, by showing that any class of languages with an infinite descending chain property is not polynomial-time learnable in the limit from positive data. We then propose new definitions for polynomial-time learnability in the limit from positive data. We show in our new definitions that the class of SDAs is iteratively, consistently polynomial-time learnable in the limit from positive data. In particular, we present a learning algorithm that learns any SDA M in the limit from positive data, satisfying the properties that (i) the time for updating a conjecture is at most O(`m)...

