Robert C. Miller and Brad A. Myers. Lightweight structured text processing. In Proceedings of the 1999 USENIX Annual Technical Conference, pages 131-144, June 1999.
.... unrestricted Uniform Resource Locator (URL) using the File Transfer Protocol (FTP) or the Hypertext Transfer Protocol (HTTP). HTML pages can be displayed either as a text document showing the complete source code, or as a standard web page, as in every regular web browser [16]. LAPIS is equipped with the pattern recognition language text constraints, which allows the user to precisely specify and restrict the required data in a given document. In comparison with the often cryptic and difficult-to-read regular expressions, this scripting language has a syntax that is ....
....this scripting language has a syntax that is extremely intuitive and easy to understand. With the aid of additional features incorporated within the program, it is possible to manipulate previously highlighted regions within a document. The data can be deleted, filtered, sorted or extracted [16]. There are several possibilities for defining relevant data in a document. The user can manually apply the text constraints language to define a specific pattern for all the required data, or the user can highlight only a few relevant text regions and LAPIS will try to automatically generate a ....
Miller, R. and Myers, B.: Lightweight Structured Text Processing, in: Proceedings of 1999 USENIX Annual Technical Conference, Monterey, CA, pp. 131-144, June 1999
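The region operations mentioned above (deleting, filtering, sorting and extracting previously highlighted regions) can be illustrated with a minimal Python sketch. It assumes a highlighted region is simply a non-overlapping, half-open character span of the document; the names `Region`, `extract`, `filter_regions`, `delete` and `sort_regions` are hypothetical and are not the LAPIS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """A highlighted region: the half-open character span [start, end)."""
    start: int
    end: int

    def text(self, doc: str) -> str:
        return doc[self.start:self.end]


def extract(doc: str, regions: List[Region]) -> List[str]:
    """Return the text of each region in document order."""
    return [r.text(doc) for r in sorted(regions, key=lambda r: r.start)]


def filter_regions(doc: str, regions: List[Region], needle: str) -> List[Region]:
    """Keep only the regions whose text contains the literal string `needle`."""
    return [r for r in regions if needle in r.text(doc)]


def _rebuild(doc: str, regions: List[Region], replacements: List[str]) -> str:
    """Replace each region (sorted, non-overlapping) with a new string."""
    out, pos = [], 0
    for r, new_text in zip(regions, replacements):
        out.append(doc[pos:r.start])  # untouched text before this region
        out.append(new_text)
        pos = r.end
    out.append(doc[pos:])
    return "".join(out)


def delete(doc: str, regions: List[Region]) -> str:
    """Remove every region from the document."""
    ordered = sorted(regions, key=lambda r: r.start)
    return _rebuild(doc, ordered, [""] * len(ordered))


def sort_regions(doc: str, regions: List[Region]) -> str:
    """Sort the regions' contents alphabetically; text between regions stays put."""
    ordered = sorted(regions, key=lambda r: r.start)
    return _rebuild(doc, ordered, sorted(r.text(doc) for r in ordered))
```

For example, with `doc = "pear; apple; fig"` and one region per fruit, `sort_regions` returns `"apple; fig; pear"`: only the region contents move while the separators stay in place, which is what distinguishes sorting structured regions from sorting whole lines.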
....give a better insight into one of several possible approaches for generating wrappers. This toolkit was chosen as an example because, at the time this research began, it was the only non-commercial, open-source toolkit available that was still being developed and supported. As Miller explains in [15], the syntax of regular expressions and grammars for automatic text processing is difficult to read and understand. Therefore, these methods have not gained general acceptance with the majority of normal users. For this reason, it became necessary to develop programs that provided intuitive ....
Miller, R.: Lightweight Structured Text Processing, PhD Thesis Proposal, Computer Science Department, Carnegie Mellon University, USA, April 1999, http://www-2.cs.cmu.edu/~rcm/papers/proposal/proposal.html (Oct. 2002)
....selections for text editing without inference. Then we discuss two techniques for inferring multiple selections: selection guessing and simultaneous editing. Multiple Selections LAPIS (Lightweight Architecture for Processing Information Structure) is based on the idea of lightweight structure [6], a library of patterns and parsers that detect structure in text. The LAPIS library includes parsers for HTML, Java, document structure (words, sentences, lines, paragraphs) and various codes (URLs, email addresses, phone numbers, ZIP codes, etc.). The library can be easily extended by users with ....
....These small changes presented no difficulties for the users in our user studies, who were able to understand and use the highlighting without any explicit instruction. Another way to make a multiple selection is pattern matching. LAPIS has a novel pattern language called text constraints [6], which is designed for combining library concepts with operators like before, after, in, and contains. Examples of patterns include in PhoneNumber, Link containing "My Yahoo", last Word in Sentence, and Method containing MethodName="toString". The capitalized words in these patterns are ....
[Article contains additional citation context not shown here]
R.C. Miller and B.A. Myers. Lightweight structured text processing. In Proc. USENIX Tech. Conf., pp. 131-144, June 1999.
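The idea in the first context above, a library of named recognizers that users can extend on the fly, might be sketched as a table of named regular expressions. This is a simplification under stated assumptions: the concept names and the deliberately crude patterns are hypothetical, and the library described in the citing paper also contains real parsers (for HTML and Java) that regular expressions cannot replace.

```python
import re
from typing import List, Tuple

# Hypothetical concept names mapped to deliberately crude patterns.
LIBRARY = {
    "Word": re.compile(r"[A-Za-z']+"),
    "URL": re.compile(r"https?://\S+"),
    "EmailAddress": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ZIPCode": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def define(name: str, regex: str) -> None:
    """Extend the library on the fly with a user-supplied regular expression."""
    LIBRARY[name] = re.compile(regex)


def find(name: str, text: str) -> List[Tuple[int, int]]:
    """Return the (start, end) spans of all non-overlapping matches of a concept."""
    return [m.span() for m in LIBRARY[name].finditer(text)]
```

Returning spans rather than strings keeps the results usable as selections that later operators can relate to one another.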
....Java syntax. Our solution to this problem is a knowledge base, represented by a library of patterns and parsers that detect structure in text. Users can extend the library on the fly by specifying new patterns, which can be either regular expressions or high-level patterns called text constraints [9]. Generalization should be able to guess accurately from only one example. When multiple generalizations are consistent with the user's selection, the generalizer must make its best guess, which hopefully will often be the description the user intended. Generalization should be correctable. ....
....of simultaneous editing implemented in our prototype system. Features of the user interface will be introduced by presenting an example of the system in operation. Our implementation of simultaneous editing is built into LAPIS, a text processing system which has been described previously [9][10]. LAPIS has several unusual features that make it well suited to this effort. First, LAPIS supports multiple simultaneous text selections; most text editors allow only one contiguous selection. Multiple selections make it easy to display the corresponding selection in every record. Second, ....
[Article contains additional citation context not shown here]
R. C. Miller and B. A. Myers. Lightweight structured text processing. In Proceedings of the
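Simultaneous editing, as described in the second context above, applies one edit to the corresponding selection in every record. The sketch below assumes records are plain strings and that the inferred selection has already been reduced to a regular expression; `simultaneous_edit` is a hypothetical name, and the inference step itself, which is the hard part, is not shown.

```python
import re
from typing import Callable, List


def simultaneous_edit(records: List[str], selector: str,
                      edit: Callable[[str], str]) -> List[str]:
    """Apply one edit to the corresponding selection in every record.

    `selector` is a regular expression standing in for the inferred
    description of the user's selection; its first match in each record
    is taken as that record's selection. Records without a match are
    returned unchanged (a real system would flag them for the user).
    """
    pattern = re.compile(selector)
    edited = []
    for rec in records:
        m = pattern.search(rec)
        if m is None:
            edited.append(rec)
        else:
            edited.append(rec[:m.start()] + edit(m.group()) + rec[m.end():])
    return edited
```

For example, `simultaneous_edit(["Miller, Robert", "Myers, Brad"], r"^[^,]+(?=,)", str.upper)` returns `["MILLER, Robert", "MYERS, Brad"]`: the surname is the selection in each record, and a single edit changes all of them.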
....5 of 24 manual editing sessions ended with uncorrected errors. If the two most common selection errors had been noticed by users, the error rate for simultaneous editing would have dropped to only 2 of 24. We are currently studying ways to call the user's attention to possible selection errors [8]. After doing the tasks, users were asked to evaluate the system's ease of use, trustworthiness, and usefulness on a 5-point Likert scale. These questions were also borrowed from Fujishima [3]. The results, shown in Figure 7, are generally positive. 7 Status and Future Work Simultaneous editing ....
....in an otherwise regular highlight can be noticed at a glance. Another is an abbreviated context view, showing only the selected lines from each record. A third view is an unusual matches view, showing only the most unusual examples of the generalization, found by clustering the matches [8]. A third problem with large data sets is where the data resides. For interactive simultaneous editing, the data must fit in RAM, with some additional overhead for parsing and storing feature lists. For large data sets, this is impractical. However, it is easy to imagine interactively editing a ....
R. C. Miller. Lightweight Structured Text Processing. PhD thesis, Carnegie Mellon University, 2001. In preparation.
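To illustrate how unusual matches might be surfaced for the user, here is a small sketch that ranks matches by the distance of their feature vectors from the coordinate-wise median. Note that it swaps in a simpler rule than the clustering mentioned in the second context above; it follows the median-distance idea that a later context on this page describes. The feature vectors and the name `rank_outliers` are assumptions for illustration.

```python
from statistics import median
from typing import Dict, List, Sequence


def rank_outliers(features: Dict[str, Sequence[float]], top: int = 3) -> List[str]:
    """Return the `top` matches whose feature vectors lie farthest from the median.

    `features` maps each match (identified here by its text) to a numeric
    feature vector of fixed length. Distance is the L1 (Manhattan) distance
    to the coordinate-wise median vector.
    """
    names = list(features)
    if not names:
        return []
    dims = len(features[names[0]])
    center = [median(features[n][d] for n in names) for d in range(dims)]

    def distance(name: str) -> float:
        return sum(abs(x - c) for x, c in zip(features[name], center))

    return sorted(names, key=distance, reverse=True)[:top]
```

For instance, if each match of a phone-number pattern were described by its length and digit count, a stray match with no digits would land far from the median and be ranked first. The median is used rather than the mean so that a few bad matches cannot drag the center toward themselves.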
....all selections simultaneously. string. Some editors also support regular expressions, but the learning curve can be intimidating, and some patterns are impossible to express with low-level regular expressions. LAPIS has a new pattern language called text constraints that addresses these problems [4]. Text constraints combine literal string searches, lightweight structure from the library, and relations like before, after, in, and contains, to produce simple but powerful patterns: in PhoneNumber, Link containing "My Yahoo", Java.Method containing Java.MethodName="toString" ....
.... containing Java.MethodName="toString", last Word in Sentence. In LAPIS, text constraints are used not only for search and replace, but also to define new structure for the structure library, to automate web browsing, and to apply Unix-style tools like grep and sort to structured text [4]. SELECTION GUESSING Any language has a learning curve, and text constraints is no exception. LAPIS can smooth out the curve by inferring a pattern from positive and negative examples provided by the user. The user gives examples by entering selection guessing mode. Multiple examples are given ....
R.C. Miller and B.A. Myers. Lightweight structured text processing. In Proc. USENIX Tech. Conf., pp. 131-144, June 1999.
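The relations named in the context above (before, after, in, contains) can be modeled as predicates over character spans. This minimal sketch assumes half-open spans and simple non-adjacent semantics for `before` and `after`; the function names are hypothetical and do not reproduce the text constraints implementation. It evaluates a pattern in the spirit of `Link containing "My Yahoo"`, with a crude regular expression standing in for a real HTML parser.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open character span [start, end)


def literal(text: str, s: str) -> List[Span]:
    """Every occurrence of a literal string, including overlapping ones."""
    spans, i = [], text.find(s)
    while i != -1:
        spans.append((i, i + len(s)))
        i = text.find(s, i + 1)
    return spans


def find_links(html: str) -> List[Span]:
    """Crude stand-in for an HTML parser: spans of <a ...>...</a> elements."""
    return [m.span() for m in re.finditer(r"<a\b[^>]*>.*?</a>", html, re.S)]


# Relations between two spans a and b.
def in_(a: Span, b: Span) -> bool:
    return b[0] <= a[0] and a[1] <= b[1]  # a lies inside b


def contains(a: Span, b: Span) -> bool:
    return in_(b, a)  # a encloses b


def before(a: Span, b: Span) -> bool:
    return a[1] <= b[0]  # a ends before b starts


def after(a: Span, b: Span) -> bool:
    return before(b, a)  # a starts after b ends


def select(candidates: List[Span], relation: Callable[[Span, Span], bool],
           others: List[Span]) -> List[Span]:
    """Keep candidates that stand in `relation` to at least one span in `others`."""
    return [c for c in candidates if any(relation(c, o) for o in others)]
```

With this sketch, `select(find_links(html), contains, literal(html, "My Yahoo"))` plays the role of `Link containing "My Yahoo"`, and `select(spans, in_, find_links(html))` plays the role of a unary `in Link`.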
....feature vector space. Matches which lie far from the median are considered outliers. We have implemented an outlier finder as part of the LAPIS system (Lightweight Architecture for Processing Information Structure), a text editor/web browser designed for browsing and editing semi-structured text [7]. Outlier finding depends on two assumptions. First, most matches must be correct, so that errors are the needles in the haystack, not the hay. This assumption is essential because the outlier finder has no way of knowing what the user actually intends the pattern to match. Unless the set of ....
....between matches. Since most applications of pattern matching (like search and replace) require non-overlapping matches, we might define a mismatch as any substring that does not overlap a match. Even though this set may still be quadratic, it can be represented compactly using fuzzy regions [7]. We have not yet implemented this strategy. 3. Approximate matches. If the user specifies a literal string or regular expression pattern, then a set of mismatches can be generated by approximate string matching [15], which allows a bounded number of errors in the pattern match. We have not yet ....
[Article contains additional citation context not shown here]
R. C. Miller and B. A. Myers. Lightweight structured text processing. In Proceedings of the 1999 USENIX Annual Technical Conference, pages 131-144, June 1999.
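The approximate matching mentioned in the second context above (a bounded number of errors in the pattern match) is classically done with Sellers' dynamic-programming algorithm, sketched here for plain strings. It reports the end offsets of every substring of the text within edit distance k of the pattern. The context does not say which algorithm reference [15] uses, so treat this as one standard way of generating such near-misses.

```python
from typing import List


def approximate_matches(pattern: str, text: str, k: int) -> List[int]:
    """End offsets (exclusive) where `pattern` matches with at most `k` edits.

    Sellers' algorithm: the column for text position j holds, for each
    pattern prefix, the smallest edit distance to any substring of the text
    ending at j. Row 0 is always 0, so a match may start anywhere.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing from text
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

The algorithm runs in O(mn) time and keeps only two columns, so memory stays proportional to the pattern length rather than the text length.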
....Java syntax. Our solution to this problem is a knowledge base, represented by a library of patterns and parsers that detect structure in text. Users can extend the library on the fly by specifying new patterns, which can be either regular expressions or high-level patterns called text constraints [8]. Generalization should be able to guess accurately from only one example. When multiple generalizations are consistent with the user's selection, the generalizer must make its best guess, which as often as possible should be the description the user intended. Generalization should be ....
....of simultaneous editing implemented in our prototype system. Features of the user interface will be introduced by presenting an example of the system in operation. Our implementation of simultaneous editing is built into LAPIS, a text processing system which has been described previously [8][9]. LAPIS has several unusual features that make it well suited to this effort. First, LAPIS supports multiple simultaneous text selections; most text editors allow only one contiguous selection. Multiple selections make it easy to display the corresponding selection in every record. Second, ....
[Article contains additional citation context not shown here]
Robert C. Miller and Brad A. Myers. Lightweight structured text processing. In USENIX 1999 Annual Technical Conference, pages 131-144, June 1999.
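The requirement that a generalizer make its best guess among several consistent descriptions can be illustrated with a toy version: keep the hypotheses that accept every positive example and reject every negative one, then take the most preferred. The hypothesis names, the predicates, and the preference-by-list-order rule are all assumptions; the generalizer in the cited work is considerably richer.

```python
from typing import Callable, Optional, Sequence, Tuple

Hypothesis = Tuple[str, Callable[[str], bool]]


def best_guess(hypotheses: Sequence[Hypothesis], positives: Sequence[str],
               negatives: Sequence[str] = ()) -> Optional[str]:
    """Return the name of the preferred hypothesis consistent with the examples.

    A hypothesis is consistent if it accepts every positive example and
    rejects every negative one. `hypotheses` is assumed to be listed in
    order of preference, so the first consistent one is the best guess.
    """
    for name, accepts in hypotheses:
        if all(accepts(p) for p in positives) and not any(accepts(n) for n in negatives):
            return name
    return None
```

Listing specific hypotheses before general ones makes a single example such as `"15213"` produce a narrow guess (a hypothetical `ZIPCode` predicate), while adding a second, differently shaped example widens the guess; returning `None` signals that the user should supply a correction.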
....deviations in an otherwise regular highlight can be noticed at a glance. Another is an abbreviated context view, showing only the selected lines from each record. A third view is an unusual matches view, showing only the most unusual examples of the generalization, found by clustering the matches [7]. A third problem with large data sets is where the data resides. For interactive simultaneous editing, the data must fit in RAM, with some additional overhead for parsing and storing feature lists. For large data sets, this is impractical. However, it is easy to imagine interactively editing a ....
Robert C. Miller. Lightweight Structured Text Processing. PhD thesis, Carnegie Mellon University, 2001. In preparation.
....and saved as a script for later execution. We have implemented these extensions in a prototype web browser named LAPIS (Lightweight Architecture for Processing Information Structure). The first extension, consisting of a pattern language and text-processing tools, was described in a previous paper [14], which is summarized below. This paper focuses on the other three features, which integrate a command shell into the web browser to create a browser shell. The browser shell addresses the problem of interactive web automation by allowing the user to apply patterns, script commands, and external ....
....web automation, creating web scripts by example, and invoking external programs and CGI programs. 3.1 LAPIS The web browser we used to prototype the browser shell is called LAPIS (Figure 1), part of a system of generic tools for structured text that we call lightweight structured text processing [14]. Lightweight structured text processing enables users to define text structure interactively and incrementally, so that generic tools can operate on the text in a structured fashion. Our lightweight structured text processing system has four components: a pattern language for describing text ....
[Article contains additional citation context not shown here]
R. C. Miller and B. A. Myers. "Lightweight Structured Text Processing." Proc. USENIX 1999 Annual Technical Conference, Monterey, CA, June 1999, pp. 131-144.
....irrelevant links. For example, a search results page can easily have more advertisement links than results. Therefore, the user can select a region of the browser text and only copy the links from that region onto the PDA. In the future, we expect to integrate more intelligent parsing technology [15] into the Web Assistant so the useful links can be selected and copied more automatically. Shortcutter The Shortcutter application combines many of the features of the previous utilities to allow users to create custom panels of shortcut buttons, sliders, knobs, and pads. In edit mode, users ....
Miller, R.C. and Myers, B.A. "Lightweight Structured Text Processing," in Usenix Annual Technical Conference. 1999. Monterey, California: pp. 131-144.
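Copying only the links inside a user-selected region, as described above, reduces to filtering link matches by position. This sketch assumes the selection is a character span of the page source and uses a simple regular expression for `href` attributes, which is exactly the kind of shallow parsing the context hopes to replace with more intelligent technology; `links_in_region` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Simplified: only double-quoted href attributes on <a> tags are recognized.
HREF = re.compile(r'<a\s[^>]*href="([^"]*)"', re.IGNORECASE)


def links_in_region(html: str, region: Tuple[int, int]) -> List[str]:
    """Return hrefs of anchors whose opening tag starts inside the selected region."""
    start, end = region
    return [m.group(1) for m in HREF.finditer(html) if start <= m.start() < end]
```

Selecting just the results area of a search page would then skip advertisement links outside it, which is the motivation given in the context.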
....irrelevant links. For example, a search results page can easily have more advertisement links than results. Therefore, the user can select a region of the browser text and only copy the links from that region onto the PDA. In the future, we expect to integrate more intelligent parsing technology [14] into the Web Assistant so the useful links can be selected and copied more automatically. Shortcutter The Shortcutter application combines many of the features of the previous utilities to allow users to create custom panels of shortcut buttons, sliders, knobs, and pads. In edit mode, users ....
Miller, R.C. and Myers, B.A. "Lightweight Structured Text Processing," in Usenix Annual Technical Conference. 1999. Monterey, California: pp. 131-144.
No context found.
Robert C. Miller and Brad A. Myers. Lightweight structured text processing. In Proceedings of the 1999 USENIX Annual Technical Conference, pages 131-144, June 1999.
No context found.
R. C. Miller and B. A. Myers. Lightweight structured text processing. In Proc. of USENIX 1999.
No context found.
R. C. Miller and B. A. Myers. Lightweight Structured Text Processing. In Proceedings of 1999.
No context found.
R. C. Miller and B. A. Myers. Lightweight structured text processing. In Proc. of USENIX 1999.
No context found.
R. C. Miller and B. A. Myers. Lightweight structured text processing. In Proc. of USENIX 1999.
No context found.
R. C. Miller and B. A. Myers. Lightweight Structured Text Processing. In Proceedings of 1999.
No context found.
R. C. Miller and B. A. Myers. Lightweight structured text processing. In Proc. of USENIX 1999.
No context found.
R.C. Miller. Light-Weight Structured Text Processing. PhD thesis, Computer Science Department, Carnegie Mellon University, 2002.
No context found.
R.C. Miller. Light-Weight Structured Text Processing. PhD thesis, Computer Science Department, Carnegie Mellon University, 2002.
CiteSeer.IST - Copyright Penn State and NEC