Jacobs, P.S. & Rau, L.F. (1990). SCISOR: Extracting information from on-line news.
....subtopical discussions are about, for the purposes of information retrieval and hypertext navigation. One way to label texts, when working within a limited domain of discourse, is to start with a pre-defined set of topics and specify the word contexts that indicate the topics of interest (e.g. [10]). Another way, assuming that a large collection of pre-labeled texts exists, is to use statistics to automatically infer which lexical items indicate which labels (e.g. [12]). In contrast, we are interested in assigning labels to general, domain-independent text, without benefit of ....
Paul Jacobs and Lisa Rau. SCISOR: Extracting information from On-Line News. Communications of the ACM, 33(11):88-97, 1990.
....naming was cited as the principal reason for large numbers of missed citations in a large-scale evaluation of an information retrieval system [Blair and Maron, 1985]. A proper filter must be able to access information in the text using any word of a set of similar words. A number of knowledge-rich [Jacobs and Rau, 1990, Calzolari and Bindi, 1990, Mauldin, 1991] and knowledge-poor [Brown et al., 1992, Hindle, 1990, Ruge, 1991, Grefenstette, 1992] methods have been proposed for recognizing when words are similar. The knowledge-rich approaches require either a conceptual dependency representation, or semantic ....
Paul Jacobs and Lisa Rau. SCISOR: Extracting information from on-line news. Communications of the ACM, 33(11):88-97, 1990.
....was cited as the principal reason for large numbers of missed citations in a large-scale evaluation of an information retrieval system [Blair and Maron, 1985]. A proper filter must be able to access information in the text using any word of a set of similar words. A number of knowledge-rich [Jacobs and Rau, 1990, Calzolari and Bindi, 1990, Mauldin, 1991] and knowledge-poor [Brown et al., 1992, Hindle, 1990, Ruge, 1991, Grefenstette, 1992] methods have been proposed for recognizing when words are similar. The knowledge-rich approaches require either a conceptual dependency representation, or semantic ....
Paul Jacobs and Lisa Rau. SCISOR: Extracting information from on-line news. Communications of the ACM, 33(11):88--97, 1990.
....information sources. To overcome this information overload problem [1], information filtering techniques are being developed to deliver information to users. State-of-the-art filtering systems, e.g. Information Lens System [2], EDS Template Filler [3], Tapestry [4], Iscreen [5] or Scisor [6], show one or more of the following disadvantages: 1) lack of cognitive design, 2) static behavior: no autonomous learning mechanism, 3) no linguistic analysis of textual information, 4) no prioritizing of information, and 5) no support of collaborative filtering. A cognitive information ....
P.S. Jacobs, L.F. Rau, SCISOR: Extracting Information from On-line News, CACM 33(11), pp.88-97, 1990.
....and requiring less than 3 minutes CPU time per issue (this figure excludes shortlex lookup and the stages preceding, but includes longlex lookups). To put these figures in perspective, note that major systems such as RelationalText never progressed beyond 60% precision at 30% recall. GE's SCISOR [7] had at the time 80-90% combined recall and precision, a result that has not been significantly improved upon in the past five years by any system performing detail parsing. The VFSA architecture thus appears competitive with other paradigms of grammar development currently in use. 3 Formal ....
Paul F. Jacobs and Lisa F. Rau, `SCISOR: extracting information from on-line news', Communications of the ACM 33(10) 88-97 (1990)
....is represented by the conceptual dictionaries, which include mainly pragmatic information text grammar rules allowing the quick focusing on a particular information to be extracted while ignoring irrelevant elements [1,20], triggering rules, evoked by the lexical entry examined, i.e. rules allowing the activation of predicative templates of the H TEMP hierarchy. All these data structures are directly managed by the annotation module and are assumed to be constructed in a set-up phase of the architecture. In the ....
P. Jacobs and L. Rau. SCISOR : Extracting Information from On-line News. Communication of the ACM, 33(11):88-97, 1990.
....like semantic networks and frames have been used to represent knowledge by specifying primitives or simple concepts and then combining them to define more complex concepts. Most information retrieval systems that use semantic analysis techniques are conceptual information retrieval systems [13] [19]. In conceptual information retrieval the user requests information, and is given the information directly, not just a reference to where it may be found. The most common semantic representation structures used in information retrieval are case frames [13] [26] [12] and scripts [16]. The ....
....information retrieval systems [13] [19]. In conceptual information retrieval the user requests information, and is given the information directly, not just a reference to where it may be found. The most common semantic representation structures used in information retrieval are case frames [13] [26] [12] and scripts [16]. The main difficulty of semantic analysis is the large amount of knowledge needed to process the meaning of a sentence. For that reason, semantic analysis usually is done only in a restricted domain using a knowledge base manageable in size and complexity. 5 ....
P. S. Jacobs and L. F. Rau, "SCISOR: Extracting Information from On-line News," CACM, vol. 33(11), Nov. 1990, pp. 88-97.
....This is because IR systems do not take into account linguistic features. More linguistically oriented text classification can be achieved by the use of pattern matching (PM). PM can produce a fairly accurate and fast categorisation over a large number of classes [Hayes and Weinstein, 1991; Jacobs and Rau, 1990]. Resource development does not require linguistic expertise and can be done by trained users. But PM is still weak on the analysis of linguistic structures and cannot be used for fine-grained categorisation. Information extraction (IE) techniques can be also used for text classification (e.g. ....
P.S. Jacobs and L.F. Rau. SCISOR: Extracting Information from On-line News. Communications of the ACM, 33(11), 1990.
....[10] which states that optimum retrieval is achieved when documents are ranked according to decreasing values of their probability of relevance with respect to the current query. Differences between IF and IR are described in Table 1. The retrieval models described above are applied to IF [7] [11]. In non-intelligent IFS simple Keyword Matching determines whether the user's information interests match the incoming information items of the system. Information Filtering Information Retrieval System input dynamic datastream static database User goals long term periodic desires ....
P.S. Jacobs, L.F. Rau, "SCISOR: Extracting Information from On-line News," CACM 33(11), 1990, pp.88-97.
....full text articles can automatically be indexed better than humans (Evans et al. 1991). The work in this project incorporates case-based and automatic indexing techniques to filter information. Another recent knowledge-based approach to text processing has been implemented in the SCISOR system (Jacobs & Rau, 1990). SCISOR is designed to process financial news stories regarding corporate mergers and acquisitions from an online news service and extract important information into a structured form. Drawing upon previous approaches and operating upon a much larger scale than previous systems, SCISOR's ....
....a user model has been constructed and key features extracted from input articles, an algorithm is necessary to classify the articles. Typically, the classification algorithm will be closely integrated with the feature extraction method, although the two components are modularized in some systems (Jacobs & Rau, 1990). For information filtering, the classifier will simply be determining the interest level of a particular document. The original news reading program for Unix is RN, short for Read News. Although simple, RN does contain primitive support for filtering. Upon user request, a KILL file can be ....
Jacobs, P.S. & Rau, L.F. (1990). SCISOR: Extracting Information from On-line News.
....matching level model, documents are represented as sets of keywords, whereas in the vector space model, documents are represented as vectors, where each entry corresponds to a weighted keyword. More complex indexing elements can be used such as sets of terms [16], noun phrases [15], scripts [4], or facts [9]. Furthermore, there are various ways to determine the indexing elements: manually by a human indexer (librarian, cataloguer like in Yahoo), automatically by extracting them from the document content, or with the help of thesaurus. Also, the vocabulary (the set of indexing elements) ....
P.S. Jacobs and L.F. Rau. Scisor: Extracting information from on-line news. Communications of the ACM, 33(11), 1990.
....level model [15] documents are represented as sets of keywords, whereas in the vector space model [8] documents are represented as vectors, where each entry corresponds to a weighted keyword. More complex indexing structures can be used such as groups of terms [13], noun phrases [12], scripts [3], or facts [6]. Furthermore, there are various ways to determine the indexing elements: manually by a human indexer (librarian, cataloguer like in Yahoo), automatically by extracting them from the document content, sometimes with the help of thesaurus. Also, the vocabulary (the set of indexing ....
P.S. Jacobs and L.F. Rau. Scisor: Extracting information from on-line news. Communications of the ACM, 33(11), 1990.
.... Documents User query based Infoscope [1] Documents User agents rule based Heuristics Iscreen [7] Documents User rule based Conflict detection, explanation What If queries Lyric time [8] Music System rule based Pasadena [9] Documents User query based User dialogue Filter: Queries Scisor [10] Documents User query based Tapestry [11] Documents User System query based User comments, TQL Filter: TQL Queries Table 1: Information Filtering Systems 3 Natural Language Processing Within the CIFS linguistic analysis is performed by the indexer parser module. Additionally, a pre filter ....
P.S. Jacobs, L.F. Rau, SCISOR: Extracting Information from On-line News, CACM 33(11), pp.88ff., 1990.
....interests profiles queries Environment more or less privacy more or less public User groups undefined well defined Table 1: Information filtering vs. information retrieval. Differences between IF and IR are described in Table 1. The retrieval models described above are applied to IF [7] [11]. Simple Keyword Matching determines whether the user's information interests match the incoming information items of the system. It is most important to the process of filtering that the indexing component consists of: a lexical scanner, a morphological component, and a component for ....
P.S. Jacobs, L.F. Rau. SCISOR: Extracting Information from On-line News. CACM 33(11), pp.88-97, 1990.
....very powerful, it places a heavy burden on the user. Ongoing projects are developing models on which future expert systems will be based (Belkin & Marchetti, 1990; Chen, 1990). Recent projects focus on incorporating other aspects of artificial intelligence, particularly natural language processing (Jacobs & Rau, 1990) and probabilistic inference over networks of documents (Croft & Turtle, 1992). 1.2.2 Search Strategies The automatic query reformulation incorporated in the systems described in the previous section are, in general, very primitive. However, search strategies employed by both novice and ....
Jacobs, P.S. & Rau, L.F. (1990). SCISOR: Extracting information from on-line news.
....(Miller and Drexler 1988) which identified specific keywords for personalised information filtering. Other probabilistic methods exist based on keywords, such as vector space representations (Salton and McGill 1983) and Latent Semantic Indexing (Foltz and Dumais 1992). PIES (Ram 1992) and SCISOR (Jacobs and Rau 1990) are systems that make use of natural language processing techniques to identify concepts within articles. 2. Once concepts have been identified, some mechanism must be employed to determine whether the information is of interest to the user. The agent may maintain a knowledge ....
Jacobs, P. and L. Rau (1990). SCISOR: Extracting Information from On-Line News. Communications of the ACM 33 (11), 88--97.
....Information extraction is a natural language processing task that involves automatically extracting predefined types of information from text. In contrast to in-depth understanding, information extraction systems focus only on portions of text that are relevant to a specific domain (e.g. see [Jacobs and Rau, 1990; Lehnert and Sundheim, 1991]). For example, an information extraction system designed for a terrorism domain might extract the names of perpetrators, victims, physical targets, and weapons involved in a terrorist attack. Or an information extraction system designed for a joint ventures domain ....
Jacobs, Paul and Rau, Lisa 1990. SCISOR: Extracting Information from On-Line News. Communications of the ACM 33(11):88--97.
....Finally, we address the problem of evaluation before we conclude the paper and give some prospects to future work. 2 Related Work There already exist several quite successful information filtering systems, e.g. Information Lens System [29], EDS Template Filler [44], Iscreen [37] or SCISOR [24]. First prototypes of adaptive information filtering systems were developed at the University of Colorado [45], MIT [28] and University College Cork [36]; for other recent work on combining the fields of user modeling and information filtering see [30] or [34]. Regarding the applied techniques ....
P. S. Jacobs and L. F. Rau. SCISOR: Extracting information from on-line news. Communications of the ACM, 33(1):88--97, 1990.
....and associated background knowledge. A key feature of the formalism is that it supports the generation of multiple hypotheses and uses its knowledge sources to sift through and assess competing hypotheses. 4.5 Further Work To learn more about information extraction techniques and systems, see [47, 36]. Several systems have been developed recently that learn dictionaries for information extraction, such as [43, 67, 74]. Some older systems that incorporated symbolic learning techniques with natural language processing include [1, 29, 9, 37]. Explanation-based learning has also been previously ....
P. Jacobs and L. Rau. SCISOR: extracting information from on-line news. Communications of the ACM, 33(11):88--97, 1990.
....the text database. Our system, known as NLDB, mimics these category assignments, extracting company names [Rau, 1991], topics or subject indicators, industries, and others (including, for example, stock exchanges and geographic regions). The program also incorporates portions of the SCISOR system [Jacobs and Rau, 1990], which can fill certain other fields, such as the target and suitor of a takeover. This sort of system has a simple appeal: the answers (the set of category assignments) are usually clear-cut, yet they clearly require some detailed content analysis. On the other hand, the technologies that ....
....Statistics can then guess the industries associated with Y. The NLP method used in NLDB associates categories with linguistic patterns. We will next describe the pattern language, then explain how statistical methods can automatically add simple patterns. 4 Lexico-Semantic Patterns In SCISOR [Jacobs and Rau, 1990], MUC [Jacobs et al. 1991; Krupka et al. 1991] and other applications, we have found that lexically driven pre-processing serves as a complement to parsing and semantic interpretation, both in identifying portions of relevant text and in marking the input text to make it easier to process. Our ....
Paul Jacobs and Lisa Rau. SCISOR: Extracting information from on-line news. Communications of the Association for Computing Machinery, 33(11):88-97, November 1990.
No context found.
Jacobs, P.S. & Rau, L.F. (1990). SCISOR: Extracting information from on-line news.
No context found.
P. Jacobs and L. Rau. "SCISOR: Extracting Information from On-line News," Communications of the ACM, 33(11):88-97, 1990.
No context found.
Jacobs, P., and Rau, L. (1990), "SCISOR: extracting information from on-line news." Communications of the ACM, Vol. 33, No. 11, pp. 88-97.
No context found.
Jacobs, Paul S. and Lisa F. Rau. 1990. SCISOR: Extracting Information from On-line News. Communications of the ACM, vol.33(11), pp.88-97.
No context found.
Jacobs, P.S. and L.F. Rau. 1990. SCISOR: Extracting Information from On-Line News. Communications of the ACM 33(11): 88--97.