R. Chandrasekar and B. Srinivas. 1996. Using syntactic information in document filtering: A comparative study of part-of-speech tagging and supertagging. Technical Report IRCS 96--29, University of Pennsylvania.
.... topic in machine learning (ML) and in computational linguistics (CL). Relevant CL work ranges from text filtering for data extraction and information retrieval (Lewis and Tong, 1992) to information classification or document filtering based on features extracted via shallow parsing (for example, Chandrasekar and Srinivas, 1997; Ikeda et al., 1998). The ML community has used WordNet (Miller, 1990) to support rule induction systems for text classification (for example, Junker and Abecker, 1997). ML researchers also attempt text classification or categorization based on such techniques as k-nearest neighbour, decision ....
CHANDRASEKAR, R. & B. SRINIVAS (1997). Using Syntactic Information in Document Filtering: A Comparative Study of Part-of-speech Tagging and Supertagging. Proceedings of the RIAO-97 Conference, pp.531-545.
....[Figure 8.1: Overview of the Information Filtering scheme. Labels in the figure include Query, Information Retrieval System, Retrieved Documents, Pattern Training Phase, Pattern Application Phase, Tokenizer, Preprocessor and Pattern Matcher.] Such a tool for information filtering, named Glean [Chandrasekar and Srinivas, 1996], is being developed in a research collaboration between the National Centre for Software Technology (NCST), Bombay, and the Institute for Research in Cognitive Science (IRCS) and the Center for the Advanced Study of India (CASI), University of Pennsylvania. Glean seeks to innovatively overcome some ....
....recovered 25 out of 28 relative clauses and 14 of 14 appositives. We generated 1 spurious relative clause and 2 spurious appositives. Dependency-based simplification provides many advantages over simplification methods that employ Finite State Grammar for the analysis of a complex sentence [Chandrasekar et al., 1996]. Simplification rules in these methods manipulate noun groups and verb groups provided by the sentence analysis phase. As a result, the rules for the simplifier have to identify basic predicate-argument relations to ensure that the right chunks remain together in the output. The simplifier in the ....
R. Chandrasekar and B. Srinivas. Using syntactic information in document filtering: A comparative study of part-of-speech tagging and supertagging. Technical Report IRCS 96--29, University of Pennsylvania, 1996.
....Web search query

Table 2: Precision and Recall for retrieving relevant documents from the Web

System                                         Recall           Precision
Plain Web Search (without Glean) [baseline]    28/28 = 100%     28/84 = 33.3%
With Glean filtering                           23/28 = 82.1%    23/29 = 79.3%

3.3. Glean: Using Parts of Speech filters

In (Chandrasekar and Srinivas, 1996), we similarly show how Parts of Speech (POS) tags can be used to provide syntactic context for a filtering scheme. We use an N-gram tagger (similar to (Church, 1988)) which uses a 40-tag tagset from the Penn TreeBank (Marcus et al., 1993). This tagger has been extensively tested, and is found to ....
....and irrelevant sets. Any sentence which matched any one of these 20 patterns was considered relevant. The relevant sets for each method (POS tags and Supertags) were then compared to the gold standard relevant set. The results of these filtering experiments are summarized in Table 3 (from (Chandrasekar and Srinivas, 1996)) for the supertagging method, the POS tag method and for the base case. The second column in the table gives the count of sentences judged relevant by humans. The other columns list judgments made by the program, and the overlap they have with the standard set. Table 4 shows the recall and ....
R. Chandrasekar and B. Srinivas. 1996. Using syntactic information in document filtering: A comparative study of part-of-speech tagging and supertagging. Technical Report IRCS 96--29, University of Pennsylvania.
....Bombay 400049, India as a bag of words. Since any coherent text contains significant latent information, such as syntactic structure and patterns of language use, this can be exploited to make information retrieval and extraction more effective. In previous work (Chandrasekar and Srinivas 97a; Chandrasekar and Srinivas 97b) we have shown that syntactic information, in the form of either simple part-of-speech labels or richer supertag information, can be used to improve the effectiveness of filtering irrelevant documents. In this paper, we go further, investigating the effect of (varying) the extent of context on ....
.... patterns to a total of 529 documents from New York Times (July 1995), a total of 361 documents were flagged as irrelevant, of which 343 documents matched the gold standard of 434 irrelevant documents, yielding precision and recall figures of 95% and 79% respectively (Chandrasekar and Srinivas 97a; Chandrasekar and Srinivas 97b). The size of the context window used (one chunk on either side of the domain term) is sometimes inadequate. Syntactic phenomena occurring outside this window are not captured; for example, relative clauses are not signaled when a relativizer is more than one chunk away. Error analyses in these ....
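The precision and recall figures quoted in this excerpt follow directly from the three counts it gives. As a minimal sketch (the function name `precision_recall` and its parameter names are illustrative assumptions, not taken from the cited papers), the arithmetic can be checked as follows:

```python
def precision_recall(matched, flagged, gold):
    """Return (precision, recall) as percentages.

    matched: flagged documents that are also in the gold standard
    flagged: all documents the filter flagged as irrelevant
    gold:    all documents the gold standard marks as irrelevant
    """
    precision = 100.0 * matched / flagged
    recall = 100.0 * matched / gold
    return precision, recall


# Counts quoted in the excerpt: 361 flagged, 343 matching, 434 in the gold standard.
p, r = precision_recall(matched=343, flagged=361, gold=434)
print(f"precision = {p:.1f}%, recall = {r:.1f}%")
# prints: precision = 95.0%, recall = 79.0%
```

Here precision is the share of flagged documents that are truly irrelevant (343/361), and recall is the share of truly irrelevant documents that were flagged (343/434).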
R. Chandrasekar and B. Srinivas. Using Syntactic Information in Document Filtering: A Comparative Study of Part-of-speech Tagging and Supertagging. Proc. RIAO'97, Montreal, June 1997.
....corresponding to the concept is emphasized. New operators can be easily defined by the user. 5 Information Retrieval. The IR agents in AKIRA aim to glean information from given segments of text. Some of these agentive services are based on techniques developed for the Glean project [RC93, CS97a, CS97b]. AKIRA will use similar and extended agents. There is a lot of information latent in any coherent text, which can be used to enhance retrieval efficiency. This information includes not just the syntactic structure of natural language text, but also the structure inherent in many domains such as ....
....tokens that may be present in the query as well as the document so as to improve the relevance of retrieved information. We also exploit the linguistic structure implicit in the document to postprocess the results of Web search engines so as to improve the precision of the retrieved results [CS97a, CS97b]. 7 Conclusion. AKIRA is a system that behaves as a proxy server for a user. It establishes an interface between the user and the sources of information. The system can be viewed as a smart cache since it will retrieve data for the user and restructure it. The use of information retrieval and ....
R. Chandrasekar and B. Srinivas. Using syntactic information in document filtering: A comparative study of part-of-speech tagging and supertagging. In Proceedings of RIAO'97, Montreal, June 1997.
....These include documents which do not contain the standard CFP phrases, documents which are Web redirection documents, empty documents, etc. We may also discard documents which contain the word Archive (these may be mega Archives without much relevant content). A filtering tool such as Glean [CS97] may be used for this purpose. 3.6 Extracting Information: fragments. From retrieved documents, we identify names of meetings and dates thanks to our IE agents. A conference is identified by its name and has a canonical representation expressed by an acronym (for example CAISE98). A date is a ....
....sent by the View Factory which specifies its schema as well as its population. The Agent Pool contains IR, IE, formatter agents, etc. IR agents consist of wrappers to correspond with data sources available on the Web (search engines or services) and information filtering tools such as Glean [CS97]. IE agents extract concepts and meta concepts. IE agents such as conference acronym and location recognizers together with a co-reference tool identify concept instances. SuperTagging [JS94], which provides rich syntactic labels, and zoners extract meta concepts. Formatter agents can be of ....
R. Chandrasekar and B. Srinivas. Using Syntactic Information in Document Filtering: A Comparative Study of Part-of-speech Tagging and Supertagging. In Proceedings of RIAO'97, Montreal, June 1997.
.... view expression sent by the View Factory which specifies its schema as well as its population (through methods invoking IE agents). The IR agent pool consists of wrappers to correspond with databases available on the Web (search engines or services) and information filtering tools such as Glean [2]. The IE component provides different IE tools extracting concepts and meta concepts. IE agents such as conference acronym and location recognizers together with a coreference tool identify concept instances. SuperTagging [5], which provides rich syntactic labels, or zoning tools extract ....
R. Chandrasekar and B. Srinivas. Using Syntactic Information in Document Filtering: A Comparative Study of Part-of-speech Tagging and Supertagging. In Proceedings of RIAO'97, Montreal, June 1997.
.... AKIRA's IR agents used for information extraction are highly autonomous and use available Web search engines to locate information or other filtering tools such as Glean [RC93]. Some of AKIRA's text-based agentive services are based on techniques developed for the Glean project [RC93, CS97a, CS97b]. In effect, there is a lot of information latent in any coherent text, which can be used to enhance retrieval efficiency. This information includes not just the syntactic structure of natural language text, but also the structure inherent in many domains such as tables of stock prices, time line ....
....words to approximate the content of documents: each document is parsed and an entry (pointing back to the document) is created in the index for every relevant word. But these methods ignore the relations among words. Our approach, based on techniques developed for the Glean project [CS97a, CS97b], focuses not only on the words in a text but also on the relations between words. AKIRA's architecture is aimed at integrating plugging in new services. Our flat representation allows us to take advantage of various available techniques developed for multimedia processing. Most existing services ....
R. Chandrasekar and B. Srinivas. Using syntactic information in document filtering: A comparative study of part-of-speech tagging and supertagging. In Proceedings of RIAO'97, Montreal, June 1997.
R. Chandrasekar and B. Srinivas. 1997b. Using syntactic information in document filtering: A comparative study of part-of-speech tagging and supertagging. In Proceedings of RIAO'97, Montreal, June.