Paynter, G.W., Witten, I.H., Cunningham, S.J., Buchanan, G.: Scalable Browsing for Large Collections: A Case Study. In Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, TX, USA (2000) 215-223
....a particular phrase. A crude solution is to use stopping, as is done by some widely used web search engines (the Google search engine, for example, neglects common words in queries) but this approach means that a small number of queries cannot be evaluated, while many more evaluate incorrectly [12]. Another solution is to index phrases directly, but the set of word pairs in a text collection is large and an index on such phrases difficult to manage. In recent work, nextword indexes were proposed as a way of supporting phrase queries and phrase browsing [2, 3, 15]. In a nextword index, for ....
....be safely neglected during query evaluation. In others, however, common words play an important role, as in the movie title end of days or the band name the who, and evaluation of these queries is difficult with the common words removed, especially when both the and who happen to be common terms [12]. Taken together, these observations suggest that stopping of common words will have an unpredictable effect. Stopping may yield efficiency gains, but means that a significant number of queries cannot be correctly evaluated. We experimented with a set of 122,438 phrase queries that between them ....
[Article contains additional citation context not shown here]
G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan. Scalable browsing for large collections: A case study. In Proc. of the 5th ACM International Conference on Digital Libraries, pages 215--223, San Antonio, 2000.
....term Liver, indicates that the article or book is not about the liver in general, but rather is specifically about the effect of drugs on the liver. We used both the MeSH terms and qualifiers as our keyword features. 4.3. Word Features Researchers generally use word features to represent text [9,11,12,27]. We used the same stopwords as those used to generate concepts, that is, a generic set of stopwords [29] augmented with numbers, months, days of the week and 31 medical stopwords. The approach most often used is to remove stopwords, and then do word stemming, a process that removes a word's ....
G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan, "Scalable browsing for large collections: a case study", 5th Conf. Digital Libraries, Texas, pp.215-218, 2000.
No context found.
Paynter, G.W., Witten, I.H., Cunningham, S.J. and Buchanan, G. (2000): Scalable browsing for large collections: a case study. Proc Fifth ACM Conference on Digital Libraries, San Antonio, TX, pp. 215-223.
....implementation of plugins and classifiers for non-textual data. Classifiers allow hierarchical browsing. Hierarchical phrase and keyphrase indexes of text, or indeed any metadata, can be created using standard classifiers. Such interfaces are described by Gutwin et al. [3] and Paynter et al. [5]. Designed for multi-gigabyte collections. Collections can contain millions of documents, making the Greenstone system suitable for collections up to several gigabytes. Compression is used to reduce the size of the indexes and text [6]. Small indexes have the added bonus of faster retrieval. New ....
....language. Acronyms could be extracted from the text automatically [9] and a list of acronyms added. Keyphrases could be extracted from each document [2] and a keyphrase browser added. A phrase hierarchy could be extracted from the full text of the documents and made available for browsing [5]. The format of any of these browsers, or of the documents themselves when they were displayed, or of the search results list, could all be altered by appropriate format statements. Skilled users could add any of these features to the collection by making a small change to the panel in Figure ....
Paynter, G.W., Witten, I.H., Cunningham, S.J. and Buchanan, G. (2000) "Scalable browsing for large collections: a case study." Proc Fifth ACM Conference on Digital Libraries, San Antonio, TX, pp. 215--223; June.
.... of metadata in the uncompressed text [3]. Another technique is syntactic analysis (marking text according to its grammatical function, splitting a text into noun phrases and verb phrases, for example); this is used in a phrase browsing system developed for the Food and Agriculture Organisation [4]. Heuristic analysis merely looks at the text in the hope of forming certain rules about the context in which a given type of metadata may appear, and it was this approach that was used in a London historical collection to extract dates, proper names and geographical locations from historical ....
G.W. Paynter, I.H Witten, S.J. Cunningham and G. Buchanan, "Scalable browsing for large collections: a case study" in proc 5th ACM Conference on Digital Libraries, June 2000, pp 215-218.
No context found.
Paynter, G.W., Witten, I.H., Cunningham, S.J. and Buchanan, G. Scalable Browsing for Large Collections: a Case Study. In Proceedings of Digital Libraries'00: The Fifth ACM Conference on Digital Libraries, (San Antonio, TX, USA, 2000), ACM Press, 215-223.
No context found.
Paynter, G.W., Witten, I.H., Cunningham, S.J. and Buchanan, G. (2000) "Scalable browsing for large collections: a case study." Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, TX, pp. 215-223. http://www.acm.org/pubs/articles/proceedings/dl/336597/p215-paynter/p215-paynter.pdf