55 citations found.
Yang, Y. (1994), `Expert network: Effective and efficient learning from human decisions in text categorization and retrieval', Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin.

This paper is cited in the following contexts:

On Machine Learning Methods for Chinese Document Categorization - He, Tan (2003)   (Correct)

....Whereas an extensive range of methods have been applied to English text categorization, relatively few benchmarks have been done for Chinese text. Typical approaches to Chinese text categorization, such as Naive Bayes (NB) [1], Vector Space Model (VSM) [2, 3] and Linear Least Square Fit (LLSF) [4, 5], have well studied theoretical basis derived from the information retrieval research, but are not known to be the best classifiers [6, 7]. In addition, there is a lack of publicly available Chinese corpus for evaluating Chinese text categorization systems. This paper reports our comparative ....

Y. Yang, "Expert network: Effective and efficient learning from human decisions in text categorization and retrieval," in 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.


Using Text Categorization Techniques for Intrusion Detection - Liao, Vemuri (2002)   (5 citations)  (Correct)

....and the set of system calls generated by a process is treated as the document. This analogy makes it possible to bring the full spectrum of well developed text processing methods [9] to bear on the intrusion detection problem. One such method is the k-Nearest Neighbor classification method [10, 11]. The rest of this paper is organized as follows. In Section 2 we review some related work. Section 3 is a brief introduction to the kNN text categorization method. Section 4 describes details of our experiments with the 1998 DARPA data. We summarize our results in Section 5, and Section 6 ....

Y. Yang, "Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval", Proceedings of 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 13-22, 1994.


Incremental Document Clustering for Web Page Classification - Wong, Fu (2000)   (5 citations)  (Correct)

....people would analyze the contents of the document first. Therefore a large amount of human effort would be required. There has been some research work conducted on automatic text classification. One approach is to learn the text classifiers by using the machine learning techniques [1], [3], [4], [5]. However, these algorithms are based on a set of positive and negative training examples for learning the text classifiers. The quality of the resulting classifiers highly depends on the fitness of the training We shall use the terms Web pages and Web documents and documents interchangeably. ....

.... three alternatives for measuring the similarity between two document feature vectors based on the classical vector space model [22]. The first measure is called cosine correlation which is commonly used to measure the similarity between the feature vectors of two documents (e.g. [1], [23], [3], [5]). A larger cosine value indicates that they are more similar. The second similarity measuring method uses the Euclidean distance, which is the most common similarity measuring method in high dimensional data clustering [8], [6], [7]. The third alternative is the Manhattan distance. We have ....

Yiming Yang, Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, 13-22, Dublin, Ireland, July, 1994
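The three similarity measures named in the context above (cosine correlation, Euclidean distance, Manhattan distance) can be sketched as follows. This is a minimal illustration over dense feature vectors, not the citing paper's implementation; the function names and the choice to return 0.0 for a zero-length vector are assumptions made here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors.
    Larger values mean the documents are more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance; smaller values mean more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Note that cosine is a similarity (higher is closer) while the other two are distances (lower is closer), so a clustering routine that swaps between them has to flip its comparison accordingly.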


Arg: A Tool For Automatic Report Generation - Karakaya, Güvenir (2002)   (Correct)

....becomes more important. Text categorization may be defined as the problem of assigning free text documents to predefined categories. A growing number of statistical learning methods have been applied to this problem in recent years, including regression models [7,27], nearest neighbor classifiers [11,14,25,26,28], Bayes belief networks [3,9,10,12,16,17,18,23], decision trees [2,9,12,18], neural networks [22,24] and inductive rule learning techniques [1,5,6,19]. Recent studies have proved the success of statistical approaches for learning to classify text documents [3,10,16,20,26]. These approaches commonly ....

Yang, Y., Expert network: Effective and efficient learning from human decisions in text categorization and retrieval, In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13-22, 1994.


Multi-Document Summarization and Visualization in the Informedia.. - Wactlar (2001)   (1 citation)  (Correct)

.... Integration of speech, language, and image processing: generating multimedia abstractions, segmenting video into stories, and tailoring presentations based on context [Wactlar96,99a, Christel97a,97b]. Text processing: headline generation [Hauptmann97a], text clustering and topic classification [Yang94a,98a, Lafferty98, Hauptmann98b], and information retrieval from spoken documents [Hauptmann97b,97c,98c]. Audio processing: speech recognition [Witbrock98a,98b], segmentation and alignment of spoken dialogue to existing transcripts [Hauptmann98a], and silence detection for better skim abstractions [Christel98]. Image ....

Yang, Y. Expert network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, Proc. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), 13--22, July 3-6, 1994.


A System for Supporting Cross-Lingual Information Retrieval - Capstick et al. (1999)   (Correct)

....10 their frequency in a training corpus. Our evaluation has shown that this categoriser did not perform as well for web pages as the KNN categoriser. The n-gram categoriser is more useful in situations with noisy input, such as categorising OCR output. The k-nearest-neighbour algorithm (Yang, 1994) is a statistical algorithm which classifies a new document by combining the category assignments of the k most similar training documents, weighted by the similarity between the new document and each of the k best matching training documents. The categoriser has been trained with documents from ....

Yang, Y. (1994). Expert Network: Effective and efficient learning from human decisions in text categorization and retrieval. Paper presented at the 17th ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 13--22. Dublin.
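The similarity-weighted k-nearest-neighbour scheme described in the context above can be sketched roughly as follows. This is an illustrative sketch only, not the code of the cited system: it assumes documents are sparse term-weight dictionaries, uses cosine similarity as the similarity function, and the names `cosine` and `knn_category_scores` are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_category_scores(doc, training, k):
    """Score categories for `doc` from its k most similar training documents.

    `training` is a list of (vector, categories) pairs. Each of the k nearest
    neighbours votes for its categories with a weight equal to its similarity
    to `doc`, so closer neighbours count for more.
    """
    ranked = sorted(
        ((cosine(doc, vec), cats) for vec, cats in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, cats in ranked:
        for cat in cats:
            scores[cat] += sim
    return dict(scores)
```

A category is then assigned by ranking or thresholding these scores; how the threshold is chosen is outside what the quoted context describes.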


MULINEX - Multilingual Indexing, Navigation and Editing.. - Erbach, Uszkoreit (1999)   (Correct)

....ordered by their frequency in a training corpus. Our evaluation has shown that this categoriser did not perform as well for web pages as the KNN categoriser. The n-gram categoriser is more useful in situations with noisy input, such as categorising OCR output. The k-nearest-neighbour algorithm (Yang, 1994) is a statistical algorithm which classifies a new document by combining the category assignments of the k most similar training documents, weighted by the similarity between the new document and each of the k best matching training documents. The categoriser has been trained with documents from ....

Yang, Y. (1994). Expert Network: Effective and efficient learning from human decisions in text categorization and retrieval. Paper presented at the 17th ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 13--22. Dublin.


Categorisation by Context - Attardi (1998)   (4 citations)  (Correct)

....categorisation, which may be dubbed categorisation by context, since it exploits the context surrounding a link in an HTML document to extract useful information for categorising the document it refers to. This technique is complementary to the traditional technique of categorisation by content [Yang 94, Schütze 95, Ng 97], where information for categorising a document is extracted from the document itself. Such approach may exploit linguistic analysis to determine relevant portions of the text [Fuhr 91] and then exploits probabilistic or statistical analysis to perform feature selection [Yang ....

Yang, Y.: "Expert network: effective and efficient learning from human decisions in text categorisation and retrieval"; Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, IE (1994), 13--22.


Language Sensitive Text Classification - Basili, Moschitti, Pazienza (2000)   (2 citations)  (Correct)

....(the test set documents are assigned to the classes proportionally to their size) (Yang, 1999). 2.3. Text Classification Models. Several classification models have been proposed in literature and their distinctive aspects will be here briefly summarized. k-Nearest Neighbor is an example-based classifier (Yang, 1994), making use of document-to-document similarity estimation that selects a class for a document through a k-nearest heuristic. Rocchio (Ittner et al., 1995; Cohen and Singer, 1996) often refers to TC systems based on Rocchio's formula for profile estimation. RIPPER (Cohen and Singer, 1996) ....

....an RDS inference mechanism. This produces the best performance figure for linear classifiers on the Reuters3 data set (80.52%). Other more accurate classifiers have been also presented. The best figure on the Reuters3 corpus is obtained by the example-driven k-NN classifier (85%) and LLSF (85%) (Yang, 1994). However, their higher training and classification complexity makes their design and use more difficult within real operational domains. Other classifiers based on complex learning (e.g. Ripper, Cohen and Singer, 1996) or knowledge-based algorithms (e.g. SWAP-1, Apté et al., 1994) show a ....

Yang Y. (1994). Expert network: effective and efficient learning from human decisions in text categorisation and retrieval. In Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, pages 13--22, Dublin, IE.


Predictive Self-Organizing Networks for Text Categorization - Tan (2001)   (Correct)

....techniques that automatically generate text categorization knowledge based on training examples. Such techniques include decision trees [2], K-nearest neighbor system (KNN) [7, 19], rule induction [8], gradient descent neural networks [9, 16], regression models [18], Linear Least Square Fit (LLSF) [17], and support vector machines (SVM) [6, 7]. All these statistical methods adopt a supervised learning paradigm. During the learning phase, a classifier derives categorization knowledge from a set of prelabeled or tagged documents. During the testing phase, the classifier makes prediction or ....

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings, 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.


Hierarchical Classification of Web Content - Dumais, Chen (2000)   (32 citations)  (Correct)

....the use of automatic classification methods to supplement human effort in creating structured knowledge hierarchies. A wide range of statistical and machine learning techniques have been applied to text categorization, including multivariate regression models [8,22], nearest neighbor classifiers [26], probabilistic Bayesian models [13, 16], decision trees [16], neural networks [22, 25], symbolic rule learning [1, 4] and support vector machines [7,12]. These approaches all depend on having some initial labeled training data from which category models are learned. Once category models are ....

Yang, Y. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 13-22, 1994.


Text Categorization Using Weight Adjusted k-Nearest.. - Han, Karypis, Kumar (1999)   (7 citations)  (Correct)

....[LG94, LR94, Lew98, MN98], they have a major limitation due to the independence assumption they make while words in document data sets tend to be dependent. k-nearest neighbor (k-NN) classification is an instance-based learning algorithm that has shown to be very effective in text classification [Yan94, CH98]. The success of this scheme is due to the availability of effective similarity measures such as cosine measure [Sal89]. However, the effectiveness of these similarity measures become worse as the number of words increases. PEBLS [CS93] is a k-NN classification algorithm that incorporates ....

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In SIGIR-94, 1994.


Automatic Web Page Categorization by Link and Context.. - Attardi, Gullì.. (1999)   (23 citations)  (Correct)

....sources (general classification indexes, specialized thesauri, etc.). Techniques for automatically deriving representations of categories ("category profile extraction") and performing classification have been developed within the area of text categorization [Ittner 95, Lewis 96, Ng 97, Schütze 95, Yang 94, Yang 97], a discipline at the crossroads between information retrieval and machine learning. Text categorization uses machine learning techniques to inductively build representations of a given set of categories from a training set of documents pre-categorized under them. An automatic process ....

....of these categories with the representation of a given document d in order to decide to which of these categories it belongs. Alternatively, a document can be compared to previously classified documents and placed in the categories where its most similar documents have also been placed [Yang 94], thus avoiding the need for the construction of explicit category profiles. Document representations are typically obtained through standard indexing techniques developed within the field of information retrieval [Salton 88]; measures of similarity, such as the ones embodied in vector space ....

Yang, Y.: "Expert network: effective and efficient learning from human decisions in text categorisation and retrieval", Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, IE, 13--22, 1994.


A Comparative Study on Chinese Text Categorization Methods - He, Tan, Tan (2000)   (2 citations)  (Correct)

....extensive range of methods have been applied to English text categorization, relatively few have been benchmarked for Chinese text categorization. Typical approaches to Chinese text categorization, such as Naive Bayes (NB) [21], Vector Space Model (VSM) [22], [23] and Linear Least Square Fit (LLSF) [2], [17], have well studied theoretical basis derived from the information retrieval research, but are not known to be the best classifiers [16], [18]. In addition, there is a lack of publicly available Chinese corpus for evaluating Chinese text categorization systems. This paper reports our applications of ....

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings, 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.


Classification Algorithms for NETNEWS Articles - Hsu, Lang (1999)   (2 citations)  (Correct)

....Thus, designing automated procedures for news categorization becomes a practical as well as a challenging problem. A typical solution uses a training set to extract the features that characterize the individual news categories. The techniques employed include machine learning [2], [10], [13], [21], [23], [25], statistical [18], [27], knowledge based [15] or the combinations [9]. After the initial training stage, the news categorization system may apply periodic maintenance to avoid performance deterioration caused by the presence of new terms and new topics being discussed in the news articles. Also, ....

....learning methods in conjunction with a notion of differential misclassification cost to produce junk e-mail filters. The reported work shows how to learn effective filters that eliminate a large portion of junk mail from a user's mail stream. The Expert Network (ExpNet) system developed by Yang [25] achieved good categorization performance on large documents. ExpNet is used in relevance ranking of candidate categories for an arbitrary text. In text retrieval applications, ExpNet is used for relevance ranking of documents via categories. A recent work [11] proposed the Generalized Instance ....

Y. Yang. "Expert network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval" Proceedings of SIGIR, pp. 13-22, 1994.


Incremental Document Clustering for Web Page Classification - Wong, Fu (2000)   (5 citations)  (Correct)

....class, people would analyze the contents of the document first. Therefore a large amount of human effort would be required. There has been some research work conducted on automatic text classification. One approach is to learn the text classifiers by using the machine learning techniques [9, 12, 18, 21]. However, these algorithms are based on a set of positive and negative training examples for learning the text classifiers. The quality of the resulting classifiers highly depends on the fitness of the training examples. There are many terms and classes in the World Wide Web (or just the Web) ....

.... We will consider three alternatives for measuring the similarity between two document feature vectors based on the classical vector space model [17]. The first measure is called cosine correlation which is commonly used to measure the similarity between the feature vectors of two documents (e.g. [9, 10, 12, 21]). A larger cosine value indicates that they are more similar. The second similarity measuring method uses the Euclidean distance, which is the most common similarity measuring method in high dimensional data clustering [6, 14, 24]. The third alternative is the Manhattan distance. Given two ....

Yiming Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 13--22, Dublin, Ireland, July 1994.


A Study of Approaches to Hypertext Categorization - Yiming Yang   Self-citation (Yang)   (Correct)

No context found.

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.


A Re-Examination of Text Categorization Methods - Yang, Liu (1999)   (148 citations)  Self-citation (Yang)   (Correct)

....distribution over labelled documents. Topic spotting for newswire stories, for example, is one of the most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15] and Support Vector Machines [12]. While the rich literature provides valuable information about individual ....

....Joachims and our own version of kNN. 3.2 kNN. kNN stands for k-nearest neighbor classification, a well-known statistical approach which has been intensively studied in pattern recognition for over four decades [8]. kNN has been applied to text categorization since the early stages of the research [17, 29, 11]. It is one of the top performing methods on the benchmark Reuters corpus (the 21450 version, Apte set); the other top performing methods include LLSF by Yang, decision trees with boosting by Apte et al. and neural networks by Wiener et al. [2, 12, 14, 31, 28]. The kNN algorithm is quite ....

[Article contains additional citation context not shown here]

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.


A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani   (24 citations)  Self-citation (Yang)   (Correct)

....to one and only one class; see Section 3 for details) $\arg\max_{c_j} \Pr(c_j) \prod_{t \in d} \Pr(t \mid c_j)^{N(t,d)}$ (2). 2.3.2. K Nearest Neighbor (kNN). kNN, an instance-based classification method, has been an effective approach to a broad range of pattern recognition and text classification problems [8, 25, 26, 28]. In contrast to eager learning algorithms (including Naive Bayes) which have an explicit training phase before seeing any test document, kNN uses the training documents local to each test document to make its classification decision on that document. Our kNN uses the conventional vector space ....

Yiming Yang. Expert network: effective and efficient learning from human decisions in text categorisation and retrieval. In W. Bruce Croft and Cornelis J. van Rijsbergen, editors, Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, pages 13--22, Dublin, IE, 1994. Springer Verlag, Heidelberg, DE.
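The Naive Bayes decision rule quoted in the context above (choose the class that maximises the class prior times the product, over terms in the document, of the term's class-conditional probability raised to its count) can be sketched as follows. This is an illustrative sketch, not the citing paper's code: the function name `naive_bayes_predict` and the input layout are hypothetical, and it assumes every conditional probability is already smoothed to be non-zero. Logarithms are used so that long documents do not underflow.

```python
import math

def naive_bayes_predict(term_counts, priors, cond_probs):
    """Return the class c maximising Pr(c) * prod_t Pr(t|c) ** N(t, d).

    term_counts: dict term -> N(t, d), the count of term t in document d
    priors:      dict class -> Pr(c)
    cond_probs:  dict class -> dict term -> Pr(t|c), assumed non-zero
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Work in log space: log Pr(c) + sum_t N(t, d) * log Pr(t|c).
        score = math.log(prior)
        for term, count in term_counts.items():
            score += count * math.log(cond_probs[c][term])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Unlike the kNN approach the same context goes on to describe, this rule needs the priors and conditional probabilities to be estimated in a separate training phase before any test document is seen.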


A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani (2002)   (24 citations)  Self-citation (Yang)   (Correct)

....d, and taking the most probable class as the prediction: $\arg\max_{c_j} \Pr(c_j) \prod_{t \in d} \Pr(t \mid c_j)^{N(t,d)}$ (2). 2.3.2. K Nearest Neighbor (kNN). kNN, an instance-based classification method, has been an effective approach to a broad range of pattern recognition and text classification problems [8, 26, 27, 29]. In contrast to eager learning algorithms (including Naive Bayes) which have an explicit training phase before seeing any test document, kNN uses the training documents local to each test document to make its classification decision on that document. Our kNN uses the conventional vector space ....

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.


A Study on Thresholding Strategies for Text Categorization - Yang (2001)   (11 citations)  Self-citation (Yang)   (Correct)

....the recall and precision values for that category when these two values are radically different. 4.2 Experimental Settings. For the experiments in this paper, I used our standard kNN classifier for which detailed descriptions and the parameter setting process were reported in previous papers [14, 16, 18, 17]; stop word removal, stemming and statistical feature selection were applied to documents in a preprocessing step, using the methods described in those papers as well. All the parameters for the thresholding strategies (RCut, PCut, SCut and RTCut) were tuned using hold-out validation sets ....

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.


A Re-Examination of Text Categorization Methods - Yang, Liu (1999)   (148 citations)  Self-citation (Yang)   (Correct)

....distribution over labelled documents. Topic spotting for newswire stories, for example, is one of the most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15] and Support Vector Machines [12]. While the rich literature provides valuable information about individual ....

....2 and our own version of kNN. 3.2 kNN. kNN stands for k-nearest neighbor classification, a well-known statistical approach which has been intensively studied in pattern recognition for over four decades [8]. kNN has been applied to text categorization since the early stages of the research [17, 29, 11]. It is one of the top performing methods on the benchmark Reuters corpus (the 21450 version, Apte set); the other top performing methods include LLSF by Yang, decision trees with boosting by Apte et al. and neural networks by Wiener et al. [2, 12, 14, 31, 28]. The kNN algorithm is quite ....

[Article contains additional citation context not shown here]

Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.


Latent Semantic Indexing with a Variable Number of Orthogonal.. - Dupret (2004)   (Correct)

No context found.

Yang, Y. (1994), `Expert network: Effective and efficient learning from human decisions in text categorization and retrieval', Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin.


Use of K-Nearest Neighbor Classifier for Intrusion Detection - Liao, Vemuri (2002)   (Correct)

No context found.

Y. Yang, "Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval", Proceedings of 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 13-22, 1994.


Dynamic Organization of Search Results Using the UMLS - Pratt (1997)   (4 citations)  (Correct)

No context found.

Yang Y: Expert Network: effective and efficient learning from human decisions in text categorization and retrieval. SIGIR '94. Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Berlin, Germany. 1994:13-22.

CiteSeer.IST - Copyright Penn State and NEC