Yang, Y. (1994), `Expert network: Effective and efficient learning from human decisions in text categorization and retrieval', Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin.
....Whereas an extensive range of methods have been applied to English text categorization, relatively few benchmarks have been done for Chinese text. Typical approaches to Chinese text categorization, such as Naive Bayes (NB) [1], Vector Space Model (VSM) [2, 3] and Linear Least Square Fit (LLSF) [4, 5], have well studied theoretical basis derived from the information retrieval research, but are not known to be the best classifiers [6, 7]. In addition, there is a lack of publicly available Chinese corpus for evaluating Chinese text categorization systems. This paper reports our comparative ....
Y. Yang, "Expert network: Effective and efficient learning from human decisions in text categorization and retrieval," in 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.
....and the set of system calls generated by a process is treated as the document. This analogy makes it possible to bring the full spectrum of well developed text processing methods [9] to bear on the intrusion detection problem. One such method is the k-Nearest Neighbor classification method [10, 11]. The rest of this paper is organized as follows. In Section 2 we review some related work. Section 3 is a brief introduction to the kNN text categorization method. Section 4 describes details of our experiments with the 1998 DARPA data. We summarize our results in Section 5, and Section 6 ....
Y. Yang, "Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval", Proceedings of 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 13-22, 1994.
....people would analyze the contents of the document first. Therefore a large amount of human effort would be required. There has been some research work conducted on automatic text classification. One approach is to learn the text classifiers by using the machine learning techniques [1], [3], [4], [5]. However, these algorithms are based on a set of positive and negative training examples for learning the text classifiers. The quality of the resulting classifiers highly depends on the fitness of the training examples. We shall use the terms Web pages and Web documents and documents interchangeably. ....
.... three alternatives for measuring the similarity between two document feature vectors based on the classical vector space model [22]. The first measure is called cosine correlation, which is commonly used to measure the similarity between the feature vectors of two documents (e.g. [1], [23], [3], [5]). A larger cosine value indicates that they are more similar. The second similarity measuring method uses the Euclidean distance, which is the most common similarity measuring method in high dimensional data clustering [8], [6], [7]. The third alternative is the Manhattan distance. We have ....
Yiming Yang, Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, 13-22, Dublin, Ireland, July, 1994
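The three similarity measures named in the context above (cosine correlation, Euclidean distance, Manhattan distance) can be sketched as follows. This is a minimal illustration, not the citing authors' implementation; it assumes documents are already represented as equal-length lists of term weights, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors (larger = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (smaller = more similar)."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

Note that cosine is a similarity (higher is closer) while the other two are distances (lower is closer), so rankings built from them sort in opposite directions.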
....becomes more important. Text categorization may be defined as the problem of assigning free text documents to predefined categories. A growing number of statistical learning methods have been applied to this problem in recent years, including regression models [7,27], nearest neighbor classifiers [11,14,25,26,28], Bayes belief networks [3,9,10,12,16,17,18,23], decision trees [2,9,12,18], neural networks [22,24] and inductive rule learning techniques [1,5,6,19]. Recent studies have proved the success of statistical approaches for learning to classify text documents [3,10,16,20,26]. These approaches commonly ....
Yang, Y., Expert network: Effective and efficient learning from human decisions in text categorization and retrieval, In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13-22, 1994.
.... Integration of speech, language, and image processing: generating multimedia abstractions, segmenting video into stories, and tailoring presentations based on context [Wactlar96,99a, Christel97a,97b]. Text processing: headline generation [Hauptmann97a], text clustering and topic classification [Yang94a,98a, Lafferty98, Hauptmann98b], and information retrieval from spoken documents [Hauptmann97b,97c,98c]. Audio processing: speech recognition [Witbrock98a,98b], segmentation and alignment of spoken dialogue to existing transcripts [Hauptmann98a], and silence detection for better skim abstractions [Christel98]. Image ....
Yang, Y. Expert network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, Proc. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), 13--22, July 3-6, 1994.
....10 their frequency in a training corpus. Our evaluation has shown that this categoriser did not perform as well for web pages as the KNN categoriser. The n-gram categoriser is more useful in situations with noisy input, such as categorising OCR output. The k-nearest-neighbour algorithm (Yang, 1994) is a statistical algorithm which classifies a new document by combining the category assignments of the k most similar training documents, weighted by the similarity between the new document and each of the k best matching training documents. The categoriser has been trained with documents from ....
Yang, Y. (1994). Expert Network: Effective and efficient learning from human decisions in text categorization and retrieval. Paper presented at the 17th ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 13 -- 22. Dublin.
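The similarity-weighted voting described in the context above can be sketched as below. This is a hedged illustration only: it assumes cosine similarity over dense term-weight lists and a plain sum of similarities per category, and the names `cosine` and `knn_category_scores` are hypothetical, not taken from the cited paper or the citing system.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_category_scores(query, training, k):
    """Score categories for `query` from its k most similar training documents.

    `training` is a list of (vector, categories) pairs. Each of the k nearest
    neighbours adds its similarity to the score of every category it carries.
    """
    scored = [(cosine(query, vec), cats) for vec, cats in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for sim, cats in scored[:k]:
        for cat in cats:
            scores[cat] += sim
    return dict(scores)
```

A caller would typically rank categories by score and then apply some thresholding rule to decide which ones to assign; that step is left out here because the excerpt does not describe it.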
....ordered by their frequency in a training corpus. Our evaluation has shown that this categoriser did not perform as well for web pages as the KNN categoriser. The n-gram categoriser is more useful in situations with noisy input, such as categorising OCR output. The k-nearest-neighbour algorithm (Yang, 1994) is a statistical algorithm which classifies a new document by combining the category assignments of the k most similar training documents, weighted by the similarity between the new document and each of the k best matching training documents. The categoriser has been trained with documents from ....
Yang, Y. (1994). Expert Network: Effective and efficient learning from human decisions in text categorization and retrieval. Paper presented at the 17th ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 13 -- 22. Dublin.
....categorisation, which may be dubbed categorisation by context, since it exploits the context surrounding a link in an HTML document to extract useful information for categorising the document it refers to. This technique is complementary to the traditional technique of categorisation by content [Yang 94, Schütze 95, Ng 97], where information for categorising a document is extracted from the document itself. Such an approach may exploit linguistic analysis to determine relevant portions of the text [Fuhr 91] and then exploits probabilistic or statistical analysis to perform feature selection [Yang ....
Yang, Y.: "Expert network: effective and efficient learning from human decisions in text categorisation and retrieval"; Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, IE (1994), 13--22.
....(the test-set documents are assigned to the classes proportionally to their size) (Yang, 1999). 2.3. Text Classification Models. Several classification models have been proposed in the literature and their distinctive aspects will be briefly summarized here. k-Nearest Neighbor is an example-based classifier (Yang, 1994) that uses document-to-document similarity estimation and selects a class for a document through a k-nearest heuristic. Rocchio (Ittner et al. 1995; Cohen and Singer, 1996) often refers to TC systems based on Rocchio's formula for profile estimation. RIPPER (Cohen and Singer, 1996) ....
....an RDS inference mechanism. This produces the best performance figure for linear classifiers on the Reuters3 data set (80.52%). Other more accurate classifiers have also been presented. The best figure on the Reuters3 corpus is obtained by the example-driven k-NN classifier (85%) and LLSF (85%) (Yang, 1994). However, their higher training and classification complexity makes their design and use more difficult within real operational domains. Other classifiers based on complex learning (e.g. Ripper, Cohen and Singer, 1996) or knowledge-based algorithms (e.g. SWAP-1, Apté et al. 1994) show a ....
Yang Y. (1994). Expert network: effective and efficient learning from human decisions in text categorisation and retrieval. In Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, pages 13--22, Dublin, IE.
....techniques that automatically generate text categorization knowledge based on training examples. Such techniques include decision trees [2], K nearest neighbor system (KNN) [7, 19], rule induction [8], gradient descent neural networks [9, 16], regression models [18], Linear Least Square Fit (LLSF) [17], and support vector machines (SVM) [6, 7]. All these statistical methods adopt a supervised learning paradigm. During the learning phase, a classifier derives categorization knowledge from a set of prelabeled or tagged documents. During the testing phase, the classifier makes prediction or ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings, 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.
....the use of automatic classification methods to supplement human effort in creating structured knowledge hierarchies. A wide range of statistical and machine learning techniques have been applied to text categorization, including multivariate regression models [8,22], nearest neighbor classifiers [26], probabilistic Bayesian models [13, 16], decision trees [16], neural networks [22, 25], symbolic rule learning [1, 4] and support vector machines [7,12]. These approaches all depend on having some initial labeled training data from which category models are learned. Once category models are ....
Yang, Y. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 13-22, 1994.
....[LG94, LR94, Lew98, MN98], they have a major limitation due to the independence assumption they make, while words in document data sets tend to be dependent. k-nearest neighbor (k-NN) classification is an instance-based learning algorithm that has been shown to be very effective in text classification [Yan94, CH98]. The success of this scheme is due to the availability of effective similarity measures such as the cosine measure [Sal89]. However, the effectiveness of these similarity measures becomes worse as the number of words increases. PEBLS [CS93] is a k-NN classification algorithm that incorporates ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In SIGIR-94, 1994.
....sources (general classification indexes, specialized thesauri, etc.). Techniques for automatically deriving representations of categories (category profile extraction) and performing classification have been developed within the area of text categorization [Ittner 95, Lewis 96, Ng 97, Schütze 95, Yang 94, Yang 97], a discipline at the crossroads between information retrieval and machine learning. Text categorization uses machine learning techniques to inductively build representations of a given set of categories from a training set of documents pre-categorized under them. An automatic process ....
....of these categories with the representation of a given document d in order to decide to which of these categories it belongs. Alternatively, a document can be compared to previously classified documents and placed in the categories where its most similar documents have also been placed [Yang 94], thus avoiding the need for the construction of explicit category profiles. Document representations are typically obtained through standard indexing techniques developed within the field of information retrieval [Salton 88] measures of similarity, such as the ones embodied in vector space ....
Yang, Y.: "Expert network: effective and efficient learning from human decisions in text categorisation and retrieval", Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, Dublin, IE, 13--22, 1994.
....extensive range of methods have been applied to English text categorization, relatively few have been benchmarked for Chinese text categorization. Typical approaches to Chinese text categorization, such as Naive Bayes (NB) [21], Vector Space Model (VSM) [22], [23] and Linear Least Square Fit (LLSF) [2], [17], have well studied theoretical basis derived from the information retrieval research, but are not known to be the best classifiers [16], [18]. In addition, there is a lack of publicly available Chinese corpus for evaluating Chinese text categorization systems. This paper reports our applications of ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings, 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994.
....Thus, designing automated procedures for news categorization becomes a practical as well as a challenging problem. A typical solution uses a training set to extract the features that characterize the individual news categories. The techniques employed include machine learning [2], [10], [13], [21], [23], [25], statistical [18], [27], knowledge based [15], or the combinations [9]. After the initial training stage, the news categorization system may apply periodic maintenance to avoid performance deterioration caused by the presence of new terms and new topics being discussed in the news articles. Also, ....
....learning methods in conjunction with a notion of differential misclassification cost to produce junk e-mail filters. The reported work shows how to learn effective filters that eliminate a large portion of junk mail from a user's mail stream. The Expert Network (ExpNet) system developed by Yang [25] achieved good categorization performance on large documents. ExpNet is used in relevance ranking of candidate categories for an arbitrary text. In text retrieval applications, ExpNet is used for relevance ranking of documents via categories. A recent work [11] proposed the Generalized Instance ....
Y. Yang. "Expert network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval" Proceedings of SIGIR, pp. 13-22, 1994.
....class, people would analyze the contents of the document first. Therefore a large amount of human effort would be required. There has been some research work conducted on automatic text classification. One approach is to learn the text classifiers by using the machine learning techniques [9, 12, 18, 21]. However, these algorithms are based on a set of positive and negative training examples for learning the text classifiers. The quality of the resulting classifiers highly depends on the fitness of the training examples. There are many terms and classes in the World Wide Web (or just the Web) ....
.... We will consider three alternatives for measuring the similarity between two document feature vectors based on the classical vector space model [17]. The first measure is called cosine correlation, which is commonly used to measure the similarity between the feature vectors of two documents (e.g. [9, 10, 12, 21]). A larger cosine value indicates that they are more similar. The second similarity measuring method uses the Euclidean distance, which is the most common similarity measuring method in high dimensional data clustering [6, 14, 24]. The third alternative is the Manhattan distance. Given two ....
Yiming Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 13--22, Dublin, Ireland, July 1994.
No context found.
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.
....distribution over labelled documents. Topic spotting for newswire stories, for example, is one of the most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15] and Support Vector Machines [12]. While the rich literature provides valuable information about individual ....
....Joachims and our own version of kNN. 3.2 kNN. kNN stands for k-nearest neighbor classification, a well-known statistical approach which has been intensively studied in pattern recognition for over four decades [8]. kNN has been applied to text categorization since the early stages of the research [17, 29, 11]. It is one of the top performing methods on the benchmark Reuters corpus (the 21450 version, Apte set); the other top performing methods include LLSF by Yang, decision trees with boosting by Apte et al. and neural networks by Wiener et al. [2, 12, 14, 31, 28]. The kNN algorithm is quite ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.
....to one and only one class; see Section 3 for details): $\arg\max_{c_j} \Pr(c_j) \prod_{t \in d} \Pr(t \mid c_j)^{N(t,d)}$ (2). 2.3.2. K Nearest Neighbor (kNN). kNN, an instance-based classification method, has been an effective approach to a broad range of pattern recognition and text classification problems [8, 25, 26, 28]. In contrast to eager learning algorithms (including Naive Bayes) which have an explicit training phase before seeing any test document, kNN uses the training documents local to each test document to make its classification decision on that document. Our kNN uses the conventional vector space ....
Yiming Yang. Expert network: effective and efficient learning from human decisions in text categorisation and retrieval. In W. Bruce Croft and Cornelis J. van Rijsbergen, editors, Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, pages 13--22, Dublin, IE, 1994. Springer Verlag, Heidelberg, DE.
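The Naive Bayes decision rule quoted in the context above picks the class c_j that maximizes Pr(c_j) times the product, over terms t in document d, of Pr(t | c_j) raised to the term count N(t, d). A minimal sketch of that rule follows, computed in log space to avoid underflow. It assumes the priors and per-class term probabilities have already been estimated and smoothed so that every probability is positive; the function name `nb_predict` and the dictionary layout are hypothetical.

```python
import math
from collections import Counter


def nb_predict(doc_tokens, priors, term_probs):
    """Return the class maximizing log Pr(c) + sum_t N(t, d) * log Pr(t | c).

    priors:     dict mapping class -> Pr(c)
    term_probs: dict mapping class -> dict mapping term -> Pr(t | c)
    Assumption: every token in the document has a positive probability
    under every class (for example after smoothing).
    """
    counts = Counter(doc_tokens)  # N(t, d) for each term t in the document
    best_class, best_score = None, -math.inf
    for cls, prior in priors.items():
        score = math.log(prior)
        for term, n in counts.items():
            score += n * math.log(term_probs[cls][term])
        if score > best_score:
            best_class, best_score = cls, score
    return best_class
```

Taking logarithms does not change which class wins, because the logarithm is monotonic; it only turns the product into a sum.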
....d, and taking the most probable class as the prediction: $\arg\max_{c_j} \Pr(c_j) \prod_{t \in d} \Pr(t \mid c_j)^{N(t,d)}$ (2). 2.3.2. K Nearest Neighbor (kNN). kNN, an instance-based classification method, has been an effective approach to a broad range of pattern recognition and text classification problems [8, 26, 27, 29]. In contrast to eager learning algorithms (including Naive Bayes) which have an explicit training phase before seeing any test document, kNN uses the training documents local to each test document to make its classification decision on that document. Our kNN uses the conventional vector space ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.
....the recall and precision values for that category when these two values are radically different. 4. 2 Experimental Settings For the experiments in this paper, I used our standard kNN classifier for which detailed descriptions and the parameter setting process were reported in previous papers[14, 16, 18, 17]; stop word removal, stemming and statistical feature selection were applied to documents in a preprocessing step, using the methods described in those papers as well. All the parameters for the thresholding strategies (RCut, PCut, SCut and RTCut) were tuned using hold out validation sets ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.
....distribution over labelled documents. Topic spotting for newswire stories, for example, is one of the most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15] and Support Vector Machines [12]. While the rich literature provides valuable information about individual ....
....2 and our own version of kNN. 3.2 kNN. kNN stands for k-nearest neighbor classification, a well-known statistical approach which has been intensively studied in pattern recognition for over four decades [8]. kNN has been applied to text categorization since the early stages of the research [17, 29, 11]. It is one of the top performing methods on the benchmark Reuters corpus (the 21450 version, Apte set); the other top performing methods include LLSF by Yang, decision trees with boosting by Apte et al. and neural networks by Wiener et al. [2, 12, 14, 31, 28]. The kNN algorithm is quite ....
Y. Yang. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 13--22, 1994.
No context found.
Yang, Y. (1994), `Expert network: Effective and efficient learning from human decisions in text categorization and retrieval', Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin.
No context found.
Y. Yang, "Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval", Proceedings of 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 13-22, 1994.
No context found.
Yang Y: Expert Network: effective and efficient learning from human decisions in text categorization and retrieval. SIGIR '94. Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Berlin, Germany. 1994:13-22.