Managing Content with Automatic Document Classification
- Journal of Digital Information, 2004
Cited by 15 (0 self)
News articles and web directories represent some of the most popular and commonly accessed content on the web. Information designers normally define categories that model these knowledge domains (i.e. news topics or web categories) and domain experts assign documents to these categories. This paper describes how machine learning and automatic document classification techniques can be used to manage large numbers of news articles or web page descriptions, lightening the load on domain experts. We use two datasets, one with more than 800,000 Reuters news stories and another with over 41,000 web sites, and classify them into predefined categories using a Naive Bayes algorithm. We discuss the different parameters and design decisions that normally arise when building automatic classifiers, including stemming, stopwords, thresholding, amount of data, and approaches for improving performance using the structure in XML documents. The methodology developed would enable web-based applications or workflow systems to manage information more efficiently, i.e. by assigning documents to topics automatically or assisting humans in the process of doing so.
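As a rough illustration of the design decisions named in this abstract, the sketch below implements a small multinomial Naive Bayes classifier with stopword removal and a posterior threshold. It is not the paper's implementation: the stopword list, the add-one smoothing, the toy training data, and all names (`NaiveBayes`, `tokenize`, `threshold`) are assumptions made for the example, and stemming is left out for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_scores(self, text):
        """Unnormalised log posterior for each category."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text, threshold=0.0):
        """Return the top category, or None if its posterior is below threshold."""
        scores = self.log_scores(text)
        peak = max(scores.values())
        # Subtract the peak before exponentiating to avoid underflow.
        weights = {k: math.exp(v - peak) for k, v in scores.items()}
        best = max(weights, key=weights.get)
        posterior = weights[best] / sum(weights.values())
        return best if posterior >= threshold else None


# Toy training set (hypothetical, for illustration only).
docs = [
    "Stocks rally as markets close higher",
    "Central bank raises interest rates",
    "Team wins the championship final",
    "Striker scores twice in cup match",
]
labels = ["business", "business", "sport", "sport"]
model = NaiveBayes().fit(docs, labels)
```

With a threshold, a document whose best category is not confident enough is left unassigned (`None`), which is one way to route uncertain cases to a human rather than assigning them automatically.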
User Scenarios for the design and implementation of iLMS
- In Proceedings of the AIED 2003 Workshop Towards Intelligent Learning Management Systems, 2003
Cited by 5 (0 self)
The aim of this article is to discuss possible user scenarios for “intelligent” Learning Management Systems (iLMS) and challenges for implementing them. We focus on those scenarios in which Machine Learning (ML) can be used to enhance general purpose web-based Learning Management Systems. We will propose a software engineering framework for the design and implementation of an iLMS.
Combining ILP with Semi-supervised Learning for Web Page Categorization
Cited by 1 (0 self)
This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) for Web page classification problems. We apply inductive logic programming (ILP) as the strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner to boost the performance of the weak learner of ICT. We compare the results with supervised Naive Bayes, a well-known algorithm for text classification. The performance of our learning algorithm is also compared with two other semi-supervised learning algorithms, Co-Training and EM. The experimental results show that the ICT algorithm outperforms those algorithms and that the performance of the weak learner can be enhanced by the ILP system.
Classification Methods for Structured Outputs
- 2007
This paper is conceived as a summary of previous works that explored many alternative learning models for classification of documents on structured outputs. We provide a discussion of strong and weak points for each method.
The Australian Stock Exchange Limited (ASX-
This paper compares the performance of several machine learning algorithms for the automatic categorization of corporate announcements in the Australian Stock Exchange (ASX) Signal G data stream. The article also describes some of the applications that the categorization of corporate announcements may enable. We have performed tests on two categorization tasks: market sensitivity, which indicates whether an announcement will have an impact on the market, and report type, which classifies each announcement into one of the report categories defined by the ASX. We have tried Neural Networks, a Naïve Bayes classifier, and Support Vector Machines and achieved good results.
Document Classification into a Structured Organization of Classes: Ph.D. Research Proposal. Keywords: Hierarchical document classification, Supervised-unsupervised classification,
Since the early ’90s a lot of work has been done on document classification tasks. The effectiveness of many studies has dramatically improved thanks to the introduction of Machine Learning methods into the Text Classification community. The main purpose of this work is to approach both unsupervised and supervised document classification problems. The unsupervised step starts when the structured organization of classes is empty and no documents have been classified in advance. The second step is a supervised problem, since the process learns from documents pre-classified in the first step. Many existing classifiers ignore the structure of the classes. The main idea of this proposal is instead to exploit the a priori knowledge encoded in a structured organization of classes, described both by linguistic labels denoting the “meaning” of the nodes and by relationships between them.
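The two-step idea in this abstract can be pictured with a minimal sketch. The proposal does not say which algorithms it uses, so everything here is an assumption made for illustration: the toy hierarchy, the label-word overlap rule for the unsupervised step, and the term-count profiles for the supervised step are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical class hierarchy: node -> (parent, label words describing the node).
HIERARCHY = {
    "science": (None, {"science", "research"}),
    "physics": ("science", {"physics", "quantum", "particle"}),
    "biology": ("science", {"biology", "cell", "gene"}),
}


def tokens(text):
    """Lowercase alphabetic tokens of a document."""
    return re.findall(r"[a-z]+", text.lower())


def node_terms(node):
    """Label words of a node plus those inherited from its ancestors."""
    terms = set()
    while node is not None:
        parent, words = HIERARCHY[node]
        terms |= words
        node = parent
    return terms


def bootstrap_label(doc):
    """Step 1 (unsupervised): pick the node whose label words overlap the document most."""
    toks = set(tokens(doc))
    overlaps = {n: len(toks & node_terms(n)) for n in HIERARCHY}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


def train_profiles(docs):
    """Step 2 (supervised): build per-node term profiles from the step-1 labels."""
    profiles = {}
    for doc in docs:
        node = bootstrap_label(doc)
        if node is not None:
            profiles.setdefault(node, Counter()).update(tokens(doc))
    return profiles


def classify(doc, profiles):
    """Assign a new document to the node whose profile shares the most term mass."""
    toks = tokens(doc)
    return max(profiles, key=lambda n: sum(profiles[n][t] for t in toks))


corpus = [
    "quantum research on particle detectors",
    "gene expression in a single cell",
]
profiles = train_profiles(corpus)
```

The point of the second step is that the learned profiles cover words that never appear in the node labels (here, for instance, "detectors"), so later documents can be placed even when they share no vocabulary with the labels themselves.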
Automatic Classification of Online News Headlines
- 2007
The rise of online news over the past decade has altered how individuals obtain news, and this study sought to determine the types of online news headlines most often selected by news websites as their “Top Stories”. Headlines from four news websites were downloaded using Really Simple Syndication (RSS) feeds. Supervised learning was conducted with the downloaded headlines to develop models which could automatically classify each website’s “Top Story” headlines, whose specific news category was unknown. “Top Story” headlines were also matched to headlines with known news categories from the same period to determine which news categories were most often represented as “Top Stories”. The results show that some news categories’ headlines, particularly those that had unique terms, were classified correctly based on the text contained in the headline. Headlines from World and US/UK news categories were most often represented as Top Story headlines, followed by Business, Politics, and Entertainment.
Automatic Categorization of Questions for a Mathematics Education Service
- 2003
This paper describes a new approach to managing a stream of questions about mathematics by integrating a text categorization framework into a relational database management system. The corpus studied is based on unstructured submissions to an ask-an-expert service in learning mathematics. The classification system has been tested using a Naive Bayes learner built into the framework. The performance results of the classifier are also discussed. The framework was integrated into a PostgreSQL database through the use of procedural trigger functions.