by Hui Wang and Nguyen Hung Son
http://alfa.mimuw.edu.pl/logic/prace/1999/E32/tclm2.ps

Abstract:

A novel approach to supervised learning, called the Lattice Machine, was proposed in [5]. The Lattice Machine assumes that data are structured as relations. In this paper we investigate its application to text classification, where the data are unstructured text. We represent a set of textual documents as a collection of Boolean feature vectors, where each vector corresponds to one document and each entry of a vector indicates whether a particular term appears in that document. This is a common representation of textual documents. We show that, under this representation, the Lattice Machine's operations reduce to set-theoretic operations: the lattice sum is set intersection and the ordering relation is set inclusion. Experiments show that the Lattice Machine, in this configuration, is competitive with state-of-the-art learning algorithms for text classification.
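The set-theoretic reading of the representation can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the Lattice Machine learning procedure; it only shows a Boolean document representation, a sum computed as set intersection, and an ordering test computed as set inclusion. The function names (`to_term_set`, `to_boolean_vector`, `lattice_sum`, `covers`), the whitespace tokenizer, and the direction chosen for the inclusion test are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence


def to_term_set(text: str) -> FrozenSet[str]:
    """Represent a document by the set of distinct terms it contains.

    Assumption: terms are lowercase, whitespace-separated tokens.
    """
    return frozenset(text.lower().split())


def to_boolean_vector(terms: FrozenSet[str], vocabulary: Sequence[str]) -> List[bool]:
    """Equivalent Boolean feature vector over a fixed vocabulary.

    Entry i is True exactly when vocabulary[i] appears in the document.
    """
    return [term in terms for term in vocabulary]


def lattice_sum(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """Sum of two elements, computed as set intersection (per the abstract)."""
    return a & b


def covers(general: FrozenSet[str], specific: FrozenSet[str]) -> bool:
    """Ordering test, computed as set inclusion.

    Assumption: 'general' covers 'specific' when every term of 'general'
    also occurs in 'specific'. The abstract does not state the direction.
    """
    return general <= specific


# Two hypothetical documents.
doc1 = to_term_set("cheap flights to Paris")
doc2 = to_term_set("cheap hotels in Paris")

# The sum keeps only the terms shared by both documents.
shared = lattice_sum(doc1, doc2)

# Under the assumed direction, the sum covers both of its arguments.
print(sorted(shared), covers(shared, doc1), covers(shared, doc2))
```

Using `frozenset` rather than an explicit Boolean list keeps the operations independent of vocabulary size; `to_boolean_vector` is included only to show that the two representations carry the same information.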

Citations

3214 C4.5: Programs for Machine Learning – Quinlan - 1993
981 An algorithm for suffix stripping – Porter - 1997
620 Fast effective rule induction – Cohen - 1995
58 Corpus-based stemming using cooccurrence of word variants – Xu, Croft - 1998
42 Joins that generalize: text classification using Whirl – Cohen, Hirsch - 1998
9 Data Reduction Based on Hyper Relations – Wang, Düntsch, et al. - 1998