
## InfoSift: adapting graph mining techniques for document classification (2004)

Citations: 3 (0 self)

### Citations

13212 | Statistical Learning Theory - Vapnik - 1998 |

2301 | Text Categorization with Support Vector Machines: Learning with Many Relevant Features - Joachims - 1997 |
Citation Context: ...eserved. mining techniques for classification (Aery & Chakravarthy 2004). Related Work For text classification, a number of approaches have been proposed; these include Support Vector Machines (SVM) (Joachims 1998), Decision trees (Apte, Damerau, & Weiss 1998; Joachims 1998), k-Nearest-Neighbor (k-NN) classification (Lam & Ho 1998; Masand, Linoff, & Waltz 1992; Yang 1994), Linear Least Squares Fit technique (Y... |

1273 | Fast effective rule induction - Cohen - 1995 |

1023 | A Comparison of Event Models for Naive Bayes Text Classification - McCallum, Nigam - 1998 |
Citation Context: ...n (Apte, F.Damerau, & Weiss 1994; Cohen 1995; Moulinier, Raskinis, & Ganascia 1996), neural networks (Weiner, Pederson, & Weigend 1995; Ng, Goh, & Low 1997) and Bayesian probabilistic classification (McCallum & Nigam 1992; Baker & McCallum 1998; Koller & Sahami 1997; Tzeras & Hartman 1993). Support Vector Machines work by constructing a hyperplane that separates positive and negative examples of a class. A decision tr... |
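The context above summarizes the SVM idea: classify by which side of a separating hyperplane an example falls on. As a minimal sketch (not the paper's implementation; the weights and inputs below are made up), the decision rule sign(w·x + b) looks like:

```python
def svm_predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it lies on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical 2-D hyperplane x0 + x1 - 1 = 0
print(svm_predict([1.0, 1.0], -1.0, [2.0, 2.0]))  # prints 1 (positive side)
print(svm_predict([1.0, 1.0], -1.0, [0.0, 0.0]))  # prints -1 (negative side)
```

Training an SVM means choosing w and b to maximize the margin between the two classes; the sketch only shows the resulting decision function.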

890 | The CN2 induction algorithm - Clark, Niblett - 1989 |

852 | A re-examination of text categorization methods - Yang, Liu - 1999 |

646 | gSpan: graph-based substructure pattern mining - Yan, Han - 2002 |
Citation Context: ...amochi & Karypis 2001) maps the apriori algorithm to structural data represented in the form of a labeled graph and finds frequent itemsets that correspond to recurring subgraphs in the input. gSpan (Yan & Han 2002) uses a canonical representation by mapping each graph to a code and uses depth-first search to mine frequent subgraphs. Briefly, our work requires a means of substructure discovery directed by specif... |
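The key idea in the context above is mapping each graph to a canonical code so that isomorphic graphs compare equal. gSpan itself uses minimum DFS codes; the toy sketch below (normalized, sorted labeled edges, adequate only for simple labeled graphs) just illustrates the "graph to comparable code" idea:

```python
def canonical_code(edges):
    """Toy canonical form: normalize each edge's endpoint order, then sort
    the edge list. Not gSpan's minimum DFS code -- an illustration only."""
    return tuple(sorted(tuple(sorted(edge)) for edge in edges))

g1 = [("a", "b"), ("b", "c")]
g2 = [("c", "b"), ("b", "a")]  # same labeled edges, listed differently
print(canonical_code(g1) == canonical_code(g2))  # prints True
```

Once every graph has a canonical code, frequent-subgraph candidates can be deduplicated by simple code comparison instead of repeated isomorphism tests.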

544 | A Bayesian approach to filtering junk E-mail, Learning for Text Categorization - Sahami, Dumais, et al. - 1998 |

473 | Knowledge Discovery in Databases: An Overview - Frawley, Piatetsky-Shapiro, et al. - 1991 |

451 | Email overload: Exploring personal information management of email - Whittaker, Sidner - 1996 |

406 | Frequent subgraph discovery - Kuramochi, Karypis - 2001 |
Citation Context: ...am search to discover interesting and repetitive subgraphs, and compresses the original graph with instances of the discovered substructures. The frequent subgraphs approach by Kuramochi and Karypis (Kuramochi & Karypis 2001) maps the apriori algorithm to structural data represented in the form of a labeled graph and finds frequent itemsets that correspond to recurring subgraphs in the input. gSpan (Yan & Han 2002) uses ... |

237 | Developments in automatic text retrieval - Salton - 1991 |

198 | Learning rules that classify e-mail. - Cohen - 1996 |

198 | Substructure discovery using minimum description length and background knowledge - Cook, Holder - 1994 |

189 | Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. - Yang - 1994 |

174 | A graph distance metric based on the maximal common subgraph - Bunke, Shearer - 1998 |
Citation Context: ...an extension of the k-NN algorithm is used to handle graph based data. The graph theoretical distance measure for computing the distance translates to the maximal common subgraph distance proposed in (Bunke & Shearer 2001). A graph-encoded linguistic scheme has been applied for text classification in (Gee & Cook 2005). Contribution The main contribution of this paper is in the adaptation of a novel, but powerful appro... |
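The maximal common subgraph distance mentioned above has a standard closed form, d(g1, g2) = 1 - |mcs(g1, g2)| / max(|g1|, |g2|). Computing the MCS itself is NP-hard; the sketch below assumes its size is already known and only evaluates the metric:

```python
def mcs_distance(mcs_size, g1_size, g2_size):
    """Bunke-Shearer distance: 1 - |mcs| / max(|g1|, |g2|).
    Sizes are vertex counts; identical graphs get distance 0."""
    return 1.0 - mcs_size / max(g1_size, g2_size)

print(mcs_distance(4, 5, 8))  # prints 0.5
print(mcs_distance(5, 5, 5))  # prints 0.0
```

The value ranges from 0 (isomorphic graphs) to 1 (no common subgraph), which is what lets it slot into a k-NN classifier as a drop-in distance.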

144 | Feature Selection, Perceptron Learning, and a Usability Case Study for Text Categorization. - Ng, Goh, et al. - 1997 |

144 | MailCat: An intelligent assistant for organizing e-mail. - Segal, Kephart - 1999 |

138 | An Example-based Mapping Method for Text Categorization and Retrieval - Yang, Chute - 1994 |
Citation Context: ...8), Decision trees (Apte, Damerau, & Weiss 1998; Joachims 1998), k-Nearest-Neighbor (k-NN) classification (Lam & Ho 1998; Masand, Linoff, & Waltz 1992; Yang 1994), Linear Least Squares Fit technique (Yang & Chute 1994), rule induction (Apte, F.Damerau, & Weiss 1994; Cohen 1995; Moulinier, Raskinis, & Ganascia 1996), neural networks (Weiner, Pederson, & Weigend 1995; Ng, Goh, & Low 1997) and Bayesian probabilistic ... |

107 | Classifying news stories using memory based reasoning. - Masand, Linoff, et al. - 1992 |

106 | Inexact graph matching for structural pattern recognition - Bunke, Allermann - 1983 |
Citation Context: ...both for representation and identification. In order to detect substructures that match inexactly or vary slightly in their edge or vertex descriptions, the algorithm developed by Bunke and Allerman (Bunke & Allerman 1983) is used where each distortion (addition, deletion or substitution of vertices or edges) is assigned a cost. The two graphs are said to be isomorphic as long as the cost difference falls within the u... |
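The cost-based inexact matching described above can be sketched as follows; the unit costs and the threshold are illustrative assumptions, not values from the paper:

```python
def distortion_cost(additions, deletions, substitutions,
                    c_add=1.0, c_del=1.0, c_sub=1.0):
    """Total cost of the vertex/edge distortions needed to map one graph
    onto another; each kind of distortion carries its own weight."""
    return additions * c_add + deletions * c_del + substitutions * c_sub

def matches_inexactly(additions, deletions, substitutions, threshold):
    """Treat two graphs as matching if the distortion cost stays within
    the user-defined threshold."""
    return distortion_cost(additions, deletions, substitutions) <= threshold

print(matches_inexactly(1, 0, 1, threshold=3.0))  # cost 2.0 -> prints True
print(matches_inexactly(2, 2, 1, threshold=3.0))  # cost 5.0 -> prints False
```

Raising the threshold makes matching more tolerant of variation in vertex and edge labels; a threshold of 0 reduces it to exact isomorphism.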

96 | Concept features in re:Agent, an intelligent email agent. - Boone - 1998 |

93 | Towards language independent automated learning of text categorization models, in: - Apte, Damerau, et al. - 1994 |

76 | Using a Generalized Instance Set for Automatic Text Categorization - Lam, Ho - 1998 |
Citation Context: ...r of approaches have been proposed; these include Support Vector Machines (SVM) (Joachims 1998), Decision trees (Apte, Damerau, & Weiss 1998; Joachims 1998), k-Nearest-Neighbor (k-NN) classification (Lam & Ho 1998; Masand, Linoff, & Waltz 1992; Yang 1994), Linear Least Squares Fit technique (Yang & Chute 1994), rule induction (Apte, F.Damerau, & Weiss 1994; Cohen 1995; Moulinier, Raskinis, & Ganascia 1996), ne... |

67 | Megainduction: A test flight - Catlett - 1991 |

64 | Automatic indexing based on Bayesian inference networks - Tzeras, Hartmann - 1993 |
Citation Context: ... Ganascia 1996), neural networks (Weiner, Pederson, & Weigend 1995; Ng, Goh, & Low 1997) and Bayesian probabilistic classification (McCallum & Nigam 1992; Baker & McCallum 1998; Koller & Sahami 1997; Tzeras & Hartman 1993). Support Vector Machines work by constructing a hyperplane that separates positive and negative examples of a class. A decision tree makes recursive splits based on discriminating attributes to dete... |

62 | Interface Agents that Learn: An Investigation of Learning Issues in a Mail Agent Interface. - Payne, Edwards - 1997 |

60 | Text categorization and relational learning - Cohen - 1995 |
Citation Context: ...Nearest-Neighbor (k-NN) classification (Lam & Ho 1998; Masand, Linoff, & Waltz 1992; Yang 1994), Linear Least Squares Fit technique (Yang & Chute 1994), rule induction (Apte, F.Damerau, & Weiss 1994; Cohen 1995; Moulinier, Raskinis, & Ganascia 1996), neural networks (Weiner, Pederson, & Weigend 1995; Ng, Goh, & Low 1997) and Bayesian probabilistic classification (McCallum & Nigam 1992; Baker & McCallum 1998... |

51 | Text categorization: a symbolic approach - Moulinier, Raskinis - 1996 |

34 | Mining Association Rules Between Sets of Items in Large Databases - Agrawal, Imielinski, et al. - 1993 |

31 | Challenges of the email domain for text classification. - Brutlag, Meek - 2000 |

24 | Automatic induction of rules for e-mail classification - Crawford, Kay, et al. - 2001 |

21 | Stochastic Complexity in Statistical Inquiry - Rissanen - 1989 |
Citation Context: ...ok & Holder 2000) is a graph based mining algorithm. It accepts a forest as input and identifies the best subgraph that minimizes the input graph using the minimum description length (MDL) principle (Rissanen 1989). It outputs subgraphs of different sizes and their occurrence frequency in the input graph. Subdue is capable of identifying both exact and inexact (or isomorphic) substructures in the input graph. ... |
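Subdue's MDL criterion, referenced above, prefers the substructure S that minimizes DL(S) + DL(G|S): the bits to describe S plus the bits to describe the graph after compressing it with S. The sketch below uses a crude bit-count proxy for description length; the encoding and the example sizes are assumptions for illustration, not Subdue's actual scheme:

```python
import math

def description_length(num_vertices, num_edges):
    """Crude DL proxy in bits: each vertex and each edge endpoint costs
    log2(num_vertices + 1) bits. Subdue's real encoding is more detailed."""
    bits_per_id = math.log2(num_vertices + 1)
    return num_vertices * bits_per_id + num_edges * 2 * bits_per_id

def compression_value(graph, substructure, compressed):
    """value = DL(G) / (DL(S) + DL(G|S)); values above 1 mean the
    substructure compresses the graph. Each argument is (vertices, edges)."""
    return description_length(*graph) / (
        description_length(*substructure) + description_length(*compressed))

# Hypothetical sizes: a 100-vertex graph, a small recurring substructure,
# and the graph after replacing each instance with a single vertex.
print(compression_value((100, 200), (4, 4), (40, 60)) > 1.0)  # prints True
```

The substructure with the highest value compresses the input best, which is how MDL turns "interesting pattern" into a measurable quantity.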

19 | Ishmail: Immediate identification of important information - Helfman, Isbell - 1995 |

18 | Text mining with decision trees and decision rules. Proceedings of the Automated Learning and Discovery Conference - Apte, Damerau, et al. - 1998 |

18 | Classification of web documents using a graph model - Schenker, Last, et al. - 2003 |
Citation Context: ... to the task at hand and we have chosen graph data mining for our work as we intend to extract pattern occurrences instead of word occurrences. Graph models have been used to classify web documents (Schenker et al. 2003), but an extension of the k-NN algorithm is used to handle graph based data. The graph theoretical distance measure for computing the distance translates to the maximal common subgraph distance propos... |

13 | Distributed Clustering of Words for Text Categorization - Baker, McCallum - 1998 |
Citation Context: ...eiss 1994; Cohen 1995; Moulinier, Raskinis, & Ganascia 1996), neural networks (Weiner, Pederson, & Weigend 1995; Ng, Goh, & Low 1997) and Bayesian probabilistic classification (McCallum & Nigam 1992; Baker & McCallum 1998; Koller & Sahami 1997; Tzeras & Hartman 1993). Support Vector Machines work by constructing a hyperplane that separates positive and negative examples of a class. A decision tree makes recursive spli... |

11 | A neural network approach to topic spotting - Weiner, Pedersen, et al. - 1995 |

11 | Context-sensitive methods for text categorization - Cohen, Singer - 1996 |

8 | Graph based data mining - Cook, Holder - 2000 |
Citation Context: ...erent structure in the document is preserved, but graph mining approaches are required for analysis. Relevant work in graph mining includes the Subdue substructure discovery system by Cook and Holder (Cook & Holder 2000). Subdue employs beam search to discover interesting and repetitive subgraphs, and compresses the original graph with instances of the discovered substructures. The frequent subgraphs approach by Kur... |

4 | ifile: an application of machine learning to e-mail filtering - Rennie - 2000 |

3 | Is learning bias an issue on the text categorization problem - Moulinier - 1997 |

3 | Hierarchically classifying text using very few words - Koller, Sahami - 1997 |
Citation Context: ...Moulinier, Raskinis, & Ganascia 1996), neural networks (Weiner, Pederson, & Weigend 1995; Ng, Goh, & Low 1997) and Bayesian probabilistic classification (McCallum & Nigam 1992; Baker & McCallum 1998; Koller & Sahami 1997; Tzeras & Hartman 1993). Support Vector Machines work by constructing a hyperplane that separates positive and negative examples of a class. A decision tree makes recursive splits based on discrimina... |

3 | Automatic web page categorization by link and context analysis - Attardi, Gulli, et al. - 1999 |

1 | emailsift: Mining based approaches to email classification - Aery, Chakravarthy - 2004 |

1 | InfoSift: Adapting Graph Mining Techniques for Document Classification - Aery - Dec 2004 |

1 | Text classification using graph-encoded linguistic elements - Gee, Cook - 2005 |
Citation Context: ...easure for computing the distance translates to the maximal common subgraph distance proposed in (Bunke & Shearer 2001). A graph-encoded linguistic scheme has been applied for text classification in (Gee & Cook 2005). Contribution The main contribution of this paper is in the adaptation of a novel, but powerful approach, viz. graph mining for text classification and demonstrating its effectiveness. To the best o... |