Results 1 - 10 of 265
Identifying relevant data for a biological database: Handcrafted rules versus machine learning
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
"... With well over one thousand specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both ..."
Cited by 2 (0 self)
An Algorithm that Learns What's in a Name
, 1999
"... In this paper, we present IdentiFinder^TM, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the model in English (based on data from the Sixth and Seventh Message Understanding Conferences [MUC-6, MUC-7] and broadcast news) ..."
Cited by 372 (7 self)
better than reported by any other learning algorithm. IdentiFinder's performance is competitive with approaches based on handcrafted rules on mixed case text and superior on text where case information is not available. We also present a controlled experiment showing the effect of training set size
Text Chunking by Combining Hand-Crafted Rules and Memory-Based
, 2003
"... This paper proposes a hybrid of handcrafted rules and a machine learning method for chunking Korean. In the partially free word-order languages such as Korean and Japanese, a small number of rules dominate the performance due to their well-developed postpositions and endings. Thus, the propos ..."
Cited by 13 (2 self)
Hand-crafted versus Machine-learned Inflectional Rules: The Euroling-SiteSeeker Stemmer and CST's Lemmatiser
"... The Euroling stemmer is developed for a commercial web site and intranet search engine called SiteSeeker. SiteSeeker is basically used in the Swedish domain but to some extent also for the English domain. CST’s lemmatiser comes from the Center for Language Technology, University of Copenhagen and was originally developed as a research prototype to create lemmatisation rules from training data. In this paper we compare the performance of the stemmer that uses handcrafted rules for Swedish, Danish and Norwegian as well as one stemmer for Greek with CST’s lemmatiser that uses training data to extract ..."
Cited by 4 (2 self)
Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation
- Proceedings of the ACL-SIGDAT Conference on Empirical Methods in Natural Language Processing
, 1996
"... This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative lan- ..."
Cited by 21 (4 self)
Automatically Learned vs. Hand-crafted Text Analysis Rules
, 1997
"... As vast quantities of on-line text become available, there is an increasing need for systems that automatically analyze the conceptual content of natural language text. Systems that operate on narrowly defined domains show promise, but require a different set of domain-specific rules for each application. This paper describes CRYSTAL, a system that learns text analysis rules automatically from examples. Rules induced by CRYSTAL achieve performance approaching that of hand-crafted rules. CRYSTAL has a particularly efficient learning algorithm that is not improved by more extensive search ..."
Cited by 13 (0 self)
Adaptive Fraud Detection
- DATA MINING AND KNOWLEDGE DISCOVERY
, 1997
"... One method for detecting fraud is to check for suspicious changes in user behavior. This paper describes the automatic design of user profiling methods for the purpose of fraud detection, using a series of data mining techniques. Specifically, we use a rule-learning program to uncover indicators of ..."
Cited by 221 (19 self)
A Maximum Entropy Approach to Identifying Sentence Boundaries
- In Proceedings of the Fifth Conference on Applied Natural Language Processing
, 1997
"... We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lex ..."
Cited by 209 (3 self)
Using hand-crafted rules and machine learning to infer SciXML document structure
"... SciXML is designed to represent the standard hierarchical structure of scientific articles and represents a candidate common document representation framework for text-mining. Such a framework can greatly facilitate interoperability of text-mining tools. However, no publisher actually generates SciXML. We describe a new framework for inferring SciXML from a presentational level of description, such as PDF, using general purpose components such as Optical Character Recognition and expert hand-coded rules and then using supervised machine learning to provide the per-journal adaptation required ..."
Cited by 3 (1 self)
ÚFAL: Using Hand-crafted Rules in Aspect Based Sentiment Analysis on Parsed Data
"... This paper describes our submission to SemEval 2014 Task 4 (aspect based sentiment analysis). The current work is based on the assumption that it could be advantageous to connect the subtasks into one workflow, not necessarily following their given order. We took part in all four sub-tasks (aspect term extraction, aspect term ..."