Machine Learning for Information Extraction from Online Documents (1996)
BibTeX
@TECHREPORT{Freitag96machinelearning,
author = {Dayne Freitag},
title = {Machine Learning for Information Extraction from Online Documents},
institution = {},
year = {1996}
}
Abstract
Introduction. The experiment described here had two goals: to test the feasibility of a learning approach to information extraction in a real-world domain, and to look for evidence that multiple learners can achieve better performance than a single learner. Because the documents used in this experiment are taken unmodified from a real online environment designed for human-to-human communication, the task is challenging. Its difficulty varies considerably from field to field, but in all cases, to conclude that the approach is feasible, I require that each learner perform substantially better than a random guesser. Of course, in practice the required performance level is defined by the intended application, so my argument for feasibility is informal. Some applications may be able to exploit a well-behaved precision-recall curve, so I look for this from the learners tested here. We cannot







