| Craven, M., & Kumlien, J. (1999). Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (pp. 77--86). Heidelberg, Germany: AAAI Press. |
....(NIST) [42] is focused on identifying various social, action role, part-of, and locational relations between named entities. Several projects have focused on extracting relations from biomedical text, such as identifying gene-disease relations, subcellular localizations, or protein interactions [12, 14, 15, 26, 43]. This section discusses our work on identifying human protein interactions, assuming that the proteins themselves have already been tagged, and shows that machine learning systems outperform human-written extraction rules. 4.1 IE Methods 4.1.1 Rapier and Boosted Wrapper Induction In order to ....
....data, better learning methods, and better use of external knowledge. Larger training sets are always beneficial to learning systems; however, manually tagging data is very time-consuming. One alternative approach is to use existing knowledge to automatically produce weakly labeled training data [12]. Another approach is to use active learning to select only the best training examples for human labeling [51]. A third approach is to use a mixture of labeled and unlabeled data during training [52]. Improved learning algorithms for information extraction continue to be developed. ....
M. Craven, J. Kumlien, Constructing biological knowledge bases by extracting information from text sources, in: Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, Heidelberg, Germany, 1999, pp. 77-86.
....of free text is a challenging research task [11, 12] which has an immediate and pressing application in improving access to the large volumes of knowledge published online. The relevance of ILP to extracting a machine-processable semantic representation from free text, e.g. MEDLINE abstracts [3] and database queries [11], and from semi-structured webpages [4, 7] has been demonstrated. Applications range from more intelligent information retrieval to the construction of knowledge bases and knowledge discovery [3]. To this list we add the task of marking up web documents with ground ....
....semantic representation from free text, e.g. MEDLINE abstracts [3] and database queries [11], and from semi-structured webpages [4, 7] has been demonstrated. Applications range from more intelligent information retrieval to the construction of knowledge bases and knowledge discovery [3]. To this list we add the task of marking up web documents with ground relation instances (RDF triples) that is needed for the Semantic Web and the intelligent tools it promises. This paper explores the problem of learning information extraction rules that accurately derive ground facts ....
[Article contains additional citation context not shown here]
Craven, M. and Kumlien, J. Constructing biological knowledge bases by extracting information from text sources. Proc. 7th International Conference on Intelligent Systems for Molecular Biology 1999.
....do not allow for divergence of opinion; and do not readily permit updating or reinterpretation of previously entered information. Information extraction (IE) methods can be used to at least partially overcome these limitations by automatically extracting information from biomedical text [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Most previous biomedical IE systems have been applied to abstracts. Abstracts are a natural source of information, as they are readily available and dense in information. A second information-dense part of scientific publications consists of the figures and their accompanying captions. In most genres of ....
Craven, M. and Kumlien, J.: Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99). AAAI Press, 1999, pp. 77--86.
....but is performed automatically by a computer. Like reading, most literature mining projects target a specific goal. In bioinformatics, examples are: finding protein-protein interactions [a.o. 2, 20, 31]; finding protein-gene interactions [26]; finding the subcellular localization of proteins [5, 6]; functional annotation of proteins [1, 23]; pathway discovery [10, 18]; vocabulary construction [19, 24]; assisting BLAST searches with evidence found in the literature [3]; and discovering gene functions and relations [27]. With this wide variety of goals, it's not surprising that many ....
Craven M, Kumlien J: Constructing biological knowledge bases by extracting information from text sources. Proc Int Conf Intell Syst Mol Biol 1999:77-86.
....themselves and class relations are forced into disjoint relations as far as possible. The markup of text, while conforming to SGML, makes no use of an explicit ontology and relatively little use of metadata. Recent IE projects have looked beyond news to the molecular biology domain, e.g. [4], [5]. Some projects have implicitly incorporated simple taxonomies (is-a hierarchies) into the annotation guidelines for domain experts. To the best of our knowledge, these projects still largely ignore explicit properties of classes and class relations that could be contained in the ontology and their ....
M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), Heidelberg, Germany, August 6-10, 1999.
....themselves and class relations are forced into disjoint relations as far as possible. The markup of text, while conforming to SGML, makes no use of an explicit ontology and relatively little use of metadata. Recent IE projects have looked beyond news to the molecular biology domain, e.g. [4], [5]. Some projects have implicitly incorporated simple taxonomies (is-a hierarchies) into the annotation guidelines for domain experts. To the best of our knowledge, these projects still largely ignore explicit properties of classes and class relations that could be contained in the ontology and their ....
M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), Heidelberg, Germany, August 6-10, 1999.
....and medicine and contains more than nine million articles. With the rapid growth in the number of published papers in the field of molecular biology, there has been growing interest in the application of information extraction (Sekimizu et al., 1998; Collier et al., 1999; Thomas et al., 1999; Craven and Kumlien, 1999) to help solve some of the problems associated with information overload. In the remainder of this paper we will first outline the background to the task and then describe the basics of HMMs and the formal model we are using. The following sections give an outline of a new tagged ....
M. Craven and J. Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), Heidelberg, Germany, August 6-10.
....the user does not have to explicitly provide the extraction slots and their POS tags separately from advice since they can be extracted from the advice rules. For example, in one of our case studies, we want to extract names of proteins and their subcellular locations from the yeast database of Craven and Kumlien (1999). One of our advice rules for this task is: When the phrase ProteinName Nphrase . Vphrase LocationName Nphrase appears in the document then score it very high. The variables ProteinName and LocationName represent the protein names and their subcellular structures. The Nphrase trailing ....
....discuss an experiment where we train only one network to extract fillers for two slots simultaneously. 3.2 Yeast Protein Localization Domain For our second experiment, the task is to extract protein names and their subcellular localization from the yeast protein localization database produced by Craven and Kumlien (1999). They call their extraction template the subcellular localization relation and created their yeast database by first collecting 1213 instances of the subcellular localization relation from the Yeast Protein Database (YPD) Web site. Then they collected abstracts from 924 articles in the PubMed ....
[Article contains additional citation context not shown here]
Craven, M., & Kumlien, J. (1999). Constructing biological knowledge-bases by extracting information from text sources. Proc. of ISMB99 (pp. 77--86). Heidelberg.
....relevant phrases and useful facts in text. It is intended to assist in finding documents about a given gene, or about the relationships between specific genes. Leek (1997) suggests a way of using hidden Markov models (HMMs) for extracting sentences discussing gene positions on chromosomes from text. Craven and Kumlien (1999) introduce a method for transforming flat text documents into databases of facts about relationships between genes/proteins, performing a task similar to the one Leek addresses, without the need to obtain an HMM for discovering these relationships. Rindflesch et al. (2000) present a method based ....
Craven, M., and Kumlien, J. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the AAAI Conference on Intelligent Systems in Molecular Biology, 77--86.
....shrinkage to get around the problem of not having sufficient labeled examples. Seymore et al. (1999) also use HMMs to extract information from online text. They get around the problem of not having sufficient training data by using data that is labeled for another purpose in their system. Craven and Kumlien (1999) use a sentence classifier to extract instances of a binary relation from text databases in molecular biology. To reduce the need for labeled training examples, they use weakly labeled training data. They use a text classifier to label sentences in a document. If a sentence is classified as ....
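As a rough illustration of weak labeling, the Python sketch below marks a sentence as a positive training example when it mentions both members of a relation instance already stored in a database (in Craven and Kumlien's setting, tuples of the subcellular localization relation). The function names, the whole-word string matching, and the protein and location names in the example are hypothetical assumptions for illustration, not the authors' implementation.

```python
import re
from typing import Iterable


def mentions(text: str, term: str) -> bool:
    """Case-insensitive whole-word match, so "ProtA" does not match "ProtAB"."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def weakly_label(sentences: Iterable[str],
                 known_pairs: set[tuple[str, str]]) -> list[tuple[str, bool]]:
    """Label a sentence positive if it mentions both members of any known (x, y) pair.

    Assumption: this simple co-occurrence rule stands in for whatever matching
    procedure the original work used; labels produced this way are noisy ("weak").
    """
    labeled = []
    for sentence in sentences:
        positive = any(mentions(sentence, x) and mentions(sentence, y)
                       for x, y in known_pairs)
        labeled.append((sentence, positive))
    return labeled


if __name__ == "__main__":
    # Hypothetical database tuple and sentences, for illustration only.
    pairs = {("ProtA", "nucleus")}
    sentences = [
        "ProtA accumulates in the nucleus after heat shock.",
        "ProtA is a kinase.",
        "The nucleus was stained with DAPI.",
    ]
    for sentence, label in weakly_label(sentences, pairs):
        print(label, sentence)
```

Because the rule only checks co-occurrence, a sentence that names both a protein and a location without asserting the relation is still labeled positive; that noise is what makes such labels weak rather than hand-verified.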
Craven, M. and J. Kumlien: 1999, `Constructing biological knowledge-bases by extracting information from text sources'. In: Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology. Heidelberg, Germany, pp. 77--86.
....increase accuracy by using such additional information as lists of common names to give a priori probabilities to each of the candidate extractions. RELATED WORK Only a few researchers have explored the connection between information retrieval and information extraction. Craven and Kumlien [4] use a sentence classifier to extract instances of a binary relation from text databases in molecular biology. The naive Bayes algorithm with a bag-of-words representation [5] is used to classify sentences. A sentence is classified as a positive example if it contains at least one instance of the ....
....from text databases in molecular biology. The naive Bayes algorithm with a bag-of-words representation [5] is used to classify sentences. A sentence is classified as a positive example if it contains at least one instance of the target relation. WAWA's information extraction system is similar to [4] in that we use a text classifier to extract information. However, in [4] the text classifier classifies small chunks of text (such as sentences) and extracts only the words that are in the given semantic lexicons. In WAWA, the text classifier is able to classify a text document as a whole and ....
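The sentence classifier described above can be sketched as a naive Bayes model over a bag-of-words representation. The multinomial event model, add-one (Laplace) smoothing, the tokenizer, and the toy training sentences are all assumptions made for this illustration; the cited description only states that naive Bayes with a bag-of-words representation was used.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokens; word order is discarded by the model."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences: list[str], labels: list[bool]) -> "NaiveBayesSentenceClassifier":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # sentences per class, used for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, sentence: str, c: bool) -> float:
        """log P(c) plus the sum of log P(word | c) over in-vocabulary tokens."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denominator = self.total_words[c] + len(self.vocab)
        for word in tokenize(sentence):
            if word in self.vocab:  # words unseen in training are ignored
                score += math.log((self.word_counts[c][word] + 1) / denominator)
        return score

    def predict(self, sentence: str) -> bool:
        return max(self.classes, key=lambda c: self.log_score(sentence, c))


if __name__ == "__main__":
    # Hypothetical toy data: True = sentence states a localization fact.
    train = [
        "ProtA localizes to the nucleus",
        "ProtB is found in the mitochondria",
        "ProtC binds DNA",
        "the assay was repeated twice",
    ]
    labels = [True, True, False, False]
    clf = NaiveBayesSentenceClassifier().fit(train, labels)
    print(clf.predict("ProtD localizes to the mitochondria"))  # True
    print(clf.predict("The assay binds DNA"))  # False
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.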
[Article contains additional citation context not shown here]
Craven, M., & Kumlien, J. Constructing biological knowledge bases by extracting information from text sources, Proc. 7th Intl. Conf. on Intelligent Systems for Mol. Biol., 1999.
....natural language translation (at least, very rough-cut translation) is now a reality: witness, for example, the widespread use of AltaVista's Babel Fish. Machine learning techniques are aiding in the construction of information extraction engines that fill database entries from document abstracts (e.g. [3]) and from web pages (e.g. WhizBang Labs, http://www.whizbanglabs.com). NLP became a major application focus for ILP in particular with the ESPRIT project ILP2. Indeed, as early as 1998 the majority of the application papers at the ILP conference were on NLP tasks. A third popular and ....
....certain assumptions again simple, standard methods exist. For example, under one set of assumptions linear regression can be used, and under another naive Bayes can be used. In fact, the work by Srinivasan and Camacho [35] on predicting levels of mutagenicity and the work by Craven and colleagues [4, 3] on information extraction can be seen as special cases of this proposed approach, employing linear regression and naive Bayes, respectively. While the approach just outlined appears promising, of course it is not the only possible approach and may not turn out to be the best. More generally, ILP ....
[Article contains additional citation context not shown here]
M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, Heidelberg, Germany, 1999. AAAI Press.
....phrases and useful facts in text. It is intended to assist in finding documents about a given gene, or about the relationships between specific genes. Leek (1997) suggests a way of using hidden Markov models (HMMs) for extracting sentences discussing gene positions on chromosomes from text. Craven and Kumlien (1999) introduce a method for transforming flat text documents into databases of facts about relationships between genes/proteins, performing a task similar to the one Leek addresses, without the need to obtain an HMM for discovering these relationships. Rindflesch et al. (2000) present a method based ....
Craven, M., and Kumlien, J. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the AAAI Conference on Intelligent Systems in Molecular Biology, 77--86.
....been directed at disambiguating extant natural language [4]. There have been many computational attempts to extract, and to some extent understand, phrasal semantics as part of projects to extract information from corpora of texts. This area has recently blossomed for biologically related texts [5, 6, 8, 14, 21, 31, 44, 47, 59, 61, 67, 72]. The semantic constructs are too coarsely grained for computational exchange and require further resolution, presumably because they are simply taken as they are in the natural language corpora. Conversely, the semantics of terms and constructs is explicitly defined and composed in the case of ....
Craven, M. and J. Kumlien, 1999. Constructing biological knowledge bases by extracting information from text sources. In Lengauer et al. [45], pages 77--86.
....[Riloff 1996b; Riloff and Lehnert 1994] are learned from a set of labeled training examples and are used to extract words and phrases from sentences. 7.3.6 Using Text Categorizers to Extract Patterns Only a few researchers have investigated using text classifiers to solve the extraction problem. Craven and Kumlien (1999) use a sentence classifier to extract instances of a binary relation from text databases in molecular biology (e.g., MEDLINE). Specifically, if r(X, Y) is the target binary relation, then the goal is to find instances x and y where x and y are in the semantic lexicons of X and Y, respectively. For ....
....if r(X, Y) is the target binary relation, then the goal is to find instances x and y where x and y are in the semantic lexicons of X and Y, respectively. For example, the binary relation cell_localization(protein, cell_type) describes the cell types in which a particular protein is located. Craven and Kumlien (1999) use the naive Bayes algorithm with a bag-of-words representation [Mitchell 1997] to classify sentences. A sentence is classified as a positive example if it contains at least one instance of the target relation. Wawa's information extraction system is similar to Craven and Kumlien's system in ....
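Given such a sentence classifier, extracting instances of r(X, Y) amounts to pairing lexicon terms that co-occur in sentences the classifier accepts. The sketch below assumes each semantic lexicon is a plain set of strings, matches terms as whole words, and takes the classifier as an arbitrary function; the keyword-based stand-in classifier and the lexicon entries in the example are hypothetical and only illustrate the data flow.

```python
import re
from itertools import product
from typing import Callable, Iterable


def find_terms(sentence: str, lexicon: set[str]) -> list[str]:
    """Lexicon entries occurring as whole words in the sentence, case-insensitively."""
    lowered = sentence.lower()
    return sorted(term for term in lexicon
                  if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered))


def extract_relation(sentences: Iterable[str],
                     lexicon_x: set[str],
                     lexicon_y: set[str],
                     is_positive: Callable[[str], bool]) -> list[tuple[str, str]]:
    """Return candidate (x, y) instances of r(X, Y) from classifier-accepted sentences."""
    instances = []
    for sentence in sentences:
        if not is_positive(sentence):
            continue  # only sentences judged to express the relation are mined
        xs = find_terms(sentence, lexicon_x)
        ys = find_terms(sentence, lexicon_y)
        instances.extend(product(xs, ys))  # every x-y pairing in the sentence
    return instances


def looks_positive(sentence: str) -> bool:
    """Hypothetical stand-in for a trained classifier such as naive Bayes."""
    return "locali" in sentence.lower()


if __name__ == "__main__":
    proteins = {"ProtA", "ProtB"}          # hypothetical lexicon for X (protein)
    cell_types = {"neuron", "hepatocyte"}  # hypothetical lexicon for Y (cell type)
    sentences = [
        "ProtA localizes to the neuron membrane.",
        "ProtB was purified from a hepatocyte extract.",
    ]
    print(extract_relation(sentences, proteins, cell_types, looks_positive))
    # [('ProtA', 'neuron')]
```

Restricting candidates to lexicon entries is what keeps the extractor from returning arbitrary words; the second sentence contains lexicon terms but is skipped because the classifier rejects it.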
[Article contains additional citation context not shown here]
M. Craven and J. Kumlien (1999). Constructing biological knowledge-bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, Heidelberg, Germany, 77-86.
No context found.
Craven, M., & Kumlien, J. (1999). Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (pp. 77--86). Heidelberg, Germany: AAAI Press.
No context found.
Mark Craven and Johan Kumlien. Constructing biological knowledge bases by extracting information from text sources. ISMB 1999.
No context found.
Craven M, Kumlien J. Constructing biological knowledge bases by extracting information from text sources. Proc Int Conf Intell Syst Mol Biol 1999:77--86.
No context found.
Craven, M. and J. Kumlien, Constructing Biological Knowledge Bases by Extracting Information from Text Sources. 1999: p. 77-86.
No context found.
Craven, M. & Kumlien, J. (1999), Constructing biological knowledge-bases by extracting information from text sources, in `Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology', Germany, pp. 77--86.
No context found.
Craven, M. & Kumlien, J. (1999), Constructing biological knowledge-bases by extracting information from text sources, in `Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology', Germany, pp. 77--86.
No context found.
CRAVEN, M. -- KUMLIEN, J. 1999. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proceedings of The 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99).
No context found.
Craven, M., and Kumlien, J. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of ISMB.
No context found.
CRAVEN, M. -- KUMLIEN, J. 1999. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proceedings of The 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99).
No context found.
M. Craven and J. Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In Proceedings of The 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), 1999.
No context found.
M. Craven and J. Kumlien. 1999. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proceedings of The 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99).
No context found.
M. Craven and J. Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proc. of the 7th International Conference on Intelligent Systems in Molecular Biology (ISMB-99), 1999.
No context found.
M. Craven and J. Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proc. 7th International Conference on Intelligent Systems in Molecular Biology (ISMB-99), 1999.
No context found.
Craven M, Kumlien J. Constructing biological knowledge bases by extracting information from text sources. Proc Int Conf Intell Syst Mol Biol. 1999:77-86.
No context found.
Craven M, Kumlien J. Constructing biological knowledge bases by extracting information from text sources. Proc Int Conf Intell Syst Mol Biol 1999:77--86.
No context found.
M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), Heidelberg, Germany, August 6-10, 1999.
No context found.
Craven, M. and Kumlien, J.: Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99). AAAI Press, 1999, pp. 77--86.
No context found.
M. Craven, J. Kumlien, Constructing biological knowledge bases by extracting information from text sources, in: Proc. of the 7th Intl. Conf. on Intelligent Systems for Molecular Biology, Heidelberg, Germany, 1999, pp. 77-86.
No context found.
M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), pages 77--86. AAAI Press, 1999.
No context found.
Craven M., and Kumlien J. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB-99).
No context found.
M. Craven and J. Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Proc. of the 7th International Conference on Intelligent Systems in Molecular Biology (ISMB-99), 1999.
No context found.
M. Craven and J. Kumlien, "Constructing Biological Knowledge Bases by Extracting Information from Text Sources," Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, pp. 77-86, Heidelberg, Germany. AAAI Press.
No context found.
Craven, Mark and Kumlien, Johan. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, 1999. Due to space limitations, we refer the reader to www.medstract.org for an analysis and discussion of the false-positive results from the present experiment.
No context found.
M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proc. of the 7th Intl. Conference on Intelligent Systems for Molecular Biology, pages 77-86. Heidelberg, Germany, August 6-10, 1999.
No context found.
Craven, Mark and Kumlien, Johan. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, 1999. Due to space limitations, we refer the reader to www.medstract.org for an analysis and discussion of the false-positive results from the present experiment.