Schuler, G. D., Epstein, J. A., Ohkawa, H. & Kans, J. A. (1996), `Entrez: Molecular biology database and retrieval system', Methods in Enzymology 266, 141--162.
....entire databases of protein sequences. The construction of this TPR domain hunter involves multiple data sources and procedures that are typical of many bioinformatics problems. First, in order to extract previously known TPR domains, the ability to access and manipulate GenPept reports from Entrez [12] located at the National Center for Biotechnology Information (NCBI) in Washington DC is needed. Second, in order to recognize new TPR domains, multiple sequence comparison programs such as BLAST [2], WU BLAST2.0 [1], and HMMER [13] must be accessed. Third, in order to present results properly, ....
Schuler, G.D., Epstein, J.A., Ohkawa, H., and Kans, J.A., Entrez: Molecular biology database and retrieval system, Methods in Enzymology, 266:141--162, 1996.
.... of a GenPept report is the following complex type 1 discussed in [23]: (#uid:num, #title:string, #accession:string, #feature:{(#name:string, #start:num, #end:num, #anno:[(#annoname:string, #descr:string)])}). Kleisli provides several functions to access GenPept reports remotely from Entrez [19]: aa-get-uid-general, which retrieves unique identifiers of GenPept reports given a search string; aa-get-seqfeat-general, which retrieves GenPept reports given a search string; aa-get-seqfeat-by-uid, which retrieves the GenPept report corresponding to a given unique identifier; and so on. The ....
G. D. Schuler et al. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....used. The aim of the evaluation is to ascertain whether the tools offer the semantic and syntactic flexibility needed in a general bioinformatics query tool. The sequence retrieval service 2 (SRS) [Etzold et al. 1996] is a general query system for flat file databanks and analysis tools. Entrez [Schuler et al. 1996] offers query facilities over a set of biological data repositories. Hence, both are reasonable targets for assessment using these evaluation principles. Other systems, such as Imagene [Medigue et al. 1999] and GCG [Devereux et al. 1984] would also make suitable targets for such evaluations. ....
Schuler, G., Epstein, J., Ohkawa, H., and Kans, J. (1996). Entrez: Molecular Biology Database and Retrieval System. Methods in Enzymology, 266:141--162.
....sources and translate their replies into Kleisli's exchange format. The version of Kleisli that forms the backbone of the ConnectivityEngine(TM) of GeneticXchange Inc. (www.geneticXchange.com) contains over sixty drivers for many popular bioinformatics systems, including Sybase, Oracle, Entrez [27], WU BLAST2 [1], Gapped BLAST [3], ACEDB [31], etc. The optimizer of Kleisli can also be customized by different rules and strategies. When a query is submitted to Kleisli, it is first processed by the CPL Module which translates it into an equivalent expression in the abstract calculus NRC. NRC ....
....and so on. The feature table of a GenPept report is the part of the GenPept report that documents the positions of these regions of special biological interest, as well as annotations or comments on these regions. The following type represents the feature table of a GenPept report from Entrez [27]: (#uid:num, #title:string, #accession:string, #feature:{(#name:string, #start:num, #end:num, #anno:[(#annoname:string, #descr:string)])}). It is an interesting type because one of its fields (#feature) is a set of records, one of whose fields (#anno) is in turn a list of records. More ....
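As a rough illustration of the nested type quoted in the excerpt above, the following Python sketch models a record whose #feature field is a set of records and whose #anno field is a list of records. The class names, the helper features_covering, and all sample values are hypothetical; this is not Kleisli or CPL code, and it does not contact Entrez.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One (#annoname, #descr) record."""
    annoname: str
    descr: str


@dataclass(frozen=True)
class Feature:
    """One (#name, #start, #end, #anno) record."""
    name: str
    start: int
    end: int
    # The ordered list of annotations is stored as a tuple so that
    # Feature stays hashable and can be placed inside a set.
    anno: Tuple[Annotation, ...]


@dataclass(frozen=True)
class GenPeptReport:
    """The outer record: three scalar fields plus a set of features."""
    uid: int
    title: str
    accession: str
    feature: FrozenSet[Feature]


def features_covering(report: GenPeptReport, position: int) -> List[str]:
    """Return the sorted names of features whose range includes position."""
    return sorted(
        f.name for f in report.feature if f.start <= position <= f.end
    )


# Invented sample values, for illustration only (not a real GenPept entry).
example = GenPeptReport(
    uid=1,
    title="hypothetical protein",
    accession="XXX00000",
    feature=frozenset({
        Feature("domain", 10, 40, (Annotation("note", "placeholder"),)),
        Feature("site", 55, 60, ()),
    }),
)

print(features_covering(example, 15))
```

Storing the features in a frozenset mirrors the unordered set in the quoted type, while the tuple preserves the order of annotations, which is the distinction the excerpt draws between the two collection kinds.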
[Article contains additional citation context not shown here]
G. D. Schuler et al. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....requests to these sources and translate their replies into Kleisli's exchange format. The version that forms the backbone of the ConnectivityEngine(TM) of Kris Technology Inc. (www.kris inc.com) contains over sixty drivers for all popular bioinformatics systems, including Sybase, Oracle, Entrez [27], WU BLAST2 [1], Gapped BLAST [2], ACEDB [33], etc. Also, the optimizer of Kleisli can be customized by different rules and strategies. When a query is submitted to Kleisli, it is first processed by the CPL Module which translates it into an equivalent expression in NRC. The abstract calculus NRC ....
....and so on. The feature table of a GenPept report is the part of the GenPept report that documents the positions of these regions of special biological interest, as well as annotations or comments on these regions. The following type represents the feature table of a GenPept report from Entrez [27]: (#uid:num, #title:string, #accession:string, #feature:{(#name:string, #start:num, #end:num, #anno:[(#annoname:string, #descr:string)])}). It is an interesting type because it is a record of set of lists of records. Here is the detail. It is a record of four fields #uid, #title, #accession, ....
[Article contains additional citation context not shown here]
G. D. Schuler et al. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....serve as an example. Historic development and organizational obstacles have prevented the definition and proliferation of standards, leaving end users confronted with an overwhelming diversity in data formats, query languages and access methods. Several proprietary systems like SRS [4] and Entrez [15] have been developed for the integration and distribution of molecular and genomic data. A biologist nowadays has to find an access method to the desired data source, typically on the WWW. Then he has to understand and use the interface, e.g. by typing in keywords in a form, and finally he has to ....
Schuler G.D., Epstein J.A., et al.: Entrez: Molecular Biology Database and Retrieval System. Methods in Enzymology, Vol. 266. Academic Press (1996) 141-162
....into a variant type. However the Kleisli Data Exchange Format (described below) does not insist on sets, bags, and lists being homogeneous. As an example of a complex type expressible in this data model, consider the following type that represents the feature table of a GenPept report from Entrez [16, 20]: (#uid: num, #title: string, #accession: string, #feature: {(#name: string, #start: num, #end: num, #anno: [(#annoname: string, #descr: string)])}). It has an interesting type because it is a record of set of lists of records. Here is the detail. It is a record of four fields #uid, #title, ....
....requests to these sources and translate their replies into Kleisli's exchange format. The version that forms the backbone of the ConnectivityEngine(TM) of Kris Technology Inc. (www.kris inc.com) contains over sixty drivers for all popular bioinformatics systems, including Sybase, Oracle, Entrez [20], WU BLAST2 [1], Gapped BLAST [2], ACEDB [21], etc. Also, the optimizer of Kleisli can be customized by different rules and strategies. When a query is submitted to Kleisli, it is first processed by the CPL Module which translates it into an equivalent expression in NRC, an abstract nested ....
[Article contains additional citation context not shown here]
G. D. Schuler, et al. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....As observed by Baker and Brass [1], many existing biology data retrieval systems [2, 3, etc.] were not fully up to the demand of flexible and painless bioinformatics data integration. These systems relied on low level direct manipulation by biologists. The archetypal example was the Entrez system [2]. Here a biologist used a keyword to extract summary records, then clicked on each record to view its contents or to perform operations. This worked well for simple actions. However, as the number of actions or records increased, such direct manipulation quickly became a repetitive drudgery. Also, ....
....from a large public sequence database such as GenBank [11]. Firstly, such a database had a fairly complicated structure. Secondly, feature annotations needed for correct identification of exons were buried within such a complicated structure. Thirdly, available public resources such as Entrez [2] did not provide a convenient means for extracting these buried feature annotations. Kleisli possessed all necessary attributes to address this problem. After we showed how Kleisli could be used to extract and derive codon usage of DNA sequences from Entrez in a simple way, we also showed how it ....
[Article contains additional citation context not shown here]
Schuler GD, et al. (1996) Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266, 141--162.
....idea (such as identifying various domains and active sites) than the overall function of our protein, even if we have some alignments that extend over the whole of our protein, some tedious work is still needed. What does the tedious work involve? At the very least, it means going to Entrez [15] to fetch the GenPept report associated with each interesting homolog, so that we can inspect the feature table in this report to see if the aligned regions fall within any interesting feature or domain annotated in the feature table. Then, we have to copy these regions to a file and perform a ....
G. D. Schuler, J. A. Epstein, H. Ohkawa, J. A. Kans. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....data sources by adding new drivers, which forward Kleisli's requests to these sources and translate their replies into Kleisli's exchange format. The installation at Kent Ridge Digital Labs contains over sixty drivers for all popular bioinformatics systems, including Sybase, Oracle, Entrez (Schuler et al. 1996), WU BLAST2 (Altschul & Gish, 1996), Gapped BLAST (Altschul et al. 1997), ACEDB (Walsh et al. 1998), etc. Furthermore, the optimizer of Kleisli can be customized by adding different rules and strategies. When a query is submitted to Kleisli, it is first processed by the CPL Module which ....
....and so on. The feature table of a GenPept report is the part of the GenPept report that documents the positions of these regions of special biological interest, as well as annotations or comments on these regions. The following type represents the feature table of a GenPept report from Entrez (Schuler et al. 1996): (#uid: num, #title: string, #accession: string, #feature: {(#name: string, #start: num, #end: num, #anno: [(#annoname: string, #descr: string)])}). It is an interesting type because it is a record of set of lists of records. Here is the detail. It is a record of four fields #uid, #title, ....
[Article contains additional citation context not shown here]
Schuler, G. D., Epstein, J. A., Ohkawa, H., & Kans, J. A. (1996). Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266, 141--162.
....quickly tie together many databases and systems to answer interesting queries that are more demanding than simple-minded free text search. In this case, the following databases and systems at the National Center for Biotechnology Information (NCBI) are involved: (1) the protein section of Entrez [6]; (2) the taxonomy database (www.ncbi.nlm.nih.gov/Taxonomy); and (3) MEDLINE (www.ncbi.nlm.nih.gov/PubMed). Currently, the system handles just this query: Find those articles about SUBJ of organisms in the same CAT as ORG, focus especially on those that have associated protein sequences. The ....
G. D. Schuler, J. A. Epstein, H. Ohkawa, and J. A. Kans. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....query protein patents. This system uses Kleisli to tie together the following sources to answer queries on protein patents that are considerably more demanding than simple free text search: (1) the protein section of the Entrez system at the National Center for Biotechnology Information [11]; (2) the BLAST sequence homology service at the National Center for Biotechnology Information [2]; (3) the WU BLAST2 sequence homology software from Washington University [1]; (4) the Isite system at the US Patent and Trademark Office (http://patents.uspto.gov); and (5) the structural ....
.... [Figure 2: Kleisli CPL program to find unpatented sequences in the same superfamily as a user supplied sequence.] ....to operate on, we will have a means to reliably identify which of our sequences have not yet been patented. We obtain the patented sequences from the protein section of Entrez [11]. These are warehoused locally for greater efficiency. We use WU BLAST2 [1] for comparing our sequences against this warehouse for primary sequence structure homology. After the unpatented protein sequences have been identified, the second question at this point is: Which ones of these have the ....
G. D. Schuler, J. A. Epstein, H. Ohkawa, and J. A. Kans. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....entire databases of protein sequences. The construction of this TPR domain hunter involves multiple data sources and procedures that are typical of many bioinformatics problems. First, in order to extract previously known TPR domains, the ability to access and manipulate GenPept reports from Entrez [12] located at the National Center for Biotechnology Information (NCBI) in Washington DC is needed. Second, in order to recognize new TPR domains, multiple sequence comparison programs such as BLAST [2], WU BLAST2.0 [1], and HMMER [13] must be accessed. Third, in order to present results properly, ....
G. D. Schuler, J. A. Epstein, H. Ohkawa, J. A. Kans. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....problems. We use for illustration here the energy related genes of the model dicot Arabidopsis, whose entire genome is currently being sequenced [16]. On 25 April 1998, one of us used the specification energy AND arabidopsis[Organism] to search the GenBank portion of the popular Entrez system [22] at the National Center for Biotechnology Information (NCBI) in Washington DC to get these energetic Arabidopsis genomic sequences. One needed to get hold of them before one could proceed to extract and analyse their Kozak sequences. It returned a list of exactly one record. There were actually ....
....In order to mix objects of different types in a set, bag, or list, it is necessary to inject these objects into a variant type. We will not be using variants here. As an example of a complex type, consider the following type that represents the feature table of a GenBank report from Entrez [22]: (#uid: num, #title: string, #accession: string, #feature: {(#name: string, #continuous: bool, #position: [(#accn: string, #start: num, #end: num, #negative: bool)], #anno: [(#annoname: string, #descr: string)])}). It has an interesting type because it is a record of set of lists of records. ....
[Article contains additional citation context not shown here]
G. D. Schuler, J. A. Epstein, H. Ohkawa, and J. A. Kans. Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266:141--162, 1996.
....As observed by Baker and Brass [1], many existing biology data retrieval systems [2, 3, etc.] were not fully up to the demand of flexible and painless bioinformatics data integration. These systems relied on low level direct manipulation by biologists. The archetypal example was the Entrez system [2]. Here a biologist used a keyword to extract summary records, then clicked on each record to view its contents or to perform operations. This worked well for simple actions. However, as the number of actions or records increased, such direct manipulation quickly became a repetitive drudgery. Also, ....
....DNA sequences were needed and the ability to compare them was also needed. Unfortunately, that database did not keep actual DNA sequences. The actual DNA sequences were kept in another database called GenBank [13]. At the time, access to GenBank was provided through the ASN.1 version of Entrez [2], which was a rather complicated retrieval system. Entrez also kept precomputed homologs of GenBank sequences. So this query required the integration of GDB (a relational database) and Entrez (a non-relational database) that first extracted names of genes on the required cytogenetic band and ....
[Article contains additional citation context not shown here]
Schuler GD, Epstein JA, Ohkawa H, and Kans JA (1996) Entrez: Molecular biology database and retrieval system. Methods in Enzymology, 266, 141--162.
....This simple example serves as a quick introduction to the basic syntax of CPL. The second example asks what other protein sequences in the same superfamily of a given protein sequence have been patented. These examples exercise many aspects of Kleisli and involve integration across Entrez [2], SCOP [9], WU BLAST2 [10], patents, proteins, feature tables, etc. I hope the succinctness of these examples is sufficient illustration of the power, flexibility, and simplicity of the Kleisli system. Example: Proline at N Terminal. The first of our two examples is this query: What proportion of ....
Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: Molecular biology database and retrieval system. Methods in Enzymology 1996; 266:141--162.
No context found.
Schuler, G. D., Epstein, J. A., Ohkawa, H. & Kans, J. A. (1996), `Entrez: Molecular biology database and retrieval system', Methods in Enzymology 266, 141--162.
No context found.
G. D. Schuler et al. Entrez: Molecular biology database and retrieval system. Methods Enzymol., 266:141--162, 1996.
No context found.
G. D. Schuler et al. Entrez: Molecular biology database and retrieval system. Methods Enzymol., 266:141--162, 1996.
No context found.
G. D. Schuler, J. A. Epstein, H. Ohkawa, and J. A. Kans, "Entrez: molecular biology database and retrieval system," Methods Enzymol, vol. 266, pp. 141-62, 1996.
No context found.
Schuler,G.D., Epstein,J.A., Ohkawa,H. and Kans,J.A. (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol., 266, 141--162.
No context found.
Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system. Methods Enzymol 1996;266:141-62.
No context found.
Schuler, G.D., et al., Entrez: molecular biology database and retrieval system. Methods Enzymol, 1996. 266: p. 141-62.
No context found.
Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system. Methods Enzymol 1996;266:141-62.
CiteSeer.IST - Copyright Penn State and NEC