Before doing something with the list of Id's from the previous post and to make things a lot easier, let's turn the code that we have developed so far into a REBOL script file. Open your favorite plain text editor and enter (or copy and paste) the following:
REBOL [
Title: "Search the NCBI Protein Database"
Date: 24-Apr-2008
File: %esearch-01.r
Author: "Peter C Marks"
Version: 1
]
Explanation:
- This is called the "header" of a REBOL script file. It must be there. This example shows some of the information fields that can be included in the header. Please consult the REBOL documentation for more information. Minimally, you can get away with just this "REBOL []" but, of course, it's better to have some documentation.
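As a small illustration of why the header fields are worth filling in, a running script can read them back. This is only a sketch and assumes REBOL 2, where the header of the script being executed is available as system/script/header; the title text is made up for the example.

```rebol
REBOL [
    Title: "Header demo"
    Version: 1
]

; system/script/header holds the header of the script being run (REBOL 2 assumption)
print system/script/header/title
```

Save it under any file name ending in .r and run it with do; it should print the title given in the header.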
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/
esearch-arguments: "esearch.fcgi?db=protein&term=inulin"
esearch-url: join eutilities-url esearch-arguments
response: load/markup esearch-url
print response
Copy this and append it to the header text, after the "]". This is the actual script that will be executed. Save this as a file with the name esearch-01.r. Remember which directory/folder you stored this file in; you'll need the path later on.
Start the REBOL command window if you haven't already. We are going to execute this script file. There are two ways of doing this:
- by positioning ourselves to the directory where the script is located or
- by specifying the location of the script file.
change-dir %/c/languages/rebol
or, if you're running some version of Linux, Mac OS X, or Unix:
change-dir %/home/pcmarks/languages/rebol
Explanation:
- The current directory is changed to the specified directory. Notice that the argument starts with a %. This indicates that a file/directory name follows.
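If you are not sure where you ended up, you can check from the REBOL prompt. A minimal sketch, using the example directory from above (substitute your own path):

```rebol
print what-dir                  ; show the current directory
change-dir %/home/pcmarks/languages/rebol
print exists? %esearch-01.r     ; true if the script file is in this directory
```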
do %esearch-01.r
Explanation:
- The do command will attempt to execute the REBOL code in the file. You can also use a URL and other values as arguments.
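To get a feel for the "other values" that do accepts, here are two you can type directly at the REBOL prompt. The URL form works the same way as the file form, but the address in the comment is only a placeholder, not a real script.

```rebol
do [print 1 + 2]      ; a block of code, prints 3
do "print 3 * 4"      ; a string containing code, prints 12

; do http://www.example.com/some-script.r   ; hypothetical URL, shown for shape only
```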
Alternatively, skip change-dir and give do the full path to the script:
do %/home/pcmarks/languages/rebol/esearch-01.r
By the way, at the DOS or shell command prompt (not the REBOL command prompt), you can type the following:
rebol esearch-01.r
and the script will be executed - assuming that your system can find the rebol executable.
I usually keep a text editor open, make changes, save, and then execute the script at the REBOL prompt. Also, as with most command lines, you can press the up-arrow key to recall previous commands from the command history.
In the last post, part of the result from the search was a list of NCBI Id's that were relevant to our search for the term inulin in the protein database. As a next step, we'd like to select an Id from that list and see what type of information it points to. Here's what the list portion of the response looked like:
...
<IdList>
<Id> 2507051 </Id>
<Id> 72132980 </Id>
<Id> 1110443 </Id>
<Id> 12060499 </Id>
<Id> 9963676 </Id>
<Id> 1906792 </Id>
<Id> 119714336 </Id>
<Id> 169196951 </Id>
<Id> 169175440 </Id>
<Id> 169175430 </Id>
<Id> 169175429 </Id>
<Id> 169090591 </Id>
<Id> 169016425 </Id>
<Id> 169016415 </Id>
<Id> 169016414 </Id>
<Id> 167362208 </Id>
<Id> 167070948 </Id>
<Id> 116668619 </Id>
<Id> 158318775 </Id>
<Id> 119534997 </Id>
</IdList>
...
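Rather than copying an Id by hand, the script could pick the Ids out of the response itself. What follows is only a sketch under a few assumptions: the variable names ids and pos are made up for the example, a short literal string stands in for the real response so it runs without a network connection, and it relies on load/markup accepting a string as well as a URL.

```rebol
REBOL [Title: "Pick Ids out of an esearch response"]

; Stand-in for the real response, which would come from: load/markup esearch-url
response: load/markup "<IdList><Id>2507051</Id><Id>72132980</Id></IdList>"

ids: copy []
pos: response
; find returns the block positioned at the next <Id> tag, or none when there are no more
while [pos: find pos <Id>] [
    append ids trim copy second pos    ; the string right after each <Id> tag
    pos: next pos
]

print first ids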
The NCBI provides another CGI utility called efetch. Given an Id value it will return information about this resource. To use efetch we post a request the same way we did for esearch. We'll try it with the first Id in the list. Create a new text file in your editor, copy the header from the last script, change the values as necessary, and finally enter the following code and save it as efetch-01.r:
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/
esearch-arguments: "efetch.fcgi?db=protein&id=2507051"
esearch-url: join eutilities-url esearch-arguments
response: load/markup esearch-url
print response
Explanation:
- The difference between this script and our first is the second line: Instead of calling the esearch utility at the NCBI, we are calling the efetch utility. We need to tell it from what database to fetch/get information and the Id.
do %efetch-01.r
There will be a fairly long response. The beginning of the response should look like this:
Seq-entry ::= seq {
id {
swissprot {
name "INU2_ARTGO" ,
accession "P19870" ,
release "reviewed" ,
version 3 } ,
gi 2507051 } ,
descr {
title "Inulin fructotransferase [DFA-I-forming] (Inulin fructotransferase
[depolymerizing, difructofuranose-1,2':2',1-dianhydride-forming])." ,
sp {
class standard ,
seqref {
gi 1110442 ,
gi 1110443 ,
gi 2127394 } ,
...
Briefly, the response says that this data is a sequence entry (Seq-entry) from the Swiss-Prot database, another large database available over the web. Notice that this response is not an XML document. Instead, it is formatted using an ISO standard called ASN.1. We won't worry about this right now. What is important is that we were able to take an Id value from the list in our original response and give it to another NCBI utility, efetch, and have it return information about a protein related to inulin. (Notice the title of this sequence entry in the response.)
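Since any Id from the list can be handed to efetch in the same way, the URL building could be wrapped in a small function. This is a sketch, not part of the scripts above: efetch-url-for is a hypothetical name, and the function only builds the URL; it does not contact the NCBI.

```rebol
REBOL [Title: "Build an efetch URL for any Id"]

eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/

; efetch-url-for is a hypothetical helper: database name and Id in, efetch URL out
efetch-url-for: func [db [string!] id [string!]] [
    join eutilities-url rejoin ["efetch.fcgi?db=" db "&id=" id]
]

print efetch-url-for "protein" "2507051"
```

Passing the result to load/markup, as in efetch-01.r, would then fetch the entry for that Id.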
