The two REBOL scripts that we have built require the user to change the script in order to search for a different protein. This is not too user-friendly. What we want to do first is to prompt the user for the protein name. REBOL has a function named ask that will do this. Type in:
protein-name: ask "Please enter a protein name: "
print protein-name
REBOL also has another function that prompts a user to enter text. It is named request-text. Try this:
protein-name: request-text "Please enter a protein name:"
print protein-name
The obvious difference between the two functions is that request-text uses a Graphical User Interface (GUI). We'll continue using a GUI for interactions with a user.
Let's incorporate request-text into the esearch-01.r script (new or changed lines in bold):
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/
protein-name: request-text "Please enter a protein name"
esearch-arguments: join "esearch.fcgi?db=protein&term=" protein-name
esearch-url: join eutilities-url esearch-arguments
response: load/markup esearch-url
print response
Explanation:
- The second line is new. It prompts the user for a protein name and stores it in the protein-name variable.
- In the third line, the hardwired protein Id is replaced by the protein-name variable.
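To see what the two join calls produce without contacting NCBI, here is a small sketch that builds the same URL from a hard-coded search term. The term "insulin" is only an assumed sample value standing in for whatever the user would type.

```rebol
; Base URL of the NCBI E-utilities, as used in the script above.
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/

; "insulin" is an assumed sample value standing in for the user's input.
protein-name: "insulin"

; join copies its first argument and appends the second to that copy.
esearch-arguments: join "esearch.fcgi?db=protein&term=" protein-name
esearch-url: join eutilities-url esearch-arguments

print esearch-url
```

Because the first argument of the second join is a url value, the result is a url as well, which is why it can be handed straight to load/markup.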
The second script, efetch-01.r, also has to be edited by hand to include the Id of the record we are interested in. Instead of having to modify the script, a better user interface would be to create a list of the Id's and have the user pick one of them. As it turns out, REBOL has a request-list function. Try this:
request-list "Choose a number, any number: " [2 4 "six" 8 10]
Explanation:
- request-list takes two arguments: a string that becomes a prompt and a block. Blocks are similar to lists, a type that is often available in other programming languages. Blocks are surrounded by square brackets "[ ]" when displayed or when typed in as literals. Just like strings, blocks are a series type.
- Notice the "six" in the list. The elements in a list need not be all of the same type. That is, you can mix numbers and character strings, for example.
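Since blocks and strings are both series, the same functions work on either one. The following lines are a minimal sketch you can type into the console; the variable names numbers and text are just illustrative.

```rebol
; A block holding values of mixed types.
numbers: [2 4 "six" 8 10]

print length? numbers        ; number of elements in the block: 5
print first numbers          ; 2
print third numbers          ; six
print string? third numbers  ; true

; The same series functions applied to a string.
text: "six"
print length? text           ; 3
print first text             ; s
```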
At this point I was going to illustrate how to use REBOL's extensive list of search functions to locate and extract the entire list of Id's. The code would have been straightforward, similar to solutions in other languages, and it would have introduced you to the series type search functions. But I decided that I wanted to show you a unique built-in facility in REBOL called parsing. While perhaps a conceptually more difficult way to extract our list of Ids, I think that once you understand what's going on you'll appreciate how powerful parsing can be.
For a very good tutorial on using the parsing facility, take a look at the website of Nick Antonaccio. He has written about many other areas of REBOL as well.
Parsing, for those who are not familiar with the term, is essentially scanning a character string trying to identify textual units that obey the rules specified in a grammar. There are many tools that will take a set of grammatical rules, say for a language, and generate a program that can parse programs written in that language.
Add the following REBOL code to the esearch-01.r script file. The additional code is shown in bold.
REBOL [
Title: "Search the NCBI Protein Database"
Date: 24-Apr-2008
File: %esearch-01.r
Author: "Peter C Marks"
Version: 1
]
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/
protein-name: request-text "Please enter a protein name"
esearch-arguments: join "esearch.fcgi?db=protein&term=" protein-name
esearch-url: join eutilities-url esearch-arguments
response: load/markup esearch-url
ids: []
parse response [
    thru <idlist>
    some [
        thru <id> copy idvalue to </id> (append ids idvalue)
    ]
    to </idlist>
    to end
]
Explanation:
- The variable ids is initialized to an empty block "[]"
- Parse is a built-in function that can parse a series (string, block, etc.) - in our example a block, response - according to a set of grammatical rules.
- Here is an English translation of the parse lines in the script: Search the variable response for and thru the token/tag <IdList>. Next there will be some number of grammatical elements. Each element is identified by first scanning for and thru the <Id> tag. After doing this, copy the following characters into the variable idvalue up to an </Id> tag. After the copy append to the ids list the value in idvalue. When there are no more (some) id's remaining continue scanning to the tag </IdList> and finally scanning from here until the end of the response value is reached.
print ids
The value of ids should look something like this:
["2507051" "72132980" "1110443" "12060499" "9963676" "1906792" "116668619" "119714
336" "169196951" "169175440" "169175430" "1691...
Parsing has found and extracted the list of Id's and assigned them to the ids variable. We can use the ids variable as an argument to the request-list function.
Incidentally, this style of parsing is very similar to what are called parsing expression grammars (PEGs). Basically, a PEG is a program that is directly based on the grammar (syntactic rules) of a language, for example. That is, the grammar becomes a parsing program.
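If you want to experiment with the rules without a network connection, the sketch below applies the same idea to a plain string. Note the assumptions: the sample text is made up to imitate the shape of an esearch response, the rules match strings rather than the tags in a loaded block, and sample-xml and sample-ids are names chosen only for this illustration.

```rebol
; Made-up sample text imitating the shape of an esearch response.
sample-xml: {<IdList><Id>111</Id><Id>222</Id><Id>333</Id></IdList>}

; Start with an empty block to collect the Id's.
sample-ids: copy []

parse sample-xml [
    thru "<IdList>"
    some [
        ; Skip past an opening tag, copy everything up to the closing
        ; tag, then run the REBOL code in parentheses.
        thru "<Id>" copy idvalue to "</Id>" (append sample-ids idvalue)
    ]
    to end
]

probe sample-ids   ; ["111" "222" "333"]
```

The rule block reads almost like the English description of the grammar, which is the sense in which the grammar becomes the parsing program.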
Enter the following line:
request-list "Choose an Id" ids
A popup dialog should appear like the image below.

Click on one of the Id's. The value you clicked will be printed in the REBOL console. What this means is that the value of the function request-list is the value of the selection or if the Cancel button is pushed, the value "none".
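Since request-list can return none, a script should check for that before using the result. Here is a minimal sketch of one way to do it. The helper describe-selection is a hypothetical function written for this example, not something built into REBOL, and sample-ids holds a few Id's copied from the output shown earlier.

```rebol
; A few Id's copied from the sample output shown earlier.
sample-ids: ["2507051" "72132980" "1110443"]

; describe-selection is a hypothetical helper, not part of REBOL.
; It returns a message for either a selected value or none.
describe-selection: func [selection] [
    either none? selection [
        "No Id was selected."
    ][
        join "You selected Id " selection
    ]
]

; Show the dialog and report what the user did.
print describe-selection request-list "Choose an Id" sample-ids
```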
We now need to take this selected value and use it as an argument to the NCBI efetch utility. I'll cover that in the next post.
