[The REBOL tutorials written by Nick Antonaccio are among the best that I have come across. I consult them constantly as I learn REBOL and I highly recommend them.]
So far we have created two scripts to perform the two tasks of searching and then fetching protein information from the NCBI Protein database. We need to tie them together so that the results of the search can be used to fetch data. Rather than continue to develop the two scripts we will turn each into a function and place these two functions into a single script file. Function definition and use is an important structural construct in REBOL as it is in most languages.
REBOL functions look and operate very much like functions in other languages: they accept arguments, compute, and return a value. However, like functions in LISP, Scheme, and similar languages, REBOL functions are "first class citizens", which means they can be treated like any other data type. REBOL functions and data are both constructed out of blocks, so functions can be manipulated like any other data structure.
Here's how we create a function that doubles a number:
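A minimal sketch of such a definition, assuming REBOL 2 as used elsewhere in this post. The wording of the comment string and the argument name n are assumptions chosen for illustration.

```rebol
REBOL [Title: "double (illustrative sketch)"]

double: func [
    "Return twice the given number"   ; optional comment string
    n                                 ; the single argument
] [
    n * 2                             ; the last value evaluated is returned
]

print double 21        ; prints 42
print type? :double    ; prints function
```

Note the get-word :double in the last line: it fetches the function itself, whereas the plain word double would call it.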
Explanation:
- A function is created by using the word func, followed by a block of arguments (if any) followed by the code to be executed in the function.
- The string in the first position of the first block is an optional comment.
- The function is then assigned to the variable double which becomes its name.
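- The data type of the variable double is function! That is, if you enter type? :double at the REBOL console, it will return function! (the leading colon fetches the function itself; a plain type? double would try to call double without its argument and fail).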
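To make the "first class citizens" remark concrete, here is a small sketch, again assuming REBOL 2. The names square, apply-twice, body-block, and increment are made up for this example.

```rebol
REBOL [Title: "Functions as values (illustrative sketch)"]

; An ordinary function of one argument
square: func [n] [n * n]

; A function that receives another function as an argument.
; Inside the body, the word f calls whatever function was passed in.
apply-twice: func [f [any-function!] x] [f f x]

; The get-word :square passes the function itself instead of calling it
print apply-twice :square 3     ; prints 81

; A function body is an ordinary block, so it can be built as data first
body-block: [n + 1]
increment: func [n] body-block
print increment 41              ; prints 42
```

Passing :square works the same way as passing a number or a string, which is what treating functions like other data types amounts to in practice.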
The script file that we will create will contain two function definitions and some REBOL code to execute immediately upon "doing" it. In addition, to make the functions more generally useful, we'll remove any user interface interactions; these will become part of the executable code.
Here's what the esearch script becomes after we turn it into a function:
esearch: func [
    "Search for references to a protein name and return a block of ids (empty if none are found)"
    protein-name
] [
    ; assemble the cgi arguments
    esearch-arguments: join "esearch.fcgi?db=protein&term=" protein-name
    esearch-url: join eutilities-url esearch-arguments
    ; load as a tagged (XML, HTML, etc.) document
    response: load/markup esearch-url
    ; Parse the response. The tag names and the "set idvalue string!" step
    ; below are assumptions based on the eSearch XML layout
    ; (<IdList><Id>...</Id>...</IdList>)
    ids: []
    clear ids
    parse response [
        thru <IdList>
        some [
            thru <Id> set idvalue string!
            (append ids idvalue)]
        to </IdList>
        to end ]
    ; Return the id list
    ids]
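To see what the parse step does without contacting NCBI, the following sketch runs the same kind of rule over a made-up response held in a string. It assumes REBOL 2 and assumes the eSearch result wraps each id in Id tags inside an IdList element; the sample text and its ids are invented.

```rebol
REBOL [Title: "Parsing an IdList (illustrative sketch)"]

; Made-up sample shaped like an eSearch response; the ids are not real
sample: {<eSearchResult><Count>2</Count><IdList>
<Id>11111</Id>
<Id>22222</Id>
</IdList></eSearchResult>}

; load/markup splits the text into a block of tag! and string! values
response: load/markup sample

ids: copy []
parse response [
    thru <IdList>
    ; each id is the string that directly follows an <Id> tag
    some [thru <Id> set idvalue string! (append ids idvalue)]
    to </IdList>
    to end
]

probe ids    ; prints ["11111" "22222"]
```

Because load/markup has already separated tags from text, the rule works on whole values rather than on characters, which keeps it short.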
And, here's what the efetch script becomes:
efetch: func [
    "Fetch the NCBI data associated with this reference Id according to the return type"
    id [string!] "A valid NCBI reference Id"
    return-type [string!] "For example, fasta"
] [
    ; Assemble the CGI parameters - note that we use the function argument
    ; named return-type as a value for the NCBI fetch argument named rettype
    efetch-arguments:
        rejoin ["efetch.fcgi?db=protein&retmode=text&rettype="
                return-type
                "&id="
                id]
    efetch-url: join eutilities-url efetch-arguments
    ; read the response as a block of strings, one per line
    response: read/lines efetch-url
    response]
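The URL assembly inside efetch can be tried on its own, with no network access. In this sketch (REBOL 2 assumed) build-efetch-url is a hypothetical helper that is not part of ncbi.r, and the id "12345" is a placeholder rather than a real reference.

```rebol
REBOL [Title: "Assembling the efetch URL (illustrative sketch)"]

eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/

; build-efetch-url is a made-up helper: it only builds the URL
build-efetch-url: func [id [string!] return-type [string!]] [
    ; rejoin reduces the block and joins the pieces into one string;
    ; join then appends that string to a copy of the base url
    join eutilities-url rejoin [
        "efetch.fcgi?db=protein&retmode=text&rettype=" return-type
        "&id=" id
    ]
]

url: build-efetch-url "12345" "fasta"
print url
print type? url    ; prints url
```

Because the first value given to join is a url!, the result is a url! as well, so it can be handed directly to read/lines.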
Finally, the code that will be executed when we "do" the script:
; URL shared by the esearch and efetch functions
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/
; Ask the user for a protein name
protein-name: request-text/title "Enter a protein name"
; search for references to this protein
ids: esearch protein-name
; let the user choose one of the references
id: request-list "Choose an ID" ids
if none? id [print "Halting the script" halt]
; use the id to fetch the reference as fasta - the return value is the
; fasta file represented as a block of strings - one per line
fasta: efetch id "fasta"
print fasta
You can create a script file from the two functions and the executable code above, or download ncbi.r. (You might have to remove a ".txt" extension, although REBOL doesn't care: do %ncbi.r.txt will still work.)
To close this post, I'll introduce you to REBOL's GUI facilities - they are remarkably powerful and easy to use. By way of introduction, we will replace the executable code in the ncbi.r script (after the "URL shared by the ..." comment) with a window layout - a declarative description of what the GUI looks like and how it will behave. If you want a fuller explanation of the GUI facilities, again, please take a look at Nick Antonaccio's tutorials. You might even want to go there first.
Here's the GUI replacement code and an explanation:
eutilities-url: http://www.ncbi.nlm.nih.gov/entrez/eutils/
; The layout (GUI) for this NCBI dialog
ncbi-layout: layout [
    across
    text "Protein Name: "
    protein-name: field
    button "Search" [the-list/data: [] show the-list
                     the-list/data: esearch protein-name/text
                     the-fasta/text: " " show the-fasta
                     show the-list]
    return
    text "Choose an Id: "
    the-list: text-list [id-selection: value]
    button "Fetch" [the-fasta/text: efetch id-selection "fasta"
                    show the-fasta]
    return
    text "The Data: "
    the-fasta: tt 400x200
    return
    button "Exit" [unview]
]
; Begin executable script code
; Display the dialog
view ncbi-layout
Explanation:
- As mentioned earlier, the form and function of a GUI layout are described first. Later, the layout will be displayed (viewed). The variable ncbi-layout is set to the value of the description.
- The layout is written using what is called a "dialect" in REBOL. A dialect is an extension to the REBOL language. It is usually designed for a special purpose: in our case, describing a GUI layout. It has its own set of reserved words that have special meaning in the context of the dialect. The GUI dialect is announced with the word layout followed by a block containing the description. Note that a "hot" topic nowadays is DSLs (Domain-Specific Languages). REBOL dialects are DSLs, and you can create your own dialect as needed.
- One way to understand the layout is to read it from top to bottom. While reading, special words will be encountered. For example, the first word, across, says that all the following UI elements should be positioned from left to right, across the window. This is followed by a text command that places the text that follows it in the window. Next, a variable protein-name is declared and set to a field, which creates an input field in the window. A little further down is the word return. This is not a return from the layout; it is a command to return to the left of the window and begin placing the UI components there.
- Buttons are used to start some action. Pushing the "Search" button causes the block that follows it to be executed. The action, in this case, is to pass the text typed into the protein-name field as an argument to the esearch function. Recall that esearch returns a list of ids. This list is assigned to the-list/data, the data of the face named the-list. If you look down a few lines, you'll see that the-list is a UI text list, that is, a displayable list. When its data is set to the result of esearch and show is called, the list of ids appears in the window as a scrollable and selectable list.
- To actually display the UI layout, the command view is used. This begins the dialog and does not return until the dialog is exited.
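The remark that you can create your own dialect can be illustrated with a very small one. This is only a sketch under REBOL 2; the two commands (search and fetch ... as ...) and the names requests and run-dialect are invented here and have nothing to do with the GUI dialect or with ncbi.r.

```rebol
REBOL [Title: "A tiny dialect (illustrative sketch)"]

; Hypothetical two-command dialect, invented for this example:
;   search "name"          record a search request
;   fetch "id" as "type"   record a fetch request
requests: copy []

run-dialect: func [commands [block!]] [
    ; parse returns true only if the whole block matches the rules
    parse commands [
        some [
            'search set name string!
                (append requests rejoin ["search " name])
          | 'fetch set id string! 'as set kind string!
                (append requests rejoin ["fetch " id " as " kind])
        ]
    ]
]

ok: run-dialect [
    search "insulin"
    fetch "12345" as "fasta"
]

print ok                                  ; prints true
foreach item requests [print item]
```

The words search, fetch, and as have meaning only inside the block handed to run-dialect, in the same way that across and return have their special meaning only inside a layout block.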
In the next post, I want to get back to developing more bioinformatics-related code.
