GridBlast consists of the following files:
GridBlast.pl: This is the main script file. It statically schedules the queries
across the client nodes, tar-zips the database, executables and query files and
stages them to each remote node, and spawns the remote jobs using globusrun. The
script is multi-threaded: each client node is serviced by a separate thread of
execution.
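The thread-per-node design can be sketched in Python as below. This is a minimal illustration, not the Perl code in GridBlast.pl. `make_bundle` tar-zips files with the standard `tarfile` module. `service_node` is a hypothetical stand-in for staging the bundle and calling globusrun, so it only records how many queries each node was given.

```python
import tarfile
import threading
from pathlib import Path


def make_bundle(files, bundle_path):
    """Tar-zip the database, executable and query files into one .tar.gz archive."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        for f in files:
            # Store each file under its base name so the remote side unpacks flat.
            tar.add(f, arcname=Path(f).name)
    return bundle_path


def service_node(node, queries, log, lock):
    # Hypothetical stand-in for staging the bundle to `node` and spawning
    # head_node_script.pl there with globusrun; here we only record the work.
    with lock:
        log.append((node, len(queries)))


def dispatch(schedule):
    """Start one thread per client node and wait for all of them to finish."""
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=service_node, args=(node, qs, log, lock))
        for node, qs in schedule.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Threads finish in arbitrary order, so sort for a stable result.
    return sorted(log)
```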
head_node_script.pl: This is the remote node script. It is spawned on each
remote node with the globusrun command. It starts a GASS server on the remote
node and then connects to the GASS server on the local node. Once the
server-to-server connection is established, the script transfers the necessary
executable, database and query files. When the transfer is complete, it sets up
the executables and, depending on whether the remote node has a single processor
or multiple processors, spawns either repeated runs of blastall (the BLAST
executable) or Scatter (the task-farming application for running high-throughput
BLAST on a cluster).
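The branch between blastall and Scatter can be expressed as a small planning function. This Python sketch assumes the processor count comes from the clientfile. `plan_remote_run` and the `("blastall", ...)` / `("scatter", ...)` action tuples are hypothetical and do not reproduce the real command lines built by head_node_script.pl.

```python
def plan_remote_run(nprocs, query_files):
    """Return the actions a head-node script might take for one grid node."""
    if nprocs <= 1:
        # Single-processor node: one blastall run per query file, back to back.
        return [("blastall", q) for q in query_files]
    # Multi-processor node: a single Scatter job farms all queries out
    # across the node's processors.
    return [("scatter", tuple(query_files))]
```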
The Scatter server program, in turn, spawns the client program through the job
manager of the remote node. A work-queue scheduler distributes the queries to
the processors on each grid node. As each node completes its quota of queries,
the results are tar-zipped again and copied back to the local node.
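A work-queue scheduler lets each idle processor pull the next query instead of receiving a fixed share in advance. The following Python sketch uses the standard `queue` and `threading` modules, with one thread per processor. `run_query` is a hypothetical callable standing in for a single BLAST run.

```python
import queue
import threading


def work_queue(queries, nprocs, run_query):
    """Distribute queries to `nprocs` workers; each idle worker pulls the next query."""
    todo = queue.Queue()
    for q in queries:
        todo.put(q)
    results, lock = {}, threading.Lock()

    def worker():
        while True:
            try:
                q = todo.get_nowait()
            except queue.Empty:
                return  # queue drained: this processor is finished
            r = run_query(q)
            with lock:
                results[q] = r

    workers = [threading.Thread(target=worker) for _ in range(nprocs)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Because assignment is dynamic, faster processors simply take more queries. This complements the static per-node schedule computed by GridBlast.pl.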
clientfile: This file lists the client nodes in the cluster that are to be used
for the BLAST run. The format is simple: each line contains a node name, the
number of processors available on that node, and the local scheduler on that
node. If a node has no scheduler, enter "none" in the scheduler field.
For example:
some.server.edu 8 pbs
someother.server.edu 4 sge
yet.another.server 1 none
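A parser for this format might look like the following Python sketch. It assumes whitespace-separated fields and skips blank lines. Both are assumptions, since the tolerance of GridBlast.pl's own parser is not documented here. `ClientNode` and `parse_clientfile` are hypothetical names.

```python
from typing import NamedTuple


class ClientNode(NamedTuple):
    host: str
    nprocs: int
    scheduler: str  # e.g. "pbs", "sge", or "none"


def parse_clientfile(text):
    """Parse clientfile lines of the form: <node name> <# of processors> <scheduler>."""
    nodes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        host, nprocs, scheduler = fields
        nodes.append(ClientNode(host, int(nprocs), scheduler))
    return nodes
```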
scatter.job_orig: This file is used to spawn jobs on a cluster through the PBS
job scheduler. Currently, the application supports only PBS on clusters;
future versions will also support LSF and Condor, among others.
head_node_script.rsl_orig: This is the RSL (Resource Specification Language)
file used for job submission with globusrun.
Node_Specs: This file specifies the parameters needed by the minmax algorithm.
It is a plain data file with one row per machine listed in the clientfile. Each
row has the following columns:
<# of procs on node> <Size of file transferred> <Comm time/MB> <Exec time for one blast run>
The order of the nodes should be the same as in the clientfile.
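The sketch below shows one way the Node_Specs columns could drive a min-max allocation of queries, that is, one that minimizes the largest per-node finish time. The cost model is an assumption for illustration. Each node pays one transfer (size times comm time per MB, with size assumed to be in MB) plus one exec time per round of runs, and a node with p processors is assumed to complete p runs per round. The greedy rule gives each query to the node whose finish time after receiving it would be smallest. This may differ from the actual minmax algorithm in GridBlast.pl.

```python
import math
from typing import NamedTuple


class NodeSpec(NamedTuple):
    nprocs: int
    transfer_mb: float       # size of file transferred (assumed to be in MB)
    comm_time_per_mb: float
    exec_time: float         # time for one BLAST run


def parse_node_specs(text):
    """Parse one Node_Specs row per node, in clientfile order."""
    specs = []
    for line in text.splitlines():
        if line.strip():
            p, size, comm, ex = line.split()
            specs.append(NodeSpec(int(p), float(size), float(comm), float(ex)))
    return specs


def finish_time(spec, nqueries):
    """Assumed cost model: one staging transfer plus ceil(n/procs) rounds of runs."""
    if nqueries == 0:
        return 0.0  # an unused node is never staged
    transfer = spec.transfer_mb * spec.comm_time_per_mb
    rounds = math.ceil(nqueries / spec.nprocs)
    return transfer + rounds * spec.exec_time


def minmax_allocate(specs, nqueries):
    """Give each query to the node whose finish time after receiving it is smallest.

    Each node's finish time never decreases as it receives more queries, so
    this greedy rule minimizes the maximum finish time under the assumed model.
    """
    counts = [0] * len(specs)
    for _ in range(nqueries):
        best = min(range(len(specs)), key=lambda i: finish_time(specs[i], counts[i] + 1))
        counts[best] += 1
    return counts
```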