Environment from the Molecular Level: A NERC eScience testbed project
User interface to the eMinerals minigrid
Use of Globus
The front end to the facilities of the eMinerals minigrid is based around the Globus toolkit. Currently the minigrid has a mixture of 2.x and 3.2 releases (see Appendix A.2 for a description of the different versions), though we are in the process of upgrading all gatekeepers to GT3.2. There is one gatekeeper for each cluster, and all minigrid resources are accessed via one of these gatekeepers. Hence, the PBS queues on each cluster are accessed by requesting the corresponding jobmanager on that cluster in a Globus or Condor-G command. Similarly, the Condor pools at UCL and Cambridge are reached by requesting the correct Condor jobmanager from the gatekeeper; for example, to request a Linux machine with an Intel architecture in a Condor pool, one would nominate jobmanager-condor-INTEL-LINUX.
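As a rough illustration of how a resource is addressed, the sketch below assembles a contact string of the form `<minigrid resource>/jobmanager-<jobmanager>`, which is the form used in the submit file shown later in this document. The function name `gatekeeper_contact` is hypothetical, and the lower-case `pbs` spelling is an assumption; the exact jobmanager names depend on how each gatekeeper is configured.

```python
def gatekeeper_contact(host: str, jobmanager: str) -> str:
    """Build a '<host>/jobmanager-<type>' contact string.

    Illustrative helper only: it is not part of Globus or Condor-G.
    """
    if not host or not jobmanager:
        raise ValueError("both host and jobmanager must be non-empty")
    return f"{host}/jobmanager-{jobmanager}"


# The host name is the example given in this document; 'pbs' is an
# assumed spelling of the PBS jobmanager name.
print(gatekeeper_contact("lake.geol.ucl.ac.uk", "pbs"))
```

The resulting string is what would be placed on the Globusscheduler line of a submit file.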
To facilitate the porting and building of code, one of the Lake clusters allows gsissh access and accepts jobs to its PBS queue by direct command-line submission. However, production runs on the rest of the minigrid can be submitted only through Globus.
Because access to the eMinerals minigrid is via Globus tools, users need to have access to the Globus client tools. Installing the Globus and Condor-G client tools on every user’s desktop machine has not proved to be easy (for example, they will not work on Windows machines, or on machines whose IP addresses are assigned dynamically), and because of this we have provided a small number of dedicated machines to be used as job submission nodes within the minigrid. Indeed, only a small number of users have a full suite of client tools on their desktops, for two main reasons: a) installing these tools is not a trivial affair, and b) such tools require major configuration changes in local firewalls.
Although the architecture of the eMinerals minigrid represents a successful minigrid implementation, it does require that any firewalls present be suitably configured to allow the relevant traffic to pass. Such traffic occurs on well-defined port ranges, but it has been necessary to work closely with institutional computer support staff in order to investigate and solve a number of associated problems. One way to mitigate such problems is to have all traffic propagate over a single, well-defined port, such as port 80 for HTTP. The SRB web interface (MySRB) and the DataPortal take this approach, and we are developing a compute portal to help users submit jobs to the minigrid and monitor their progress.
The architecture of our minigrid enables eMinerals grid developers and administrators to directly assist users with the usage of Grid resources. Indeed, a ticket-driven helpdesk system based on the OTRS software has been set up in order to systematise the troubleshooting of user problems. In effect, the deployment of a number of submission nodes, which act as gateways to these resources, allows administrators to configure, test and manage grid tools on behalf of users, limiting the users' need to deal with the complexities of installation (although some users have chosen to also install Globus and Condor-G client tools on their desktop machines). The user can then submit jobs either via these pre-configured nodes or from their own desktop PCs.
Job submission
Enabling users to submit jobs to a grid environment using Globus in a way that they find simple and intuitive has required a separate development effort. The raw Globus command-line tools have not proved to be sufficiently user-friendly for our purposes, and the use of bespoke scripts that require users to add modifications is also not satisfactory. The approach we have taken is to develop general-purpose scripts based on the use of two Condor tools: Condor’s Globus client tool, Condor-G, which submits jobs to the minigrid resources, and the Condor workflow tool DAGMan (Directed Acyclic Graph Manager).
Submission of a standard job to the eMinerals minigrid involves a three-stage workflow implemented using Condor’s DAGMan tool:
- The job first creates a temporary working directory on the gatekeeper and extracts any relevant job input data files from the SRB.
- The main job executes on one of the compute resources.
- Finally, all nominated output files are put into the SRB for the user to view from his/her desktop.
These steps represent different nodes in the workflow, which are automatically generated for the user by our own variant of Condor’s condor_submit command, called my_condor_submit. This variant extends the Condor submit file syntax with SRB-specific parameters.
All these steps make use of the fork jobmanager, except for the actual job execution stage, which makes use of the jobmanager for the relevant resource (e.g. PBS, Condor, etc.). Hence, the user only ever issues one command, without having to worry about the details of the underlying workflow. It is this wrapper’s job to autogenerate the various scripts required to perform the workflow. The main point here is that all data handling is done on the server side (and the execute machine), with that data being available to the user from any platform that supports one of the SRB’s many client tools, such as the MySRB web browser interface.
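The three-stage workflow can be pictured as a linear DAG. The sketch below writes a DAGMan input description using the standard `JOB` and `PARENT ... CHILD` statements. It is only a minimal illustration under stated assumptions: the node names, the submit file names and the function `build_dag` are hypothetical, and the files that my_condor_submit actually generates are not described in this document.

```python
def build_dag(stage_in: str, run: str, stage_out: str) -> str:
    """Return DAGMan input text for a linear three-node workflow.

    Each argument is the name of a (hypothetical) Condor submit file:
    stage_in  - fetch input files from the SRB (fork jobmanager)
    run       - execute the main job (e.g. PBS or Condor jobmanager)
    stage_out - put output files into the SRB (fork jobmanager)
    """
    nodes = [("STAGE_IN", stage_in), ("RUN", run), ("STAGE_OUT", stage_out)]
    # One JOB statement per node.
    lines = [f"JOB {name} {submit_file}" for name, submit_file in nodes]
    # Chain each node to its successor so the stages run strictly in order.
    for (parent, _), (child, _) in zip(nodes, nodes[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


print(build_dag("stage_in.sub", "run.sub", "stage_out.sub"))
```

Because each stage is a separate node, DAGMan starts the main job only after the input stage has finished, and starts the output stage only after the main job has finished.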
This approach maps easily onto the data lifecycle paradigm. In addition to developing the script submission method, we are in the process of developing a web-based compute portal, which will provide a browser interface for accessing all of the current functionality, as well as introducing some new services (e.g. job monitoring, resource discovery and accounting). Although this work is still in progress at the time of writing (October 2004), the aim is to provide a fully integrated workspace, capturing not just the functionality mentioned above but also other collaborative tools being developed within the project.
Job submit scripts
The submission of a job to the eMinerals minigrid requires the use of a script developed by one of the authors of this paper (MC), called my_condor_submit. This script handles the running of the job and the transfer of data between the SRB and the compute resources. It is available as a download from www.eminerals.org. The user requirements are met through a simple file whose name is given as the argument to the execution of the script. The file has the form:
Universe = globus
Globusscheduler = <minigrid resource>/jobmanager-<jobmanager>
Executable = <name of executable binary or script>
Notification = NEVER
# Next line is example RSL for a single-processor PBS job
# Modifications are required for other job managers
GlobusRSL = (arguments=none)(job_type=single)(stdin=<filename>)
Sdir = <some directory in the SRB>
Sget = <list of input file names, or * for wildcard>
Sput = <list of output file names, or * for wildcard>
Output = <standard output file name>
transfer_output = False
Log = <name of log file>
Error = <name of standard error file>
Queue
The values of the parameters given in <angle brackets> can be altered by the user. The Sdir directory is a directory in the user’s SRB space. The Sget parameter is a list of input files in the SRB that need to be fetched at the start of the job. The Sput parameter is a list of output files that are to be put into the SRB after the execution of a job. These two parameters can be * for a wildcard list, which is particularly useful when the exact list of output files is not known in advance. The Executable, stdin, Output, Log and Error parameters are the names of files that are held or created on the computer from which the job has been submitted. The executable file, for example, will be transferred to the minigrid resource as part of the job submission process. This can be a binary or a script; the latter would be used if the executable binary file is to be obtained from the SRB. The minigrid resource will be one of the compute resources within the minigrid, and would be assigned the name of the computer, e.g. lake.geol.ucl.ac.uk. The jobmanager parameter would be PBS for one of our Linux clusters, or condor-INTEL-LINUX for a Linux computer within a Condor pool.
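To make the role of the extended parameters concrete, the following sketch separates the SRB-specific settings (Sdir, Sget, Sput) from the ordinary Condor lines in a file of the form shown above. It is a hypothetical illustration of the idea, not the code of my_condor_submit; the function `split_submit_file`, the script name `run_job.sh` and the SRB path are invented for the example.

```python
SRB_KEYS = {"sdir", "sget", "sput"}  # extension parameters named in the text


def split_submit_file(text: str):
    """Separate SRB extension settings from ordinary Condor submit lines.

    Returns (srb, condor_lines): a dict of SRB settings keyed by
    lower-cased name, and the remaining lines in their original order.
    Blank lines and '#' comment lines are dropped.
    """
    srb = {}
    condor_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so RSL values such as
        # (job_type=single) stay intact.
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() in SRB_KEYS:
            srb[key.strip().lower()] = value.strip()
        else:
            condor_lines.append(line)
    return srb, condor_lines


example = """\
Universe = globus
Globusscheduler = lake.geol.ucl.ac.uk/jobmanager-pbs
Executable = run_job.sh
# hypothetical SRB collection
Sdir = /home/someuser/run1
Sget = *
Sput = *
Queue
"""

srb, condor = split_submit_file(example)
print(srb)
print(condor)
```

In a wrapper of this kind, the SRB settings would drive the stage-in and stage-out steps, while the remaining lines would be passed on to Condor-G for the main job.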
Data management within the minigrid environment
Since users are not able to log in directly to the grid clusters, what they can do with their output files is restricted. If users know which files are being produced, they can use the gridftp tool provided with the Globus toolkit, but there are cases where users do not know the exact details of the files being produced. Overall, we decided that most users' needs would be best met by employing a distributed data management infrastructure, and we judged the SRB to be the best product for this. The SRB provides a single logical file structure even though data are distributed over several locations, and the user sees a single point of access to this file structure. The geographical location of any file is reduced to a mere file attribute. The SRB has a central metadata catalogue (MCAT) server that maintains information about all files within the SRB.
In the case of the eMinerals project, the MCAT server is located at the Daresbury Laboratory. In addition to the central MCAT server, the SRB system requires a set of storage vaults. One vault has been set up on each of the three clusters (the 720 GB RAID arrays referred to above). Moreover, the SRB client tools are installed on each cluster.
The operational approach is that users manage their data on the SRB. They put their files onto the SRB before beginning their calculations. Their jobs download the relevant files from the SRB prior to running the main application code, and place all output files into the SRB at the end of the job. It is possible to use wildcard file specifications, which means that users do not need to know details of files produced within a run in advance.
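The effect of a wildcard specification can be sketched with shell-style pattern matching. The example below uses `fnmatchcase` from Python's standard library as a stand-in; how the minigrid scripts actually expand wildcards is not described here, and the function `select_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(patterns, available):
    """Return the names in `available` that match any shell-style pattern.

    The order of `available` is preserved and each name appears once.
    """
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical contents of a job's working directory after a run.
produced = ["output.dat", "trajectory.xyz", "run.log"]

print(select_files(["*"], produced))       # everything is selected
print(select_files(["*.log"], produced))   # only the log file
```

With a pattern of `*`, every file left in the working directory is selected, which is why the user does not need to list the output files in advance.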
One advantage of this way of working is that the job lifecycle process results in an archive of the entire process being maintained on the SRB. This is particularly useful for collaborative workers.
Useful links
http://www.mandrakesoft.com
http://oscar.sourceforge.net
http://www.openpbs.org
http://www.clusterresources.com/products/maui
General references
Papers that describe the eMinerals science areas are: