
Environment from the Molecular Level

A NERC eScience testbed project

Delegated job management using GridSAM and BPEL

Overview

The eMinerals minigrid encapsulates a wide range of resources and services deployed across six sites in the United Kingdom. Scientists require means of exploiting these services in a fully integrated manner through the definition of computational processes specifying the sequence of tasks and services they require. The deployment of such processes, however, can prove difficult in networked environments, due to the presence of firewalls, software requirements and platform incompatibility. We show here a system that builds on a delegation model by which a scientist may rely on middle-tier services to orchestrate subsets of the processes on their behalf. For this purpose, we exploit the capabilities of the Business Process Execution Language (BPEL) standard, and other Web Service tools and standards, such as GridSAM and the Job Submission Description Language (JSDL). Based on the requirements of the eMinerals scientists, we define a set of interrelated workflows that correspond to basic patterns observed on the eMinerals minigrid. These will enable scientists to incorporate job submission and monitoring, data storage and transfer management, and automated metadata harvesting in a single unified process, which they may control from their desktops.

The increased adoption of Web Services by the Grid community makes the use of the BPEL standard in a Grid environment very appealing: BPEL is an orchestration language, enabling us to build services whose role is to coordinate interactions between Web Services according to specified workflows. Our system defines a number of hierarchically organized workflows that correspond to basic patterns observed in our minigrid.

We focus on three interrelated aspects of a scientific process, listed below:

  1. Job specification, scheduling, execution and monitoring across clusters
  2. Storage and retrieval of data from data storage vaults
  3. Automated metadata harvesting

Scientists can control the execution of these patterns using the Simple Grid Access (SGA) tool: a lightweight, self-contained tool that enables them to launch job executions and manage submissions from their desktop, including transferring files to and from SRB storage vaults and uploading proxy certificates to a trusted myproxy server. In the near future, SGA should be capable of harvesting metadata from the files produced, in order to facilitate their indexing and storage.

The resulting architecture of our infrastructure is illustrated below:

Client tool manual

Basic arguments:
All arguments can be specified either as part of the command line call, or in a file called gssclient.properties located in your ~/.globus directory:

-v -d (verbose + debug)
-s target service (see gssclient.properties)
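
The way settings from the properties file and the command line could be combined can be sketched as follows. This is a minimal sketch, not the client's actual implementation: it assumes that gssclient.properties uses plain key=value lines, that keys match the argument names without the leading dash, and that command-line values take precedence over file values. None of these details are confirmed here.

```python
from pathlib import Path


def load_properties(path):
    """Parse key=value lines from a properties file.

    Assumption: one key=value pair per line, with '#' or '!' starting a comment.
    A missing file yields an empty dictionary.
    """
    props = {}
    p = Path(path)
    if not p.exists():
        return props
    for raw in p.read_text().splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def merge_settings(file_props, cli_args):
    """Overlay command-line values on file values.

    Assumption: the command line wins; a value of None means "not given".
    """
    merged = dict(file_props)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged
```

For example, merge_settings(load_properties(Path.home() / ".globus" / "gssclient.properties"), {"srbuser": "alice"}) would keep everything from the file and replace only srbuser.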

Security proxy settings:
(for interaction with Globus 2.4 or above):

-myproxyusername (your username - local)
-myproxypass (your certificate password)
-myproxydn (this is the DN of the myproxy server - it's in the gssclient.properties)
-myproxyserver (lake address)

You will require outbound access to port 7512 for the myproxy server on lake, and to port 18080 for the main service. The idea here is that communication between the myproxy server and the client should be direct, as the server acts as a secure trusted party. Parties that then want to retrieve your certificate will have to authorize themselves with the server.
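
Before running the client, it can be useful to confirm that a firewall is not blocking these two ports. The following is a minimal sketch using only the Python standard library. It tests TCP reachability only, and says nothing about whether the myproxy server or the main service is working correctly. The function names are illustrative and are not part of the client tool.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports, timeout=5.0):
    """Map each port to True or False depending on whether it is reachable."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling check_ports("lake.geol.ucl.ac.uk", [7512, 18080]) returns a dictionary with one entry per port; a False value suggests the port is blocked or the host is unreachable from your desktop.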

SRB settings:
(Note that additional transfer tools are supported, such as HTTP, FTP and GridFTP, though no arguments are currently available for them.)

-srbuser (your srb username)
-srbpass (your srb password)
-srbserver (opt. Mark's script location - defaults to the current one)
-srbpath (the directory you wish to use)

Flow control:
The following arguments determine whether input files are taken from your desktop and output files are returned to your desktop upon job completion. In addition, you can specify whether the client should block until the job completes or simply return upon submission (for workflow management).

-srbtransfer (boolean - add this if you wish files to be transferred between your desktop and the SRB server before and after job execution).
-oneway (boolean - just quit after launching the job).

Job details:
-e executable
-i stdin
-o stdout
-transferin file to transfer alongside the job (can specify multiple transferins)
-transferout files to retrieve upon job completion (same)
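
When the client is driven from a script, the job and flow control arguments above can be assembled programmatically. The sketch below builds only the argument list, since the name of the client launcher is not given here. It assumes that multiple files are passed by repeating -transferin or -transferout, and that the boolean flags take no value. Both assumptions should be checked against the client itself.

```python
def build_args(executable, stdin=None, stdout=None,
               transfer_in=(), transfer_out=(),
               srb_transfer=False, one_way=False):
    """Build the job-related argument list for the client.

    Assumptions: one -transferin / -transferout flag per file, and
    -srbtransfer / -oneway are present-or-absent switches.
    """
    args = ["-e", executable]
    if stdin:
        args += ["-i", stdin]
    if stdout:
        args += ["-o", stdout]
    for name in transfer_in:
        args += ["-transferin", name]
    for name in transfer_out:
        args += ["-transferout", name]
    if srb_transfer:
        args.append("-srbtransfer")
    if one_way:
        args.append("-oneway")
    return args
```

The returned list can be appended to whatever command starts the client on your system, for example when calling it through subprocess.run.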

Process details:
From a client perspective, the process that takes place is the following:

Monitoring job progress:
The BPEL engine's progress can be monitored at: http://lake.geol.ucl.ac.uk:18080/BpelAdmin/active_processes.jsp
The processes are hierarchical, which is why you will see more than one being spawned for each job. One sub-process takes care of getting your files in and out of the SRB, and another is responsible for running the job. A third is responsible for interacting with the R-commands.

For a general idea of where the process is, click on MainWSSubmission. To check the job execution status, click on GSSubmission > new window > variables > GetJobStatusResponseVar. When the job completes, files are uploaded to the SRB. On the client side - if srbtransfer was selected - files are downloaded back onto your client.

General references

Papers that describe the eMinerals work on GridSAM can be downloaded as pdf files from the following links:

"Managing Scientific Processes on the eMinerals minigrid using BPEL". C Chapman, AM Walker, Mark Calleja, RP Bruin, MT Dove and W Emmerich.


Last edit 11/7/06