Environment from the Molecular Level

A NERC eScience testbed project

The eMinerals minigrid

Introduction

The eMinerals project is one of the NERC eScience testbed projects. It is primarily concerned with using computer simulations performed on molecular length and time scales to address important environmental issues, such as the effects of radiation damage in high-level nuclear waste encapsulation materials, the adsorption of pollutants on surfaces, and weathering effects. The project consists of approximately 20 workers distributed over six geographic locations within the UK, and the computational resources available to the team are similarly distributed. The scientists in the project run a number of different simulation codes, which describe the interactions between atoms using either empirical model potential energy functions or a fuller quantum mechanical approach. Many of the simulations are based on Monte Carlo or molecular dynamics algorithms. All have high computational demands.

As a testbed project, one of the objectives of the eMinerals project is to create an enabling grid-based infrastructure appropriate for the science drivers. Our approach has been to build upon established standards such as Globus and Condor. One key feature has been to integrate compute and data middleware tools, analogous to how compute and data operations are integrated at the operating system level. We have treated close collaboration with the science users as a high priority in constructing the eMinerals minigrid, both to ensure that the minigrid best meets the needs of the scientists and to help the users learn to use the new system. We consider the close interaction between the project grid developers and the scientists to have been particularly important in setting up the eMinerals minigrid. It should be appreciated that the use of a shared grid resource is a big change from how the scientists represented in the project would previously have carried out their work. Typically, members of the molecular simulation community work with a small set of individual compute resources and manage their data on these resources through the usual Unix tools.


The eMinerals integrated minigrid

The architectural arrangement of the eMinerals minigrid, composed of the integrated compute and data resources outlined above, is depicted in Figure 1. The architecture for data management within the project is shown in Figure 2.

The primary advantage of this distributed architecture is that all data files within the project are immediately available to all compute resources. Users upload input data files to the Storage Resource Broker (SRB) prior to starting a calculation, and these data are then available wherever they choose to run the job. Similarly, on job completion, output data files are automatically stored within a nominated SRB vault, making them accessible to the user via any of the SRB’s interfaces (InQ for Windows, MySRB for any web browser, or the SRB Unix command-line tools, the Scommands, if installed locally). The SRB is also used to store executable images of applications. At the time of writing the project vaults house over 40 GB of data, made up of some 10,000 files, and usage is rising steadily as team members become more confident with the technology.
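The data flow described above can be sketched as the sequence of commands a job wrapper might run on a compute resource. This is a minimal illustration and not the project's actual job scripts: the function `plan_job_commands`, the collection path and the file names are hypothetical, and the exact arguments accepted by the Scommands (`Sinit`, `Sget`, `Sput`, `Sexit`) may differ between SRB versions. The function only builds command strings and executes nothing.

```python
import shlex
from typing import List


def plan_job_commands(
    collection: str,
    inputs: List[str],
    executable: str,
    outputs: List[str],
) -> List[str]:
    """Return, in order, the shell commands a job wrapper might run.

    Nothing is executed here; the commands are only assembled as strings
    so that the data flow can be inspected.
    """
    q = shlex.quote
    commands = ["Sinit"]  # open an SRB session
    # Fetch the application image and the input files from the SRB.
    for name in [executable] + inputs:
        commands.append(f"Sget {q(collection + '/' + name)} .")
    commands.append(f"chmod +x {q(executable)}")
    commands.append(f"./{q(executable)}")
    # Store every output file back in the same SRB collection.
    for name in outputs:
        commands.append(f"Sput {q(name)} {q(collection)}")
    commands.append("Sexit")  # close the SRB session
    return commands


if __name__ == "__main__":
    # Hypothetical collection and file names, for illustration only.
    for line in plan_job_commands(
        "/home/user.domain/run1", ["input.dat"], "simulate", ["output.dat"]
    ):
        print(line)
```

Because every file is fetched from and returned to the SRB, the same sequence applies whichever compute resource the job lands on.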

After output files have been loaded into the SRB, they can be annotated using the Metadata Editor. This is a simple forms-based web application in which users enter details such as the purpose behind running a study and performing a particular calculation, who was involved, when and where the data were generated, and where the data are stored in the SRB. As a result, members of the eMinerals project can search for the study details and datasets using the Data Portal, another web application that provides uniform search capabilities and access to heterogeneous data resources (Drinkwater et al., 2003). Data files can also be downloaded through the Data Portal if desired.
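To make the kind of annotation described above concrete, the sketch below models one dataset record and a keyword search over a list of records. It is purely illustrative: the class `DatasetRecord`, its field names and the `search` function are assumptions made for this example and do not reflect the actual schema of the Metadata Editor or the search interface of the Data Portal.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DatasetRecord:
    """Illustrative annotation for one dataset (field names are assumptions)."""
    study_purpose: str          # why the study was run
    calculation_purpose: str    # why this particular calculation was performed
    investigators: List[str]    # who was involved
    generated_on: date          # when the data were generated
    generated_at: str           # where the data were generated
    srb_location: str           # where the data are stored in the SRB


def search(records: List[DatasetRecord], keyword: str) -> List[DatasetRecord]:
    """Return records whose purpose fields mention the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        r for r in records
        if needle in r.study_purpose.lower()
        or needle in r.calculation_purpose.lower()
    ]
```

A search for a term such as "adsorption" would then return every record whose study or calculation purpose mentions it, together with the SRB location needed to retrieve the files.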

Although the eMinerals minigrid is firmly rooted in the tools of Globus v2, with job submission handled through Globus, Condor and Condor-G toolkit commands and data accessed through the SRB, its architecture leaves open the possibility of grafting on a service-oriented work paradigm should this prove useful for workflow issues. We are, for example, beginning to work with the Condor development team to integrate Condor with the WS-Resource Framework (WSRF), using the eMinerals minigrid as our testbed.
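As an illustration of what submission through Condor-G to a Globus v2 gatekeeper involves, the sketch below renders a submit description of the kind passed to `condor_submit`. It is a sketch under stated assumptions and not the project's own configuration: the function `condor_g_submit`, the host name and the file names are hypothetical, and the `universe = grid` / `grid_resource = gt2 ...` keywords follow the form documented for later Condor releases (older releases used `universe = globus` with `globusscheduler`).

```python
def condor_g_submit(host: str, jobmanager: str, executable: str, arguments: str = "") -> str:
    """Render a Condor-G submit description for a Globus v2 (gt2) resource.

    `jobmanager` selects the local scheduler behind the gatekeeper,
    for example "pbs" or "condor".
    """
    lines = [
        "universe      = grid",
        f"grid_resource = gt2 {host}/jobmanager-{jobmanager}",
        f"executable    = {executable}",
        f"arguments     = {arguments}",
        "output        = job.out",
        "error         = job.err",
        "log           = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper host and executable, for illustration only.
    print(condor_g_submit("lake.example.org", "pbs", "run_job.sh", "input.dat"))
```

Only the `grid_resource` line changes when the same job is directed to a different cluster or Condor pool, which is what makes a single submission route across heterogeneous resources practical.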

Components shown in the minigrid diagram: the user (with a link to the tools available to science users of the eMinerals minigrid), the SRB server, SRB vaults, the eMinerals Lake Linux clusters, the Cambridge Pond Linux cluster, the UCL and Cambridge Condor pools, an Apple Xserve cluster, the Globus toolkit, and the PBS, IBM LoadLeveler and Sun Grid Engine job managers.