Environment from the Molecular Level
A NERC eScience testbed project
History of the eMinerals project
A personal view, by Martin Dove
In the beginning ...
The idea of eMinerals was born when escience was just an idea within the UK science community.
My involvement in escience began in early 2000. My head of department, Ekhard Salje, told me that he had been attending meetings involving the Director General of the Research Councils about the computing infrastructure of the future. At that time, many of us were wedded to the use of high-performance computing facilities for our simulations, augmented by workstations from companies such as Silicon Graphics. eScience was to present a vision that would take a little while to get used to!
In the first few months of 2000 I attended a number of ad hoc and formal meetings associated with research council communities. The NERC community tended to work within small group meetings, involving (among others) representatives of the atmosphere, oceans and mineral sciences communities. These NERC meetings culminated in a paper I edited on behalf of Ekhard, which was submitted to the Research Councils.
The Research Councils then formed what was called the Informatics Committee under the chair of Ian Halliday (then Chief Executive of PPARC). Ekhard was one of the NERC representatives on this committee, but on several occasions I deputised for him. The result of this committee was a proposal to the Office of Science and Technology that there should be a cross-council programme called eScience, and an approximate budget (with allocations) was drawn up. This successfully found its way into the Chancellor's Spending Review 2000 as the start of a five-year programme.
The origins of eMinerals
NERC was relatively late in starting its escience spending programme. NERC formed a one-meeting committee to shape some ideas and to propose a framework for a forthcoming community town meeting. That meeting was held on 31 March 2001, and the report is available here. The result of this meeting was a call for project proposals.
The meeting was attended by a number of members of the NERC computational mineral sciences community, and following the meeting we started to organise the eMinerals project proposal (although we didn't use this name at first). The first stage was a short pre-proposal, with the view that successful applicants would be invited to submit full proposals. From the small group of initial scientists, we developed the team of simulation code developers and grid specialists that became the eMinerals project team. Our pre-proposal (pdf version here) made it into the full proposal stage. We submitted our full proposal for 15 June 2001. As part of the proposal evaluation we gave a short presentation and answered questions at a meeting of NERC's escience steering committee in December 2001, and were fortunate to be one of the funded projects, along with the GODIVA and ClimatePrediction.net projects.
We started the recruitment process in Spring 2002, and in this process we appointed our first 5 research team members. The actual grant kicked off on 1 August 2002, with the original group of team members beginning at various times soon after and the remainder of the positions (12 full-time equivalent research staff) being gradually filled over the following months.
The early months of eMinerals 1 were spent running a number of grid experiments. We looked at a number of the core middleware tools, such as Globus and Condor, and other approaches such as the use of portals. One of our early objectives was to start to develop the UCL campus grid, a Condor pool of around 1000 teaching PCs. We ran a two-track approach, with small initial pools being built at both UCL and Cambridge to gain expertise. As a result, the build-up of the UCL campus grid ran relatively smoothly, and it became the largest academic Condor pool in the UK.
The scientists started to learn to use the grid job submission tools that we were working with. At an early stage it was instructive to work out which sorts of tools our team scientists were happy with, and which were unreasonably hard to use. Thus began our concern that escience tools needed to be as easy for our scientists to use as other computing infrastructures, and that any overhead associated with being a new way of doing things should be as non-intrusive as possible. Subsequently it has seemed to us that the concept of "usability" in the wider world of escience has become a concern rather late in the day, but from quite early on we were taking careful account of user experiences.
During our first year our CCLRC partners started to develop an SRB service for data management. There was some discussion of how we should use the SRB and the CCLRC data portal for data management. There were two aspects to this discussion: how we should use the tools for long-term management of our data, and how we should handle files in the period of time starting from the instant of their creation. We realised that we were not satisfied with grid file transfer tools. For example, some of our simulation programs generate files whose names cannot easily be predicted from the outset, and transferring such files with tools such as GridFTP was not easy to implement. Thus we made an early decision to integrate the SRB closely into our compute grid infrastructure to create the basis of the integrated compute/data eMinerals minigrid.
During the spring of our second year (2004) we started to pull together the components that became our compute minigrid. We had identified a set of available resources from the outset, and augmented them with three new Linux clusters which we called the "Lakes" (Cambridge, UCL and Bath), on each of which we installed a PBS job manager, a Globus gatekeeper and an SRB vault.
The next challenge was to make the eMinerals minigrid easy to use. As a quick fix we developed the first version of our grid submission tool, my_condor_submit (MCS). This used a simple workflow, enabled by Condor's DAGMan, to run three consecutive jobs: download files from the SRB, run the simulation code, and then place the generated files into the same collection on the SRB. Having discovered that Condor is much more usable than Globus, we made use of Condor-G - Condor's wrapping of Globus job submission commands - and developed MCS to take a Condor-like job input file. Our work on MCS was carried out in Spring of 2004, and was reported at the eScience All Hands meeting later that year.
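To illustrate the kind of three-step workflow described above, here is a minimal Python sketch that builds a DAGMan input file chaining three jobs in strict order. It is not the MCS code itself: the job names (fetch, run, store) and the submit file names are hypothetical, and only the standard DAGMan JOB and PARENT ... CHILD keywords are assumed.

```python
def build_three_stage_dag(fetch_sub, run_sub, store_sub):
    """Return the text of a DAGMan input file for three consecutive jobs.

    The arguments are paths to ordinary Condor submit files (hypothetical
    names here): one to fetch input files, one to run the simulation code,
    and one to store the generated files.
    """
    lines = [
        f"JOB fetch {fetch_sub}",
        f"JOB run {run_sub}",
        f"JOB store {store_sub}",
        # Each stage starts only after the previous one has completed.
        "PARENT fetch CHILD run",
        "PARENT run CHILD store",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical submit file names, for illustration only.
    print(build_three_stage_dag("fetch.sub", "simulate.sub", "store.sub"), end="")
```

A file written from this text could then be handed to condor_submit_dag, which runs the stages in order and stops the chain if one of them fails.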
We also ran a number of experiments with collaborative tools. We had always planned to use collaborative tools, but at the 2003 All Hands meeting we changed our plans on how to proceed. At that meeting we ran our first desktop Access Grid session (free from the constraints of firewalls). We were always concerned to augment desktop videoconferencing with application sharing tools, and we laid down the strategy that resulted in our MAST tool at that meeting.
The eMinerals project has always tried to be flexible to accommodate new ideas that have come our way, rather than tie ourselves to a rigid plan. The incorporation of the SRB is one example of this. Another significant example is the use of XML, specifically the Chemical Markup Language (CML). Simulation scientists are traditionally happy to work with whatever format of output file the code developers force on them, and as simulation scientists we had not thought too closely about data representation. In the Autumn of 2002 we met Peter Murray-Rust from Cambridge Chemistry, and we started a collaboration - slowly at first - in which we developed some new Fortran tools to enable us to make our main codes write CML output.
The collection of papers presented at the 2004 UK eScience All Hands meeting is an interesting one in that it is very well focussed on our achievements in the first two years of the eMinerals project. This collection was carefully planned, and I think it shows how far we had come in the first two years; it is a collection that I am very proud of.
Once we had developed the eMinerals minigrid and other tools, our focus moved towards using our tools to enable the science. Initially we had anticipated that grid tools would enable us to run larger-scale simulations, and in fact our work on DL_POLY_3 has done just that, but it became apparent during our third year that the great strength of grid computing is to facilitate what we called "combinatorial studies". These are studies in which many calculations are performed with slightly different parameters, such as a sweep across a range of pressure or temperature, or calculations on all permutations of the chemical makeup of the dioxin molecule, C12O2ClxH8-x. The key point is that the sort of commodity computers that are now the main components of compute grids are as powerful as (or more powerful than) the processors found in high-performance computers. Moreover, memory is cheap, and grids of commodity computers can give computational power that is as significant in terms of GFlops (or even TFlops) as high-performance facilities. The only difference is that one cannot easily run parallel computations within a grid environment, but that is offset by the fact that one can run the many calculations required by a combinatorial study in parallel. The eMinerals project can run simulations on a standard PC with reasonable memory that would have required supercomputer resources only a few years ago.
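As a rough illustration of what a combinatorial study looks like in practice, the following Python sketch expands a pressure and temperature sweep into a list of independent calculations, each of which could be submitted to a grid as a separate job. The function name, the dictionary keys and the parameter values are all hypothetical and are not taken from any eMinerals tool.

```python
from itertools import product


def sweep(pressures_gpa, temperatures_k):
    """Return one parameter set per independent calculation in the sweep."""
    return [
        {"job_id": i, "pressure_gpa": p, "temperature_k": t}
        for i, (p, t) in enumerate(product(pressures_gpa, temperatures_k))
    ]


if __name__ == "__main__":
    # Illustrative values only: 3 pressures x 4 temperatures.
    jobs = sweep([0.0, 5.0, 10.0], [300, 600, 900, 1200])
    for job in jobs:
        print(job)
```

Because no calculation depends on the result of another, the whole list can run at once on as many machines as are free, which is the sense in which a grid suits this kind of study.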
Developing the case for eMinerals 2
The point about the value of compute grids for combinatorial studies is significant in the light of our thoughts for a follow-on proposal to NERC.
Our proposal was submitted on 28 September 2004. It was more of a last-minute dash than I recall facing before. I was still writing the proposal on that morning, before getting in a car with Matt and Richard and driving to Swindon. On the way we had to transfer some files to a laptop via mobile phone, and then get a friend at the Rutherford Appleton Laboratory to print them out before we arrived at Swindon. Very escience in action!
The focus of our proposal for eMinerals 2 was to exploit the strengths that come from having the eMinerals team operate as a virtual organisation. Our science focus is on the adsorption of pollutants on mineral surfaces, and across the project we have expertise in all the key methods, a wide range of mineral surfaces, and a wide range of pollutant adsorbates (cations, anions and molecules). Only with escience tools is it at all realistic to mount such a wide-ranging challenge.
The outcome of our proposal was delayed by some time owing to financial problems within NERC, and as a result we were unable to make appointments to the staff vacancies that inevitably arise as a matter of routine in a project as large as eMinerals. Thus we carried a number of vacant staff posts throughout 2005, which necessarily meant that there was a bit of a lull in our work. However, we have started 2006 with a good strong team, and our collection of papers accepted for the 2006 UK eScience All Hands meeting represents good progress towards meeting the aims of eMinerals 2.
Last edit 11/7/06