Environment from the Molecular Level

A NERC eScience testbed project

eMinerals Frequently Asked Questions

This page is the place to store questions that have been answered in the Forum section and are of sufficient interest to record.

  1. Why do the lake machines regularly give me transient login errors using gsissh?
  2. How do I go from sitting at my computer to being logged in to a Lake cluster using gsissh?
  3. How do I set up my lake accounts to be able to perform server-side SRB manipulation?
  4. How do I export/import my certificate out of/into browsers and how do I extract the DN for access to the wiki?
  5. Where can you get an SVG plug-in from, as the map page is not viewable without it?
  6. How do I use the MPI libraries on the lakes?
  7. What parallel maths libraries are available on the lakes?
  8. How do I submit an MPI job to the PBS queues on the lakes via Globus?
  9. Can we have some information on setting up Globus in the HowTo section?
  10. What ports do I need to have opened in a local firewall in order to use the multicast bridge at eminerals.esc.cam.ac.uk for PIG sessions?
  11. How do I monitor the current state of the queues on the clusters?
  12. Why does using pbs_run_mpi on the lakes show some strange syntactic behaviour?
  13. Why has my Condor-G job been held, with the log saying that my user proxy has expired: Globus error 131: the user proxy expired (job is still running)
  14. How do I request an X509 certificate issued by the eMinerals CA?

1. Why do the lake machines regularly give me transient login errors using gsissh?

This is most likely due to a slight difference in time between your client machine and the server machine. GSI, the Globus Grid Security Infrastructure, is very time-sensitive. This protects, for example, against users backdating their computer's clock to make an expired proxy valid again.

All our resources are synchronized using the NTP (Network Time Protocol) server ntp2.ja.net. Please configure your client machine to use this NTP server to ensure synchronization with the servers (email helpdesk@eminerals.org for more information). Clovis

2. How do I go from sitting at my computer to being logged in to a Lake cluster using gsissh?

Here's something not everyone knows: ssh, scp and sftp, and their GSI counterparts, all use different syntax to target a port:

gsissh -p 2222 lake.esc.cam.ac.uk
gsiscp -P 2222 lake.esc.cam.ac.uk
(note the difference in case)
gsisftp -oPort=2222 lake.esc.cam.ac.uk
(actually, -o passes any option through to ssh)

Depending on how you've set up your gsis* commands, you may be able to type

ssh -p 2222 lake.esc.cam.ac.uk
scp -P 2222 lake.esc.cam.ac.uk
sftp -oPort=2222 lake.esc.cam.ac.uk

to access lake. The gsis* commands work as conventional ssh/scp/sftp clients too:

gsissh jwak02@hartree.hpcf.cam.ac.uk

works fine. (Jon)
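
Since the three port options are easy to mix up, here is a small sketch of a helper that prints the right one for each tool. The function name port_args is my own invention, not part of any gsi or ssh package, and input.dat in the usage comment is just a placeholder file name.

```shell
# port_args TOOL PORT: print the port option in the form TOOL expects.
# (port_args is a hypothetical helper name, not a standard command.)
port_args() {
    case "$1" in
        ssh|gsissh)   printf '%s\n' "-p $2" ;;       # lower-case p
        scp|gsiscp)   printf '%s\n' "-P $2" ;;       # upper-case P
        sftp|gsisftp) printf '%s\n' "-oPort=$2" ;;   # ssh option passed via -o
        *) echo "port_args: unknown tool '$1'" >&2; return 1 ;;
    esac
}

# Example: copy a placeholder file to your home directory on lake.
# gsiscp $(port_args gsiscp 2222) input.dat lake.esc.cam.ac.uk:
```

The helper only builds the option string; it does not contact any machine itself.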

3. How do I set up my lake accounts to be able to perform server-side SRB manipulation?

First use gsissh to get onto the target cluster (note that you'll have to do this for each cluster). Then follow these instructions:

mkdir ~/.srb
cd ~/.srb

Create the file .MdasEnv and add the following lines (do not include the <> brackets):

mdasCollectionHome '/home/<your SRB user name>.eminerals'
mdasDomainHome 'eminerals'
srbUser '<your SRB user name>'
srbHost 'eminerals.dl.ac.uk'
srbPort '5544'
defaultResource '<a valid vault, e.g. LakeUCLVault, CCLRCFS or CambsLake>'
AUTH_SCHEME 'ENCRYPT1'

Also create the file .MdasAuth and add the following line:

<your SRB password>

Then

chmod 600 .M*

(Mark)
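
Because these steps have to be repeated on each cluster, they can be wrapped in a function. This is only a sketch: the name setup_srb and its arguments are my own, and the file contents are copied from the instructions above. The SRB password is read from standard input so that it does not end up on the command line.

```shell
# setup_srb SRB_USER RESOURCE [DIR]
# Writes .MdasEnv and .MdasAuth into DIR (default: ~/.srb).
# The SRB password is read as one line from standard input.
# (setup_srb is a hypothetical helper name.)
setup_srb() {
    srb_user=$1
    srb_resource=$2
    srb_dir=${3:-$HOME/.srb}

    mkdir -p "$srb_dir" || return 1

    cat > "$srb_dir/.MdasEnv" <<EOF || return 1
mdasCollectionHome '/home/${srb_user}.eminerals'
mdasDomainHome 'eminerals'
srbUser '${srb_user}'
srbHost 'eminerals.dl.ac.uk'
srbPort '5544'
defaultResource '${srb_resource}'
AUTH_SCHEME 'ENCRYPT1'
EOF

    # Read the password and write it with a restrictive umask.
    IFS= read -r srb_pass
    ( umask 077; printf '%s\n' "$srb_pass" > "$srb_dir/.MdasAuth" ) || return 1

    # Same effect as "chmod 600 .M*" in the manual steps.
    chmod 600 "$srb_dir/.MdasEnv" "$srb_dir/.MdasAuth"
}
```

To use it, run setup_srb with your SRB user name and a valid vault (for example CambsLake), then type your SRB password followed by Enter.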

4. How do I export/import my certificate out of/into browsers and how do I extract the DN for access to the wiki?

See this instructions page for full instructions.

5. Where can you get an SVG plug-in from, as the map page is not viewable without it?

The Adobe plugin for Windows and Apple computers can be obtained here.

Mozilla no longer uses the Adobe SVG plugin; instead you must build Mozilla from source with SVG support. Download Mozilla and do this:

./configure --enable-svg --enable-svg-renderer-libart --prefix=/usr/local/moz-svg
make
make test
make install

if you are using Linux. However, the web page containing the SVG will also need to be changed, because the plugin uses the <embed> tag while Mozilla uses namespaces to distinguish SVG. (Jon)

6. How do I use the MPI libraries on the lakes?

It depends which libraries you want to use. Currently there are LAM and MPICH ones, each built with both the GNU and Intel compilers. Hence, to use the MPICH ones add the following to your .bashrc:

export MPICH=/opt/mpich-intel    # or /opt/mpich-gnu
export PATH=$MPICH/bin:$PATH
export RSHCOMMAND=ssh

To use the LAM ones, add instead:

export LAMHOME=/opt/lam-intel    # or /opt/lam-gnu
export PATH=$LAMHOME/bin:$PATH
export LAMRSH="ssh -x"

(Mark 26/05/2004)
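
If you switch between flavours often, the two sets of exports above can be folded into one function in your .bashrc. This is a sketch only: the function name use_mpi is an assumption of mine, while the paths and variables are the ones listed above.

```shell
# use_mpi FLAVOUR, where FLAVOUR is one of
# mpich-intel, mpich-gnu, lam-intel or lam-gnu.
# (use_mpi is a hypothetical helper name.)
use_mpi() {
    case "$1" in
        mpich-intel|mpich-gnu)
            MPICH=/opt/$1
            PATH=$MPICH/bin:$PATH
            RSHCOMMAND=ssh
            export MPICH PATH RSHCOMMAND
            ;;
        lam-intel|lam-gnu)
            LAMHOME=/opt/$1
            PATH=$LAMHOME/bin:$PATH
            LAMRSH="ssh -x"
            export LAMHOME PATH LAMRSH
            ;;
        *)
            echo "use_mpi: unknown flavour '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, use_mpi lam-gnu sets LAMHOME to /opt/lam-gnu and puts its bin directory at the front of your PATH. Calling it repeatedly in one session keeps prepending to PATH, so it is best called once per shell.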

7. What parallel maths libraries are available on the lakes?

There are BLACS and SCALAPACK libs built with the Intel compilers for both the LAM and MPICH flavours. These are both built using the ATLAS versions of BLAS. Hence, find the relevant version of BLACS in /usr/local/BLACS/[lam or mpich]-intel. The SCALAPACK libs are in /usr/local/lib as libscalapack-[lam or mpich]-intel.a, together with the BLAS libs.

(Mark 26/05/2004)

8. How do I submit an MPI job to the PBS queues on the lakes via Globus?

First, make sure you understand how to submit a simple, single-node, non-MPI job (see the example I've put in the SRB at mcal00.eminerals/dag_srb_pbs). Now consider submitting a VASP job that requires four CPUs. Then the corresponding Condor submit script will look something like:

======================
Universe = globus
Globusscheduler = lake.esc.cam.ac.uk/jobmanager-pbs
Executable = vasp-lam-intel
Notification = NEVER
Environment = LAMRSH=ssh -x

GlobusRSL = (job_type=mpi)(count=4)(queue=workq)(mpi_type=lam-intel)(directory=/home/mcal00/Test)

transfer_files = ALWAYS
stream_output = false
stream_error = false

Output = job.out
Log = job.log
Error = job.error
Queue
======================

A few comments. The executable needs to be built for a specific MPI flavour, so here we've built it for the LAM distribution using the Intel compilers. Next we have to pass an environment variable (setting it in your .bashrc on the lakes is not enough). This one is needed for using the LAM libs; if you want to use MPICH instead then replace that with:

Environment = RSHCOMMAND=ssh

Next comes the GlobusRSL. I'm assuming here that we're using the SRB to extract input files into the working directory /home/mcal00/Test (if you use the example for a non-MPI, single node job mentioned above then this value will be set for you), so I'm not going to bother to transfer input/output files. Note that I've asked for four CPUs: "count=4". The other tag of interest in the RSL is the non-standard one mpi_type. This allows you to select which version of MPI you want to use; the allowed values on the lakes currently are lam-intel, lam-gnu, mpich-intel or mpich-gnu.

(Mark 31/05/2004)
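
The Environment line and the mpi_type tag have to agree, which is easy to get wrong when editing by hand. As a sketch, a function like the one below could generate the submit script from the four values that usually change. The name make_submit and the executable name in the usage comment are hypothetical; everything else is taken from the example above.

```shell
# make_submit EXECUTABLE COUNT MPI_TYPE DIRECTORY
# Prints a Condor-G submit script on standard output.
# (make_submit is a hypothetical helper name.)
make_submit() {
    # Pick the environment variable that matches the MPI flavour.
    case "$3" in
        lam-intel|lam-gnu)     mpi_env='LAMRSH=ssh -x' ;;
        mpich-intel|mpich-gnu) mpi_env='RSHCOMMAND=ssh' ;;
        *) echo "make_submit: unknown mpi_type '$3'" >&2; return 1 ;;
    esac
    cat <<EOF
Universe = globus
Globusscheduler = lake.esc.cam.ac.uk/jobmanager-pbs
Executable = $1
Notification = NEVER
Environment = $mpi_env

GlobusRSL = (job_type=mpi)(count=$2)(queue=workq)(mpi_type=$3)(directory=$4)

transfer_files = ALWAYS
stream_output = false
stream_error = false

Output = job.out
Log = job.log
Error = job.error
Queue
EOF
}

# Example (vasp-mpich-gnu is a made-up executable name):
# make_submit vasp-mpich-gnu 4 mpich-gnu /home/mcal00/Test > job.sub
# condor_submit job.sub
```

The function rejects any mpi_type outside the four values currently allowed on the lakes, so a typo is caught before the job is submitted.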

9. Can we have some information on setting up Globus in the HowTo section?

At the moment you need to use the Globus Alliance's own documentation; unfortunately this now only has documentation of GT 3, the only supported version.

(Mark 18/05/04)

10. What ports do I need to have opened in a local firewall in order to use the multicast bridge at eminerals.esc.cam.ac.uk for PIG sessions?

Assuming you're using the eMinerals VV, you'll need the following five holes for UDP traffic (these are port numbers): 47000 and the range 50480-50483.

(Mark and Rik, 19/05/04)
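
As an illustration only, here is how those five holes might be written as rules if your local firewall happens to be a Linux machine using iptables. That is an assumption on my part, as is the use of the INPUT chain; your site may use a different firewall or need the rules on another chain. The function just prints the rules so that you or your firewall administrator can review them first.

```shell
# pig_udp_rules: print one iptables rule per UDP hole
# (port 47000 and the range 50480-50483).
# Assumes an iptables firewall and the INPUT chain; adjust for your site.
# (pig_udp_rules is a hypothetical helper name.)
pig_udp_rules() {
    for ports in 47000 50480:50483; do
        echo "iptables -A INPUT -p udp --dport $ports -j ACCEPT"
    done
}
```

Running pig_udp_rules prints two lines, one for the single port and one for the range.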

11. How do I monitor the current state of the queues on the clusters?

Go to a directory in my area on the SRB (mcal00.eminerals/bin) and get the tool "lakes". Then chmod +x it and after starting a proxy (e.g. on fried or silica) you can use it as:

lakes arg      where arg = bath, cam, rdg, ucl or all

Note that the bath option won't work until that machine is fully configured, but the others will. Further note that if you're using this tool on silica then you need to change the first line in the script from #!/bin/sh to #!/sbin/sh, due to a silly IRIX convention.

(Mark 24/06/2004)
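
The first-line change for silica can be done with sed rather than by hand. The following is a sketch; the function name fix_shebang is hypothetical, and it only touches a first line that is exactly #!/bin/sh.

```shell
# fix_shebang FILE: if the first line of FILE is exactly "#!/bin/sh",
# rewrite it to "#!/sbin/sh", then make FILE executable.
# (fix_shebang is a hypothetical helper name.)
fix_shebang() {
    sed '1s|^#!/bin/sh$|#!/sbin/sh|' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1" &&
        chmod +x "$1"
}

# Example, once the tool has been fetched into the current directory:
# fix_shebang lakes
# ./lakes cam
```

A temporary file is used instead of an in-place edit because not every sed supports editing in place.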

12. Why does using pbs_run_mpi on the lakes show some strange syntactic behaviour?

1) I found that the name of the job cannot start with a number; it has to start with a letter. For example, "2nodes" does NOT work, but "nodes2" WORKS.

2) It looks like when a job is running on the lakes it copies the files the program needs to a temporary directory, and after the job has finished the files are copied back to the 'initial' directory. In this process it looks like hidden characters (e.g. ^M) are added to the files, as happens on Windows PCs (it is a PC cluster we are running on). For example, I need a wavefunction file to run my quantum Monte Carlo calculations. I can use this wavefunction file once, and the program runs without problems. When I use the same wavefunction file to restart my job, the program complains that it cannot read the wavefunction file, or gives some other error. If I recreate the file (or ftp the same file from my desktop to the lakes), the program runs again without problems.
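
Both observations can be checked from the shell before submitting or restarting a job. This is a sketch with helper names of my own (valid_job_name, count_cr, strip_cr); it assumes the hidden characters really are carriage returns (^M) and that the file in question is plain text. Do not run strip_cr on a binary file, since removing bytes would corrupt it.

```shell
# valid_job_name NAME: succeed only if NAME starts with a letter.
valid_job_name() {
    case "$1" in
        [A-Za-z]*) return 0 ;;
        *)         return 1 ;;
    esac
}

# count_cr FILE: print the number of carriage-return (^M) bytes in FILE.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# strip_cr IN OUT: copy text file IN to OUT with carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

If count_cr reports zero for a file that still fails on restart, the problem lies elsewhere and the ^M explanation above does not apply.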

13. Why has my Condor-G job been held, with the log saying that my user proxy has expired: Globus error 131: the user proxy expired (job is still running)

It is often hard to estimate the duration of jobs, especially since they might be queued on the remote resource for a long time. When submitting long-running jobs, it is therefore best to generate a proxy that will be valid for a longer period, as follows:

%> grid-proxy-init -hours 72

Use whatever number of hours will cover the duration of the job. If the proxy still expires before the end of the job, this will not stop the job. Simply regenerating a proxy (grid-proxy-init) will allow you to retrieve the results.

Clovis
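
Before submitting, you can check whether your current proxy will last long enough. The sketch below compares a required number of hours with the seconds reported by grid-proxy-info -timeleft; the function name proxy_covers is my own.

```shell
# proxy_covers HOURS_NEEDED SECONDS_LEFT
# Succeeds if SECONDS_LEFT is at least HOURS_NEEDED hours.
# (proxy_covers is a hypothetical helper name.)
proxy_covers() {
    [ "$2" -ge $(( $1 * 3600 )) ]
}

# Example, assuming grid-proxy-info is on your PATH:
# if ! proxy_covers 72 "$(grid-proxy-info -timeleft)"; then
#     grid-proxy-init -hours 72
# fi
```

Remember to allow for time spent waiting in the remote queue when choosing the number of hours.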

14. How do I request an X509 certificate issued by the eMinerals CA?

First a warning: these are only recognised by our project. Don't expect the NGS or HPCx to honour them. That said, to get one, first ssh to silica.esc.cam.ac.uk (if you haven't got a silica account then get in touch). Next run the command grid-cert-request. This will produce a file usercert_request.pem, which you should email to me. It will also produce a userkey.pem file, which is the corresponding private key. Keep this safe and don't email it. I will sign your request and send you back the correct usercert.pem to use for our CA.

Mark (03/06/2004)

Page maintained by Martin Dove
Last update 24/06/04

