
ARGO (A Rapid Generator Omnibus) & Balsam

<html> <b>Authors:</b><br> J Taylor Childers (ANL HEP)<br> Tom Uram (ANL ALCF)<br> </html>

Description & Versions

<html> <p>Balsam is an interface to a batch system's local scheduler. Each scheduler is abstracted so that Balsam itself remains scheduler-independent.</p>

<p>ARGO is a workflow manager. ARGO can submit jobs to any system on which Balsam is running.</p>

<p>Both ARGO and Balsam are implemented in <a href="http://www.python.org/" target="_blank">Python</a> as <a href="http://www.djangoproject.com/" target="_blank">django</a> apps. <a href="http://www.djangoproject.com/" target="_blank">django</a> was chosen because it provides needed services out of the box, such as database handling and web interfaces for monitoring job statuses. <a href="http://www.djangoproject.com/" target="_blank">django</a> version 1.6 is used with <a href="http://www.python.org/" target="_blank">Python</a> 2.6.6 (built with GCC 4.4.7 20120313, Red Hat 4.4.7-3).</p>

<p>The communication layer is handled using <a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> and <a href="http://github.com/pika/pika" target="_blank">pika</a>. <a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> is a message queue system. Message queues were an easy alternative to writing a custom TCP/IP interface. This requires installing and running a <a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> server. <a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> 3.3.1 is used with Erlang R16B02.</p>

<p>The data transport is handled using <a href="http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/" target="_blank">GridFTP</a>. This requires installing and running a <a href="http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/" target="_blank">GridFTP</a> server. Globus version 5.2.0 is used.</p> </html>
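
<html> <p>Input and output locations in job submissions are therefore given as gsiftp:// URLs (see the example message below). As a rough illustration only, a manual transfer of such a URL with the standard globus-url-copy client would look like <code>globus-url-copy gsiftp://www.gridftpserver.com/path/to/input/files/alpout.input.0 file:///local/work/dir/</code>; the paths here are placeholders.</p> </html>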

Diagram of the ARGO-Balsam interactions.

Job Submission

<html>Jobs are submitted to ARGO via a <a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> message queue. Messages are serialized with Python's <a href="http://docs.python.org/2/library/json.html" target="_blank">json</a> format. Each entry in the jobs list names a target_site, i.e. the Balsam instance that should run that job. An example submission follows; a sketch of publishing such a message with pika appears after it. </html>

example_msg.txt
'''
{
   "preprocess": null,
   "preprocess_args": null,
   "postprocess": null,
   "postprocess_args": null,
   "input_url":"gsiftp://www.gridftpserver.com/path/to/input/files",
   "output_url":"gsiftp://www.gridftpserver.com/path/to/output/files",
   "username": "bob",
   "email_address": "[email protected]",
   "jobs":[
      {
       "executable": "zjetgen90_mpi",
       "executable_args": "alpout.input.0",
       "input_files": ["alpout.input.0","cteq6l1.tbl"],
       "nodes": 1,
       "num_evts": -1,
       "output_files": ["alpout.grid1","alpout.grid2"],
       "postprocess": null,
       "postprocess_args": null,
       "preprocess": null,
       "preprocess_args": null,
       "processes_per_node": 1,
       "scheduler_args": null,
       "wall_minutes": 60,
       "target_site": "argo_cluster"
       },
 
      {
       "executable": "alpgenCombo.sh",
       "executable_args": "zjetgen90_mpi alpout.input.1 alpout.input.2 32",
       "input_files": ["alpout.input.1","alpout.input.2","cteq6l1.tbl","alpout.grid1","alpout.grid2"],
       "nodes": 2,
       "num_evts": -1,
       "output_files": ["alpout.unw","alpout_unw.par","directoryList_before.txt","directoryList_after.txt","alpgen_postsubmit.err","alpgen_postsubmit.out"],
       "postprocess": "alpgen_postsubmit.sh",
       "postprocess_args": "alpout",
       "preprocess": "alpgen_presubmit.sh",
       "preprocess_args": null,
       "processes_per_node": 32,
       "scheduler_args": "--mode=script",
       "wall_minutes": 60,
       "target_site": "vesta"
      }
   ]
}
'''
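
For testing, a message like the one above can be pushed onto the queue with pika. The following is only a minimal sketch: the host and the queue name argo_job_queue are placeholders, and the real values come from your settings files.

'''
#!/usr/bin/env python
# Minimal sketch: publish a job submission message to ARGO over RabbitMQ.
# Assumptions: a RabbitMQ server on localhost and the placeholder queue
# name 'argo_job_queue'; take the real host/queue from your settings files.
import json
import pika

# Load and validate the JSON message before sending it.
with open('example_msg.txt') as f:
    message = json.load(f)

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='argo_job_queue')
channel.basic_publish(exchange='',
                      routing_key='argo_job_queue',
                      body=json.dumps(message))
connection.close()
'''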

Installation

  1. On Mira: soft add +python (to get python 2.7)
  2. Install <html><a href="http://pypi.python.org/pypi/virtualenv" target="_blank">virtualenv</a></html>
  3. Create install directory: mkdir /path/to/installation/argobalsam
  4. export INST_PATH=/path/to/installation/argobalsam
  5. cd $INST_PATH
  6. Create virtual environment: virtualenv argobalsam_env
    1. On Edison:
      1. module load virtualenv
      2. module load python/2.7
  7. Activate virtual environment: . argobalsam_env/bin/activate
    1. On Edison:
      1. pip needs a certificate bundle: run mk_pip_cabundle.sh, then add --cert ~/.pip/cabundle to every pip command.
  8. Install needed software:
    1. pip install django
      1. with Python older than 2.7, use pip install django==1.6.2
    2. pip install south
    3. pip install pika
    4. pip install MySQL-python (only if using MySQL)
      1. On SLC6 this also required: yum install mysql mysql-devel mysql-server
  9. Create django project: django-admin.py startproject argobalsam
  10. cd argobalsam
  11. git clone [email protected]:balsam.git argobalsam_git
  12. mv argobalsam_git/* ./
  13. rm -rf argobalsam_git/
  14. Update the following lines of argobalsam/settings.py:
    1. At the Top:
      from site_settings.mira_settings import *
    2. INSTALLED_APPS = (
          'django.contrib.admin',
          'django.contrib.auth',
          'django.contrib.contenttypes',
          'django.contrib.sessions',
          'django.contrib.messages',
          'django.contrib.staticfiles',
          'south',
          'balsam_core',
          'argo_core',
      )
    3. If you are using MySQL:
      DATABASES = {
          'default': {
              'ENGINE': 'django.db.backends.mysql',
              'NAME': 'your_table_name_goes_here',
              'USER': 'your_login',
              'PASSWORD': 'your_password',
              'HOST': '127.0.0.1',
              'PORT': '',
              'CONN_MAX_AGE': 2000000,
          }
      }
  15. Set up your database (a quick sanity check follows this list):
    1. For older Django using South (pre-1.7):
      1.  python manage.py syncdb 
      2. Create the first migration
         python manage.py schemamigration balsam_core --initial 
      3. Apply the first migration
         python manage.py migrate balsam_core --fake 
      4. Create the first migration
         python manage.py schemamigration argo_core --initial 
      5. Apply the first migration
         python manage.py migrate argo_core --fake 
    2. For newer Django (1.7, 1.8):
      1.  python manage.py syncdb 
    3. For Django (1.9+):
      1.  python manage.py migrate --fake-initial 
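
Once the migrations have run, a quick way to confirm the database wiring is the django shell. This is only a sketch: it assumes the ArgoDbEntry model mentioned in the 5.0 notes below is importable from argo_core.models; adjust the import to match your checkout.

'''
# Run inside "python manage.py shell" from the project directory.
# Assumption: ArgoDbEntry (see the 5.0 notes below) lives in argo_core.models.
from argo_core.models import ArgoDbEntry

# A fresh install simply reports 0; an exception here usually points to
# the DATABASES settings or an unapplied migration.
print(ArgoDbEntry.objects.count())
'''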

Git Tag Notes

<html><a href="https://trac.alcf.anl.gov/projects/balsam/browser" target="_blank">Git Browser</a></html>

5.0

  • updated to MySQL database
  • create filter website
  • Added group_identifier to ArgoDbEntry so that jobs can be grouped together and searched for more easily.

4.1

  • updated vesta balsam delays to 5 min
  • making all settings files consistent.
  • added code to deal with jobs after a crash, and added print statements
  • added restart try of subprocesses in the service loop
  • updated vesta settings
  • Updated Balsam code to support tukey submit script with manual mpirun call. Mira handles the mpirun call for the user whereas tukey does not.
  • added tukey balsam site to argo settings
  • updated argo settings to reflect new install directory balsam_production

4.0

  • adding condor command files which should have already been added. Fixed bugs with adding alpgen data to finished emails.
  • adding logger.exception in the proper places. Fixed small bugs and added JobHold error
  • Adding files for the condor scheduler changes to include dagman jobs and condor job files.
  • Many updates to accommodate the Condor scheduler's ability to accept condor job files and condor dagman job files. Also includes updates to provide more information in emails upon completion.
  • fixing missing str() conversion
  • fixed error messages where integers needed to be converted to strings
  • updating only specific db fields instead of full object

3.2

  • re-added JobStatusReceiver

3.1

  • adding QueueMessage class to argo
  • merging old and new?
  • adding longer wait times to balsam on ascinode settings, and error checking to the job status receiver.

3.0

  • major change to ARGO so that each transition takes place in a subprocess and does not block argo from processing other jobs in parallel
  • added error catching for job id mismatch