Introduction
The ATLAS Tier3g environment is set up so that most analysis instructions available on general ATLAS TWikis will work. This documentation orients you in some specific features of a standard Tier3g (T3g) and points to other relevant documentation. The model Tier3g at ANL ASC is described here; where your own T3g is likely to differ from ANL ASC is noted.
An ATLAS Tier3g consists of the following elements:
- Interactive nodes: this is where users log in and do all of their work, including submission to the Grid and to the local batch cluster. At ANL ASC, they are called
ascint0y.hep.anl.gov ascint1y.hep.anl.gov
- At ANL, they are currently only accessible from inside the ANL firewall.
- Worker nodes: this is where your batch jobs run; normally the associated data storage also lives here. Users usually do not need to know the details of the worker nodes. At ANL ASC, there are a total of 42 batch slots available for use.
- Data gateway: there is a way for your T3g to get large amounts (multiple terabytes) of data from the Grid and load it into your batch cluster for you to run over. This is normally controlled by the ATLAS administrators at your site. You can copy small data sets on your own to the interactive nodes for test processing.
Setting Up Your Account
Basics
Your T3g account will have the bash shell as default. It is recommended that you stick with this. Given the limited manpower, we did not install the rebuilt C shell needed for ATLAS software, nor did we test any of the functionality from C-type shells.
Your home login area will normally be /export/home/your_user_name. In the case of ANL ASC, it is /users/your_user_name, because the home area at ANL ASC is shared with another cluster. You may find a similar arrangement at your T3g.
Before you get to work, you will probably want to do the following for convenience. In .bash_profile, put in
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
And in .bashrc, put in:
# add ~/bin/ to path
PATH=$PATH:$HOME/bin:./
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
This gets the default bashrc to give you a prompt that shows user and node, and adds the current directory and your own ~/bin to your PATH; you are then also ready to put aliases and functions in .bashrc.
Your ATLAS environment
The ATLAS environment in a T3g is based on the ATLAS Local Root Base package (https://twiki.atlas-canada.ca/bin/view/AtlasCanada/ATLASLocalRootBase) developed at ATLAS Canada. The original documentation resides on Canadian ATLAS pages; it will move to CERN (as will these pages) in the future and be maintained centrally.
The other part of your environment comes from the file system CVMFS, a web file system and part of the CERNVM project (http://cernvm.cern.ch/cernvm/), although what is being used here has nothing to do with virtual machines. CVMFS provides the Athena versions as well as the conditions data, maintained centrally at CERN.
The two environments are designed to work together.
To start your environment, you need to do the following:
export ATLAS_LOCAL_ROOT_BASE=/export/share/atlas/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
You can put this into .bashrc or, if you prefer, make a separate shell script that you can execute.
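For example, a minimal standalone script (the name ~/bin/atlas_setup.sh is just an illustration) could contain:

# ~/bin/atlas_setup.sh -- source this when you want the ATLAS environment
export ATLAS_LOCAL_ROOT_BASE=/export/share/atlas/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'

Source it with "source ~/bin/atlas_setup.sh"; the setupATLAS alias is then available in that shell.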
Now you can do:
setupATLAS
You should see the following output on your screen
...Type localSetupDQ2Client to use DQ2 Client
...Type localSetupGanga to use Ganga
...Type localSetupGcc to use alternate gcc
...Type localSetupGLite to use GLite
...Type localSetupPacman to use Pacman
...Type localSetupPandaClient to use Panda Client
...Type localSetupROOT to setup (standalone) ROOT
...Type localSetupWlcgClientLite to use wlcg-client-lite
...Type saveSnapshot [--help] to save your settings
...Type showVersions to show versions of installed software
...Type createRequirements [--help] to create requirements/setup files
...Type changeASetup [--help] to change asetup configuration
...Type setupDBRelease to use an alternate DBRelease
...Type diagnostics for diagnostic tools
Getting ready to run Athena interactively
Running on CVMFS athena versions
This is the generally recommended way to run Athena at a Tier3. The Athena versions suitable for a Scientific Linux 5 (SL5) installation such as ANL ASC are under
/opt/atlas/software/i686_slc5_gcc43_opt/
Note: the /opt/atlas/ area is remotely mounted and cached locally; this means you should not run a recursive command (such as ls -R) on these directories, or you could be waiting for a very long time.
You can do a simple “ls”, for example to find installed versions on CVMFS:
[test_user@ascwrk2 ~]$ ls /opt/atlas/software/i686_slc5_gcc43_opt/
15.6.3  15.6.4  15.6.5  15.6.6  gcc432_i686_slc4  gcc432_i686_slc5  gcc432_x86_64_slc5
You can look for patched versions in the following way:
[ryoshida@ascint1y ~]$ ls /opt/atlas/software/i686_slc5_gcc43_opt/15.6.6/AtlasProduction/
15.6.6  15.6.6.1  15.6.6.2  15.6.6.3  15.6.6.4
You can set up your test area as usual (the example here sets up 16.0.0):
mkdir ~/testarea
mkdir ~/testarea/16.0.0
export ATLAS_TEST_AREA=~/testarea/16.0.0
Now you need to set up the correct version of the C++ compiler for Athena and your setup (at ANL ASC it is 64-bit slc5) using the environment created by the ATLASLocalRootBase package. (This version of gcc will become the default in the future).
localSetupGcc --gccVersion=gcc432_x86_64_slc5
Now you need to set up the version you want. (An alternative setup procedure using a cmthome directory is described in HowToCreateRequirements.)
source /opt/atlas/software/i686_slc5_gcc43_opt/16.0.0/cmtsite/setup.sh -tag=16.0.0,AtlasOffline,32,opt,oneTest,setup
For patched versions, an example is
source /opt/atlas/software/i686_slc5_gcc43_opt/16.0.0/cmtsite/setup.sh -tag=16.0.0.1,AtlasProduction,32,opt,oneTest,setup
(Note that the setup.sh script is located under the directory of the main release version. Also note that the "tag" options are somewhat different for base and patched versions.)
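Putting the above steps together, a complete interactive setup for release 16.0.0 might look like the following sketch (the paths and the gcc tag are those from the examples above for ANL ASC; adapt them to your site):

setupATLAS
mkdir -p ~/testarea/16.0.0
export ATLAS_TEST_AREA=~/testarea/16.0.0
localSetupGcc --gccVersion=gcc432_x86_64_slc5
source /opt/atlas/software/i686_slc5_gcc43_opt/16.0.0/cmtsite/setup.sh -tag=16.0.0,AtlasOffline,32,opt,oneTest,setup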
Database access needed for Athena jobs
You may need the following definition in addition, in order to run some types of jobs. (It defines how to access the conditions files and database.)
export FRONTIER_SERVER="(proxyurl=http://vmsquid.hep.anl.gov:3128)(serverurl=http://squid-frontier.usatlas.bnl.gov:23128/frontieratbnl)"
In this, “vmsquid.hep.anl.gov” is specific to the ANL ASC cluster. Ask your administrator for the name of your local squid server.
(For recent versions of Atlas Local Root Base, it is no longer necessary to define the following: “export ATLAS_POOLCOND_PATH=/opt/atlas/conditions/poolcond/catalogue”)
Sometimes a job will require a specific recent database release which is not shipped with the Athena version. If this is the case, you can use the database releases installed on CVMFS. To see which versions are available:
[ryoshida@ascint1y ~]$ ls /opt/atlas/database/DBRelease
9.6.1  9.7.1  9.8.1  9.9.1  current
If you want to use one of these (9.6.1 in this example) instead of the one built into the Athena version, give the following commands:
export DBRELEASE_INSTALLDIR="/opt/atlas/database"
export DBRELEASE_VERSION="9.6.1"
export ATLAS_DB_AREA=${DBRELEASE_INSTALLDIR}
export DBRELEASE_OVERRIDE=${DBRELEASE_VERSION}
More information on database releases is HERE.
Accessing SVN code repository at CERN
In order to check out packages from CERN SVN using commands like "cmt co", you need Kerberos authentication. If your local user name is not the same as your CERN (lxplus) user name, you will need to create a file called
~/.ssh/config
This file should contain the following.
Host svn.cern.ch
    User your_cern_username
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
    Protocol 2
    ForwardX11 no
Then you can give the commands (after setting up an Athena version)
kinit your_cern_username@CERN.CH    # give lxplus password
export SVNROOT=svn+ssh://svn.cern.ch/reps/atlasoff
and you will have access to the svn repository at CERN.
If your usernames are the same, you only need to do the kinit command.
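For example, to check out the UserAnalysis tag used in the HelloWorld exercise below (this is a sketch; the tag shown corresponds to release 15.6.6, so adapt it to your release):

kinit your_cern_username@CERN.CH        # enter your lxplus password
export SVNROOT=svn+ssh://svn.cern.ch/reps/atlasoff
cmt co -r UserAnalysis-00-14-03 PhysicsAnalysis/AnalysisCommon/UserAnalysis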
Running Athena
At this stage, you are set up so that the examples in the ATLAS computing workbook should work (but skip the "setting up your account" section; you have done the equivalent already). Examples from the Physics Analysis Workbook should also work. The following is a small example to get you started.
(Almost) Athena-version-independent HelloWorld example
- Set up the Athena environment as above.
- Go to your test area, e.g. ~/testarea/RELEASE (e.g. 15.6.6)
cmt show versions PhysicsAnalysis/AnalysisCommon/UserAnalysis
- This will return a string containing a "tag collector" number which will look like UserAnalysis-nn-nn-nn. (For 15.6.6, it is UserAnalysis-00-14-03.)
- Issue the command
cmt co -r UserAnalysis-nn-nn-nn PhysicsAnalysis/AnalysisCommon/UserAnalysis
- This can take a minute or two to complete.
- Go to the run directory
cd PhysicsAnalysis/AnalysisCommon/UserAnalysis/run
- Execute the following command to get the runtime files
get_files -jo HelloWorldOptions.py
- Run athena: issue the command:
athena.py HelloWorldOptions.py
The algorithm will first initialize and will then run ten times (during each run it will print various messages and echo the values given in the job options file). Then it will finalize, and stop. You should see something that includes this:
HelloWorld     INFO initialize()
HelloWorld     INFO   MyInt = 42
HelloWorld     INFO   MyBool = 1
HelloWorld     INFO   MyDouble = 3.14159
HelloWorld     INFO   MyStringVec[0] = Welcome
HelloWorld     INFO   MyStringVec[1] = to
HelloWorld     INFO   MyStringVec[2] = Athena
HelloWorld     INFO   MyStringVec[3] = Framework
HelloWorld     INFO   MyStringVec[4] = Tutorial
If so, you have successfully run the Athena HelloWorld example.
Getting sample data and MC files with DQ2
After doing
setupATLAS
give the command:
localSetupDQ2Client
You'll get a banner
************************************************************************
It is strongly recommended that you run DQ2 in a new session
It may use a different version of python from Athena.
************************************************************************
Continue ? (yes[no]) :
Say “yes”. It's safest to dedicate a window for DQ2, or log out and in after using DQ2 if you want to use Athena.
The usage and documentation on DQ2 tools are HERE.
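As a rough sketch of typical usage (the dataset name below is taken from the pathena example later in this guide, and the -n option of dq2-get is assumed to be available for fetching a limited number of files):

dq2-ls mc08.105597.Pythia_Zprime_tt3000.recon.AOD.e435_s462_s520_r808_tid091860/     # check that the dataset exists
dq2-get -n 2 mc08.105597.Pythia_Zprime_tt3000.recon.AOD.e435_s462_s520_r808_tid091860/   # copy two files into the current directory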
Submitting to the Grid using pathena
Your Grid Certificates
As usual, you need to copy your certificate files userkey.pem and usercert.pem into the ~/.globus area. Instructions on obtaining certificates (for US users) are HERE.
Setting up for Pathena
After setting up for Athena as described above give the following command:
localSetupPandaClient
Using Pathena to submit to the Grid
After the above setup you can follow the general Distributed Analysis on Panda instructions HERE. Skip the setup section in that document, since you have already done the equivalent for your T3g.
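For example, a Grid submission of the AnalysisSkeleton job options (also used in the local batch example below) might look like the following sketch; the input and output dataset names are placeholders, and the output dataset must follow the usual userNN.YourGridName naming convention:

pathena --inDS mc08.105597.Pythia_Zprime_tt3000.recon.AOD.e435_s462_s520_r808_tid091860/ \
        --outDS user10.YourGridName.pathena.test.v0 \
        AnalysisSkeleton_topOptions.py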
Local Batch Cluster
Your Tier3 will have local batch queues that you can use to run over larger amounts of data. In general, one batch queue is more or less equivalent to one analysis slot at a Tier2 or Tier1. As an example, the ANL ASC cluster has 42 batch queues; this means a job that runs in a Tier1/Tier2 analysis queue in an hour, split into 42 jobs, will also run in about an hour at ANL ASC (assuming you are using all of the queues).
Using pathena to submit to your local batch cluster
This is still undergoing preliminary tests.
The batch nodes of your cluster may be configured so that they can accept Pathena jobs in a similar way to Tier2 and Tier1 analysis queues. The description of Tier3 Panda is HERE.
The name of the Panda site at ANL ASC is ANALY_ANLASC. Pathena submission with the option --site=ANALY_ANLASC will submit jobs to the Condor queues described below.
However, a T3g is not part of the Grid, and the data storage you have locally is not visible to Panda. Note the following:
- You are still communicating with Panda server at CERN. This means you will need to set up exactly as you would for any Pathena submission.
- You will need to specify the files you want to run on in a file and send it with your job.
- Panda will not retrieve your output and register it with DQ2; your output will reside locally.
- Normally, T3g queues will be set up so that only local T3g users can submit to its batch queues.
The following is an example of a command submitting a job to ANALY_ANLASC. It submits the AnalysisSkeleton example from the computing workbook.
pathena --site=ANALY_ANLASC --pfnList=my_filelist --outDS=user10.RikutaroYoshida.t3test.26Mar.v0 AnalysisSkeleton_topOptions.py
Note that everything is the same as usual except that ANALY_ANLASC is chosen as the site and the input is specified by a file called my_filelist.
[ryoshida@ascint1y run]$ cat my_filelist
root://ascvmxrdr.hep.anl.gov//xrootd/mc08.105597.Pythia_Zprime_tt3000.recon.AOD.e435_s462_s520_r808_tid091860/AOD.091860._000003.pool.root.1
This file specifies the files you want to use as input, in a format which can be understood by the local system. The local data storage for your batch jobs is explained below under Data Storage at your site.
The output of the jobs cannot be registered with DQ2 as it would be at a Tier2 or Tier1, but is kept locally in the area
/export/share/data/users/atlasadmin/2010/YourGridId
For example:
[ryoshida@ascint1y run]$ ls /export/share/data/users/atlasadmin/2010/RikutaroYoshida/ user10.RikutaroYoshida.t3test.22Mar.v0 user10.RikutaroYoshida.t3test.25Mar.v0 user10.RikutaroYoshida.t3test.22Mar.v1 user10.RikutaroYoshida.t3test.25Mar.v0_sub06230727 user10.RikutaroYoshida.t3test.22Mar.v1_sub06183810 user10.RikutaroYoshida.t3test.26Mar.v0 user10.RikutaroYoshida.t3test.22Mar.v2 user10.RikutaroYoshida.t3test.26Mar.v0_sub06253711 user10.RikutaroYoshida.t3test.22Mar.v2_sub06183821
Looking at the output of the job submitted by the command above:
ls /export/share/data/users/atlasadmin/2010/RikutaroYoshida/user10.RikutaroYoshida.t3test.26Mar.v0_sub06253711
user10.RikutaroYoshida.t3test.26Mar.v0.AANT._00001.root
user10.RikutaroYoshida.t3test.26Mar.v0._1055967261.log.tgz
The output root file and the zipped log files can be seen.
Local parallel processing on your batch cluster: ArCond
ArCond (Argonne Condor) is a wrapper around Condor which automatically parallelizes your jobs (Athena and non-Athena) over the input data files and helps you concatenate your output at the end. Unlike pathena, it is a completely local submission system.
To start with ArCond, do the following:
mkdir arctest
cd arctest
source /export/home/atlasadmin/condor/Arcond/etc/arcond/arcond_setup.sh
arc_setup
This will set up your arctest directory
[ryoshida@ascint1y arctest]$ arc_setup
Current directory=/users/ryoshida/asc/arctest
--- initialization is done ---
[ryoshida@ascint1y arctest]$ ls
DataCollector  Job  arcond.conf  example.sh  patterns  user
Now do
arc_ls /xrootd/
to see the files loaded into the batch area. You are now set up to run a test job from the arctest directory with the "arcond" command (see the sketch below). We are working on a more complete description; in the meantime these pages (https://atlaswww.hep.anl.gov/twiki/bin/view/Workbook/UsingPCF), although they describe an installation on a different cluster, contain most of the needed information.
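A minimal sketch of a submission from the arctest directory, assuming the default files written by arc_setup (arcond.conf and example.sh), would be:

cd ~/arctest
arc_ls /xrootd/          # check which data files the batch nodes can see
# edit arcond.conf (data set and number of jobs) and example.sh (what each job runs)
arcond                   # submit the parallel jobs to Condor, following the prompts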
Condor
The batch system used at ANL ASC is Condor, which has its own detailed documentation. The information here is only meant to let you look at the system and do simple submissions; we expect you will do most of your job submission using the interfaces provided (pathena and ArCond, see above), so that parallel processing of the data is automated.
There is no user setup necessary for using Condor.
Looking at Condor queues
To see what queues are there, give the following command.
[test_user@ascint1y ~]$ condor_status
Name               OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
[email protected]    LINUX  X86_64  Unclaimed  Idle      0.000   2411  1+19:38:32
[email protected]    LINUX  X86_64  Unclaimed  Idle      0.000   2411  1+19:42:55
...(abbreviated)
[email protected]    LINUX  X86_64  Claimed    Busy      0.000   2411  0+00:00:05
[email protected]    LINUX  X86_64  Claimed    Busy      0.010   2411  0+00:00:05
[email protected]    LINUX  X86_64  Claimed    Busy      0.010   2411  0+00:00:06
[email protected]    LINUX  X86_64  Claimed    Busy      0.010   2411  0+00:00:07
[email protected]    LINUX  X86_64  Claimed    Busy      0.010   2411  0+00:00:08
[email protected]    LINUX  X86_64  Claimed    Busy      0.010   2411  0+00:00:09
...(abbreviated)
[email protected]   LINUX  X86_64  Unclaimed  Idle      0.000   2411  0+14:10:17
[email protected]   LINUX  X86_64  Unclaimed  Idle      0.000   2411  0+14:10:18
[email protected]   LINUX  X86_64  Unclaimed  Idle      0.000   2411  0+14:10:11
[email protected]   LINUX  X86_64  Unclaimed  Idle      0.000   2411  0+14:10:12

                     Total Owner Claimed Unclaimed Matched Preempting Backfill
        X86_64/LINUX    45     3      14        28       0          0        0
               Total    45     3      14        28       0          0        0
This tells you that there are 45 queues (3 reserved for service jobs) and the status of each queue. Note that the queues (slots) on the ascwrk1 nodes are running jobs (Busy). To see the jobs in the queue:
[test_user@ascint1y ~]$ condor_q -global
-- Submitter: ascint1y.hep.anl.gov : <146.139.33.41:9779> : ascint1y.hep.anl.gov
 ID        OWNER      SUBMITTED    RUN_TIME   ST PRI SIZE CMD
76139.0    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.1    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.2    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.3    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.4    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.5    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.6    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.7    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.8    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.9    test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.10   test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.11   test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.12   test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
76139.13   test_user  3/18 10:53   0+00:00:00 I  20  0.0  run_athena_v2_1.sh
14 jobs; 14 idle, 0 running, 0 held
In this case 14 jobs from the user "test_user" are in the idle state (just before they begin to run).
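If you need to kill jobs, condor_rm takes the IDs shown by condor_q; a quick sketch using the job IDs from the listing above:

condor_rm 76139.3        # remove a single job (cluster.process)
condor_rm 76139          # remove the whole cluster of jobs
condor_rm test_user      # remove all jobs belonging to test_user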
Submitting a job to Condor
Prepare your submission file. An example is the following
[test_user@ascint1y xrd_rdr_access_local]$ less run_athena_v2.sub
# Some incantation..
universe = vanilla
# This is the actual shell script that runs
executable = /export/home/test_user/condor/athena_test/xrd_rdr_access_local/run_athena_v2.sh
# The job keeps the environment variables of the shell from which you submit
getenv = True
# Setting the priority high
Priority = +20
# Specifies the type of machine.
Requirements = ( (Arch == "INTEL" || Arch == "X86_64"))
# You can also specify the node on which it runs, if you want
#Requirements = ( (Arch == "INTEL" || Arch == "X86_64") && Machine == "ascwrk2.hep.anl.gov")
# The following file will be written out in the directory from which you submit the job
log = test_v2.$(Cluster).$(Process).log
# The next two will be written out at the end of the job; they are stdout and stderr
output = test_v2.$(Cluster).$(Process).out
error = test_v2.$(Cluster).$(Process).err
# Ask that any file you create in the "top directory" of the job be transferred back
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
# queue the job.
queue 1
# more than once if you want
# queue 14
The actual shell script that executes looks like this:
[test_user@ascint1y xrd_rdr_access_local]$ less run_athena_v2.sh
#!/bin/bash
## You will need this bit for every Athena job
# a non-interactive shell doesn't expand aliases by default; set it so it does
shopt -s expand_aliases
# set up the aliases.
# note that ATLAS_LOCAL_ROOT_BASE (unlike the aliases) is passed from the shell from which you submit.
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
# now proceed normally to set up the other aliases
setupATLAS
# Condor works in a "sandbox" directory.
# We now want to create our Athena environment in this sandbox area.
mkdir testarea
mkdir testarea/15.6.3
export ATLAS_TEST_AREA=${PWD}/testarea/15.6.3
localSetupGcc --gccVersion=gcc432_x86_64_slc5
cd testarea/15.6.3
# Set up the Athena version
source /export/home/atlasadmin/temp/setupScripts/setupAtlasProduction_15.6.3.sh
# For this example, just copy the code from my interactive work area where I have the code running.
cp -r ~/cvmfs2test/15.6.3/NtupleMaker .
# compile the code
cd NtupleMaker/cmt
cmt config
gmake
source setup.sh
# cd to the run area and start running.
cd ../share
athena Analysis_data900GeV.py
# Just to see what we have at the end, do an ls. This will end up in the *.out file
echo "ls -ltr"
ls -ltr
# copy the output file back up to the top directory to get it back from Condor into your submission directory.
cp Analysis.root ../../../../Analysis.root
Now to submit the job do
condor_submit run_athena_v2.sub
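After submitting, you can watch the job with the commands from the previous subsection; a brief sketch (the file names follow the submit file above):

condor_q                  # the job shows up as I (idle), then R (running)
ls test_v2.*.log          # Condor log files appear in the submission directory
tail test_v2.*.log        # inspect progress; the *.out and *.err files arrive when the job ends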
Data Storage at your site
The baseline Tier 3g configuration has several data storage options. The interactive nodes can be configured with some local space; this should be considered shared scratch space, and local site policies will define how it is used. There is also space located on the standalone file server (also known as the NFS node). Due to limitations of NFS within Scientific Linux, XrootD is used to access the data on this node. In the baseline Tier 3g setup, the majority of the storage is located on the worker nodes; this storage is managed and accessed through XrootD.
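For interactive tests you can copy a single file from the XrootD-managed space to local scratch with xrdcp (a sketch; the redirector name and file are taken from the pathena example above, and xrdcp is assumed to be available once ROOT/Athena is set up):

xrdcp root://ascvmxrdr.hep.anl.gov//xrootd/mc08.105597.Pythia_Zprime_tt3000.recon.AOD.e435_s462_s520_r808_tid091860/AOD.091860._000003.pool.root.1 /tmp/AOD.test.pool.root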