====== Lesson 5: Running a job on multiple cores ======

This example is not needed for ATLAS Connect. If you still want to know how to run an ATLAS analysis job on several cores of your desktop, look at [[asc:...]].

====== Lesson 6: Using HTCondor and Tier2 ======

Lesson 5: Working on a Tier3 farm (Condor queue)
In this example we will use the HTCondor workload management system to send a job to be executed in a queue at a Tier3 farm. For this example ...

Start from a new shell and set up the environment with the following script, startJob.sh:
<file bash startJob.sh>
#!/bin/bash
# your grid user name, needed by rucio/FAX
export RUCIO_ACCOUNT=YOUR_CERN_USERNAME
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/...
localSetupFAX
source $AtlasSetup/...
# use the grid proxy file that is shipped together with the job
export X509_USER_PROXY=x509up_u21183
unzip payload.zip
ls
rcSetup -u; rcSetup Base,...
rc find_packages
rc compile
cd MyAnalysis/
rm submitDir

# $1 is the HTCondor process number passed via the submit file
echo $1
sed -n $1,...
cp Inp_$1.txt inputdata.txt
cat inputdata.txt
echo "..."
testRun submitDir
echo "..."
</file>
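The sed and cp lines above select the slice of inputdata.txt that belongs to this particular job; the exact line range is cut off in this copy of the page. Below is a minimal sketch of the same idea, assuming one input line per job and that the first argument is the HTCondor process number (the sketch is illustrative, not the original code):

<code bash>
#!/bin/bash
# Hypothetical illustration of splitting the input list between jobs:
# take the one line of inputdata.txt that belongs to job number $1
# (process numbers start at 0) and stage it as this job's input list.
JOB=$1
sed -n "$((JOB+1))p" inputdata.txt > Inp_${JOB}.txt
cp Inp_${JOB}.txt inputdata.txt   # the analysis reads inputdata.txt
cat inputdata.txt
</code>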
Make sure the RUCIO_ACCOUNT variable is properly set. Make this file executable and create the file that describes what our job needs and that we will hand to Condor:
<file bash job.sub>
Jobs=10
getenv       = ...
executable   = startJob.sh
output       = MyAnal_$(Jobs).$(Process).output
error        = MyAnal_$(Jobs).$(Process).error
log          = MyAnal_$(Jobs).$(Process).log
arguments    = $(Process) $(Jobs)
environment  = "..."
transfer_input_files = payload.zip,/...
universe     = ...
#...
queue $(Jobs)
</file>
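Jobs is a user-defined macro and $(Process) is filled in by HTCondor: queue $(Jobs) creates ten processes whose process numbers run from 0 to 9, each instance of startJob.sh receives its process number as $1 and the total number of jobs as $2, and each job writes its own MyAnal_10.N.error and .log files. To see what a single job will do, you can imitate one invocation by hand in the unpacked working directory (a hypothetical check, not part of the original recipe):

<code bash>
# emulate what HTCondor runs for process number 2 of a 10-job cluster
chmod 755 ./startJob.sh
./startJob.sh 2 10
</code>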
To access files through FAX the jobs need a valid grid proxy; that is why we send it along with each job. The proxy is the file whose name starts with "x509up" ...
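If you do not have a proxy yet, one is normally created with voms-proxy-init before submitting. The sketch below shows the usual commands (they are not part of the original page; the file name x509up_u21183 used in this example contains a numeric unix user id, so yours will differ):

<code bash>
# create a VOMS proxy for the ATLAS VO; by default it is written to
# /tmp/x509up_u<your numeric uid>
voms-proxy-init -voms atlas
# print information about the current proxy, including its path and time left
voms-proxy-info -all
</code>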
You need to pack the whole working directory into a payload.zip file:
<file bash startJob.sh>
startJob.sh
rc clean
rm -rf RootCoreBin
zip -r payload.zip *
</file>
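Before submitting you can verify what actually ended up in the archive. A quick optional check (not part of the original instructions):

<code bash>
# list the archive contents without unpacking, to make sure the analysis
# packages are inside and RootCoreBin was left out
unzip -l payload.zip
</code>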
Now make startJob.sh executable, then submit your task for execution and follow its status in this way:
<code bash>
chmod 755 ./startJob.sh; ...
</code>
<code bash>
~> condor_submit job.sub
Submitting job(s)..........
10 job(s) submitted to cluster 49677.

~> condor_q
-- Submitter: login.atlas.ci-connect.net : <...

 ID      OWNER   ...
49677.0   ...
49677.1   ...
49677.2   ...
49677.3   ...
49677.4   ...
49677.5   ...
49677.6   ...
49677.7   ...
49677.8   ...
49677.9   ...

10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
</code>
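Once the jobs have left the queue they no longer show up in condor_q. You can still inspect them, or remove a whole cluster if something went wrong, with the standard HTCondor commands (the cluster id 49677 is taken from the example above):

<code bash>
# show jobs of cluster 49677 that have already left the queue
condor_history 49677
# remove all jobs of cluster 49677, e.g. if they were submitted by mistake
condor_rm 49677
</code>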
- | |||
- | When the jobs are done, the output files will be inside " | ||
- | |||
- | <code bash> | ||
- | arc_add | ||
- | </ | ||
- | This will create the final output file " | ||
- | |||
- | |||
- |