I downloaded and compiled OpenPBS_2_3_16 instead of using the rpm packages so that scp, rather than rcp, could be used as the copying mechanism. The source code was untarred in /system/pbs/source and built in /system/pbs/build, since the instructions say to build the software in a separate directory. Here are the exact commands that were used on cdf3:
1. In /system/pbs/build,
/system/pbs/source/OpenPBS_2_3_16/configure --prefix=/usr --exec-prefix=/usr --with-scp \
    --set-server-home=/var/spool/pbs --set-default-server=cdf3
2. make
3. make install
4. Create the file /var/spool/pbs/server_priv/nodes and put in the name of each node
that will run jobs. Ours looks like:
cdf18:ts
cdf19:ts
cdf24:ts
cdf25:ts
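The nodes file above can also be generated from a host list; a minimal sketch (on the server, redirect the output into /var/spool/pbs/server_priv/nodes instead of the local file used here):

```shell
# Emit one "<host>:ts" line per execution node; :ts marks the node
# as time-shared. Redirect into /var/spool/pbs/server_priv/nodes.
for host in cdf18 cdf19 cdf24 cdf25; do
    printf '%s:ts\n' "$host"
done > nodes
```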
The following three steps should be carried out on each node that will run jobs.
5. Change to /system/pbs/build/src/resmom and run:
make install
6. Create the file /var/spool/pbs/mom_priv/config. Its contents are:
$clienthost cdf3
7. Start the pbs_mom daemon
pbs_mom
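The three per-node steps above can be collected into one helper script and pushed out from cdf3. This is only a sketch: the name setup_mom.sh is made up, and it assumes /system/pbs/build is visible on every node (e.g. NFS-mounted) and that root can ssh to each node.

```shell
# Hypothetical helper covering steps 5-7 on one execution node.
cat > setup_mom.sh <<'EOF'
#!/bin/sh
# Step 5: install the MOM from the build tree
cd /system/pbs/build/src/resmom && make install
# Step 6: tell the MOM which host is the server
mkdir -p /var/spool/pbs/mom_priv
echo '$clienthost cdf3' > /var/spool/pbs/mom_priv/config
# Step 7: start the MOM daemon
pbs_mom
EOF
# Then, from cdf3, run it on every node, e.g.:
#   for node in cdf18 cdf19 cdf24 cdf25; do ssh root@$node sh < setup_mom.sh; done
```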
The nodes (or clients) are now ready to start processing jobs. Go back to the server (cdf3) for the following commands.
8. Start the pbs server
pbs_server -t create
The -t create option is used only once, to initialize the database. In the startup script, it will be started with pbs_server -a TRUE
9. Start the pbs scheduler
pbs_sched
10. Create a queue to use
qmgr This starts the queue manager
c q cdf queue_type=execution    Creates queue cdf as an execution queue (c q = create queue)
s s default_queue=cdf           Makes cdf the server's default queue (s s = set server)
s s acl_hosts=*.uchicago.edu    Allows job submission from these hosts
s s acl_host_enable=true        Turns the host access list on
quit
11. Start and enable queue cdf. This only has to be done once:
qstart cdf
qenable cdf
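Since these qmgr settings may someday need to be recreated, it can help to keep the session in a file; a sketch (queue_setup.qmgr is a made-up name):

```shell
# Record the step-10 qmgr commands (unabbreviated) in a file.
cat > queue_setup.qmgr <<'EOF'
create queue cdf queue_type=execution
set server default_queue=cdf
set server acl_hosts=*.uchicago.edu
set server acl_host_enable=true
EOF
# Replay it on cdf3 with:  qmgr < queue_setup.qmgr
```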
That's it; the queue cdf is now set up.
Since cdf3 will actually attempt to log in to whatever node is going to run the job, the user who submits the job must have their account set up so that they can log in without entering a password. To do this, run the following on any cdf linux machine running RedHat 7.2.
ssh-keygen          Accept the default location of ~/.ssh/identity and use a blank passphrase
ssh-keygen -t dsa   Accept the default location of ~/.ssh/id_dsa and use a blank passphrase
cat ~/.ssh/identity.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys2
The first ssh-keygen command generates keys for the ssh1 protocol and the second one generates keys for ssh2. Now you should be able to ssh between RedHat 7.2 machines without having to enter your password.
In order to submit a job to the queue, you must be logged on to cdf3. Jobs are submitted with the qsub command. This command has LOTS of options, and in practice it is run using a script. An example script, called pbs_test, that just creates a file is shown here, with an explanation of each line to its right:
### Job Name                                Comment
#PBS -N marytest                            Specifies the name of the job
### Output Files                            Comment
#PBS -e marytest.err                        Directs standard error to this file
#PBS -o marytest.log                        Directs standard output to this file
### Queue Name                              Comment
#PBS -q cdf                                 Specifies the name of the queue
# This job's working directory              Comment
echo Working directory is $PBS_O_WORKDIR    Prints the directory in which qsub was started
cd $PBS_O_WORKDIR                           Changes to the directory from which qsub was started
echo Running on host `hostname`             Prints the name of the host that ran the job
echo Time is `date`                         Prints when the job was run
echo Directory is `pwd`                     Prints the directory the job was run in
# Run job                                   Comment
touch xxx                                   The actual command to run
### End of script                           Comment
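For copy-pasting, here is the same pbs_test script written out cleanly, with the explanations turned into ordinary comments:

```shell
# Write the example job script to a file named pbs_test.
cat > pbs_test <<'EOF'
### Job name
#PBS -N marytest
### Output files: standard error and standard output
#PBS -e marytest.err
#PBS -o marytest.log
### Queue to submit to
#PBS -q cdf
# Report where, when, and on which host the job ran
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# The actual job: create a file
touch xxx
EOF
```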
The first thing to note is that all PBS directives start with #PBS. These are not comments to PBS. If you want to turn one of these lines into a comment, change it to ##PBS, # PBS, or anything else that does not begin with #PBS.
This job is submitted with the following command: qsub pbs_test
More documentation to follow...