SuperComputing 2001, Denver, Colorado

 
Lab #4:

PBS Installation and Setup

 



Table of Contents
 

  1. Objective
  2. Preparation
  3. Installation of PBS
  4. PBS Configuration
  5. Queue Manager Configuration on the Server
  6. Changes Made to OpenPBS_2_3_12 Code
  7. Acknowledgments


Objective

In the previous labs we installed Linux, set up NFS and NIS for the server and clients, and installed several software packages. Now we will install PBS (Portable Batch System), a program that controls the submission, scheduling, and execution of parallel jobs on your new cluster.



Preparation

You may want to keep the packages you download in a central storage area for later use. One way to do this is to create an archives directory under /usr/local/ and put every original downloaded package there. You may want to do the same for this part of the lab.
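
As a minimal sketch of that suggestion, the function below copies a downloaded package into an archive directory, creating the directory first if needed. The location /usr/local/archives follows the text; the ARCHIVE_DIR variable, the function name, and the tarball name in the usage comment are assumptions made for illustration. Writing under /usr/local normally requires root.

```shell
# Sketch: keep original downloads in one place.
# ARCHIVE_DIR defaults to the location suggested in the text; override it
# to try the function somewhere that does not need root.
ARCHIVE_DIR="${ARCHIVE_DIR:-/usr/local/archives}"

# archive_package FILE
# Copy FILE into $ARCHIVE_DIR, creating the directory if it does not exist.
archive_package() {
    pkg="$1"
    if [ ! -f "$pkg" ]; then
        echo "archive_package: no such file: $pkg" >&2
        return 1
    fi
    mkdir -p "$ARCHIVE_DIR" || return 1
    cp -p "$pkg" "$ARCHIVE_DIR/" || return 1
    echo "archived $(basename "$pkg") in $ARCHIVE_DIR"
}

# Usage (hypothetical file name; run as root for /usr/local):
#   archive_package OpenPBS_2_3_12.tar.gz
```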


PBS Installation

PBS Configuration

Queue Manager Configuration on the Server
  1. On the server start the queue manager:

    $ qmgr

    Set the parameters ("s q" and "s s" are abbreviations of "set queue" and "set server"):

    Qmgr: create queue defaultq queue_type=e
    Qmgr: s q defaultq resources_min.cput=1
    Qmgr: s q defaultq resources_max.cput=12:00:00
    Qmgr: s q defaultq resources_default.cput=30:00
    Qmgr: s q defaultq enabled=true
    Qmgr: s q defaultq started=true
    Qmgr: s s scheduling=true
    Qmgr: s s default_queue=defaultq
    Qmgr: set server managers=user_name@server_name
    Qmgr: set server resources_default.nodect = 1     (This is crucial for exclusive runs!)
    Qmgr: set server resources_default.nodes = 1     (This is crucial for exclusive runs!)

    Check the server setup:

    Qmgr: print server

    # Create queues and set their attributes
    #
    #
    # Create and define queue defaultq
    #
    create queue defaultq
    set queue defaultq queue_type = Execution
    set queue defaultq resources_max.cput = 12:00:00
    set queue defaultq resources_min.cput = 00:00:01
    set queue defaultq resources_default.cput = 00:30:00
    set queue defaultq enabled = True
    set queue defaultq started = True
    #
    # Set server attributes 
    #
    set server scheduling = True
    set server managers = ypuser@workshop
    set server default_queue = defaultq
    set server log_events = 511
    set server mail_from = adm
    set server resources_default.neednodes = 1
    set server resources_default.nodect = 1
    set server resources_default.nodes = 1
    set server scheduler_iteration = 600

    Qmgr:
    Qmgr: quit

  2. Restart the pbs daemons on the server (you need to be root to do this):

    $ /usr/local/sbin/pbs_mom
    $ /usr/local/sbin/pbs_sched
    $ /usr/local/sbin/pbs_server

  3. You might need to restart the pbs_mom daemon on the clients (you need to be root to do this):

    $ /usr/local/sbin/pbs_mom

  4. Check that the daemons can see all nodes and their states are "free":

    $ pbsnodes -a
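
On a cluster with many nodes, reading the full listing by eye gets tedious. The sketch below filters the listing down to nodes that are not free. It assumes that each node record starts with an unindented node name followed by indented "key = value" lines, one of which is "state = ..."; the function name is made up for this example.

```shell
# nodes_not_free: read `pbsnodes -a` output on stdin and print every node
# whose state is anything other than "free", with the state in parentheses.
# Assumed record layout: an unindented node name, then indented
# "key = value" lines including "state = <value>".
nodes_not_free() {
    awk '
        # Unindented, non-empty line: remember the current node name.
        /^[^ \t]/ { node = $1 }
        # Indented state line: fields are "state", "=", "<value>".
        /^[ \t]+state[ \t]*=/ { if ($3 != "free") print node " (" $3 ")" }
    '
}

# Usage on the server:
#   pbsnodes -a | nodes_not_free
```

No output means every listed node reported the state "free".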

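The queue and server settings from step 1 can also be applied without typing them at the Qmgr: prompt, since qmgr reads directives from standard input. The following is a sketch under that assumption; the function name is hypothetical, and the manager address is passed in as an argument because it differs per site.

```shell
# print_qmgr_commands MANAGER
# Emit the step 1 settings as qmgr directives, one per line.
# MANAGER is a user@host string, for example ypuser@workshop.
print_qmgr_commands() {
    manager="$1"
    cat <<EOF
create queue defaultq queue_type=e
set queue defaultq resources_min.cput=1
set queue defaultq resources_max.cput=12:00:00
set queue defaultq resources_default.cput=30:00
set queue defaultq enabled=true
set queue defaultq started=true
set server scheduling=true
set server default_queue=defaultq
set server managers=$manager
set server resources_default.nodect=1
set server resources_default.nodes=1
EOF
}

# Usage on the server (as root or an existing PBS manager):
#   print_qmgr_commands ypuser@workshop | qmgr
```

Running "print server" inside qmgr afterwards, as in step 1, is still the way to confirm that the settings took effect.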
Changes Made to OpenPBS_2_3_12 Code
    The following changes have been made to the PBS code:

    --- src/resmom/linux/mom_mach.c.orig Thu Oct 4 12:43:06 2001
    +++ src/resmom/linux/mom_mach.c Thu Oct 4 12:52:42 2001
    @@ -91,8 +91,8 @@
     #include <pwd.h>
     #include <mntent.h>
     #include <asm/types.h>
    -#include <linux/quota.h>
    -#include <sys/time.h>
    +#include <sys/quota.h>
    +#include <time.h>
     #include <sys/types.h>
     #include <sys/procfs.h>
     #include <sys/param.h>
    --- src/cmds/qsub.c.orig Thu Oct 4 13:13:05 2001
    +++ src/cmds/qsub.c Thu Oct 4 13:13:31 2001
    @@ -191,7 +191,7 @@
       FILE *TMP_FILE;
       char *in;
     
    -  tmpnam(tmp_name);
    +  mkstemp(tmp_name);
       if ( (TMP_FILE = fopen(tmp_name, "w+")) == NULL ) {
         fprintf(stderr, "qsub: could not create copy of script %s\n",
           tmp_name);
         return(4);
    --- src/gui/Ccode/xpbs_scriptload.c.orig Thu Oct 4 13:14:41 2001
    +++ src/gui/Ccode/xpbs_scriptload.c Thu Oct 4 13:15:10 2001
    @@ -725,7 +725,7 @@
       FILE *TMP_FILE;
       char *in;
     
    -  tmpnam(tmp_name);
    +  mkstemp(tmp_name);
       if ( (TMP_FILE = fopen(tmp_name, "w+")) == NULL ) {
         fprintf(stderr, "scriptload: could not create buffer file%s\n",
           tmp_name);
         return(4);
    --- src/cmds/qmgr.c.orig Thu Oct 4 13:34:53 2001
    +++ src/cmds/qmgr.c Thu Oct 4 13:35:13 2001
    @@ -2266,7 +2266,7 @@
     char *svr_name;
     struct server *svr;
     {
    -  static struct objname temp = { NULL, NULL, NULL, NULL};
    +  static struct objname temp = { (int)NULL, (int)NULL, (int)NULL, (int)NULL};
     
       if( temp.svr != NULL )
         temp.svr -> ref--;
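
If these changes are saved as a unified diff file, they could be applied to a fresh source tree with patch. The sketch below assumes GNU patch (for the --dry-run option), a patch file whose name you choose yourself, and that the paths in the diff are relative to the top of the OpenPBS source tree, as the src/... paths in the listing suggest. The function name is hypothetical.

```shell
# apply_patch_checked SRC_DIR PATCH_FILE
# Try the patch with --dry-run first so that a tree it does not fit is left
# untouched, then apply it for real.
# PATCH_FILE must be an absolute path because the function changes directory.
apply_patch_checked() {
    src_dir="$1"
    patch_file="$2"
    (
        cd "$src_dir" || exit 1
        if ! patch -p0 --dry-run < "$patch_file" >/dev/null; then
            echo "patch does not apply cleanly in $src_dir" >&2
            exit 1
        fi
        patch -p0 < "$patch_file"
    )
}

# Usage (both paths are placeholders):
#   apply_patch_checked /path/to/OpenPBS_2_3_12 /path/to/pbs-changes.patch
```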



Acknowledgments


Joint Institute for Computational Science
November 24, 2001