Resource manager integration

This page describes the configuration required to interact with the cluster resource manager and the setup needed to define the users allowed to submit commands.

Note

Before proceeding with the following steps, check that the required prerequisites are installed. In particular, the resource manager should be installed and running. Additionally, make sure the Ophidia server source code or binary package is available on the system where the resource manager integration is being performed.

Main configuration

The main server configuration file is located at $prefix/etc/rmanager.conf. This file and the related scripts saved in $prefix/etc/script/ have to be adapted according to the scheduler (resource manager) adopted for job submission. In the case of Slurm, the following configuration should be adequate.

SUBM_CMD_TO_SUBMIT=/usr/local/ophidia/oph-server/etc/script/oph_submit.sh
SUBM_CMD_TO_START=/usr/local/ophidia/oph-server/etc/script/oph_start.sh
SUBM_CMD_TO_CANCEL=/usr/local/ophidia/oph-server/etc/script/oph_cancel.sh
SUBM_CMD_TO_STOP=/usr/local/ophidia/oph-server/etc/script/oph_cancel.sh
SUBM_CMD_TO_COUNT=/usr/local/ophidia/oph-server/etc/script/oph_count.sh
SUBM_CMD_TO_CHECK=/usr/local/ophidia/oph-server/etc/script/oph_check.sh
SUBM_MULTIUSER=no
SUBM_GROUP=ophidia
SUBM_QUEUE_HIGH=ophidia
SUBM_QUEUE_LOW=ophidia
SUBM_PREFIX=
SUBM_POSTFIX=>/dev/null 2>&1 </dev/null

Description of the parameters (lines can be commented using #):

SUBM_CMD_TO_SUBMIT ← command or script used to submit Ophidia operators
SUBM_CMD_TO_START  ← command or script used to start cluster deployment in case dynamic deployment is enabled
SUBM_CMD_TO_CANCEL ← command or script used to stop the execution of an Ophidia operator
SUBM_CMD_TO_STOP   ← command or script used to stop a cluster already deployed
SUBM_CMD_TO_COUNT  ← command or script used to retrieve the number of hosts available to deploy a new cluster
SUBM_CMD_TO_CHECK  ← command or script used to retrieve job status at the resource manager in case POLL_TIME > 0
SUBM_MULTIUSER     ← set to "yes" to enable multi-user mode (default "no")
SUBM_GROUP         ← Linux group of Ophidia users in multi-user mode
SUBM_QUEUE_HIGH    ← name of high-priority queue, used for serial jobs
SUBM_QUEUE_LOW     ← name of low-priority queue, used for parallel jobs
SUBM_PREFIX        ← prefix added to submission strings
SUBM_POSTFIX       ← postfix appended to submission strings

The scripts provided by the package (in etc/script/) were developed under the assumption that Slurm is used as resource manager and that the related tools (for instance srun, scancel and squeue) are installed in /usr/local/ophidia/extra/bin. Adapt them accordingly, following the examples reported in the next sections.

A new configuration file can also be created in etc/rms/, as long as the symbolic link rmanager.conf is updated accordingly and the package is reinstalled manually.
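
For instance, assuming the server source tree layout (file and directory names below are purely illustrative), the steps could be:

cd <ophidia-server-source>/etc
cp rmanager.conf rms/myrm.conf        # start from the current configuration
# edit rms/myrm.conf according to the new resource manager
ln -sf rms/myrm.conf rmanager.conf    # repoint the symbolic link
cd .. && make install                 # reinstall so that the new file is deployed under $prefix/etc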

Note

Starting from v1.4 of Ophidia Server, the resource manager configuration has been drastically changed and a different set of parameters needs to be configured. See the v1.3.0 documentation for further details.

Multi-user configuration

By default, Ophidia Server submits any command (by starting the execution of a script) as the Linux user identified by SUBM_USER (single-user mode), with the Linux privileges associated with that user.

Consider setting SUBM_MULTIUSER to “yes” if the Ophidia platform is exploited in a cluster environment. In this scenario file access could require specific privileges: for instance, access to a file exported by one user (see OPH_EXPORTNC) could be denied to another user. When multi-user mode is adopted, the server requests the resource manager to use the Linux credentials of the user sending each command: the OS user identifier is equal to the value of the parameter OPH_OS_USERNAME and can be modified by means of the tool oph_manage_user or by editing the user configuration as explained here.

In particular, when multi-user mode is adopted (and considering the scripts set in the configuration file above), any Ophidia operator is submitted as follows

sudo -u <user> /usr/local/ophidia/oph-server/etc/script/oph_submit.sh ... parameters ...

where <user> is the Linux user associated with the Ophidia user that submits the operator. The command is actually executed by the user identified by the parameter SUBM_USER in the server configuration file, hence this Linux user needs to be enabled to run the scripts in privileged mode. To this end, the following rules need to be appended to /etc/sudoers (e.g. using visudo)

Defaults:ophidia !requiretty
ophidia     ALL=(%ophidia) NOPASSWD: /usr/local/ophidia/oph-server/etc/script/oph_submit.sh
ophidia     ALL=(%ophidia) NOPASSWD: /usr/local/ophidia/oph-server/etc/script/oph_start.sh
ophidia     ALL=(%ophidia) NOPASSWD: /usr/local/ophidia/oph-server/etc/script/oph_cancel.sh
ophidia     ALL=(%ophidia) NOPASSWD: /usr/local/ophidia/oph-server/etc/script/oph_count.sh
ophidia     ALL=(%ophidia) NOPASSWD: /usr/local/ophidia/oph-server/etc/script/oph_check.sh

where it is assumed that SUBM_GROUP has been set to “ophidia”.

Note that all the Linux users enabled to submit Ophidia operators have to belong to the Linux group identified by the parameter SUBM_GROUP (set to “ophidia” by default), which makes their management easier.
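
For instance, an existing Linux user can be added to that group with a command like the following (the user name is just an example):

# add the Linux user "user1" to the group "ophidia" (run as root)
usermod -a -G ophidia user1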

In the case of Slurm the last rule can be omitted, as the command squeue exploited by the script (see below for additional details) can return the complete job list by default without privileged mode.

Scripts configuration

The following scripts can be set up:

  1. Submit operators
  2. Deploy a new cluster
  3. Stop an operator
  4. Undeploy a running cluster
  5. Retrieve the number of available hosts
  6. Check the status of the resource manager queue

Submit Ophidia operators

The command adopted to submit an Ophidia operator is defined in the parameter SUBM_CMD_TO_SUBMIT and, by default, it corresponds to the script $prefix/etc/script/oph_submit.sh shown below.

Input parameters of this script are (an example invocation is shown after the list):

  • identifier of the task to be submitted, from the Ophidia Server point of view (a number generated by Ophidia Server)
  • number of cores to be used in executing the task
  • file to be used as stdout and stderr for the task
  • argument list of the operator (key-value pairs separated by ;)
  • name of the queue where the task has to be enqueued for execution
  • identifier of the server instance
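
For illustration only, a manual invocation of the script with the six parameters above would look like the following (all values are placeholders; the argument list is a string of key=value pairs separated by ;):

/usr/local/ophidia/oph-server/etc/script/oph_submit.sh 123 4 /tmp/123.log "operator=...;ncores=4;..." ophidia 1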

The script builds an executable file from the input submission string, saves it within the $HOME folder of the submitter (i.e. the Linux user that submits the operator), so that it is available on each node of the cluster, and starts its execution. At the end of the operation the executable file is deleted.

A script that can be effectively used with Slurm is the following.

#!/bin/bash

# Input parameters
taskid=${1}
ncores=${2}
log=${3}
submissionstring=${4}
queue=${5}
serverid=${6}

# Const
fixString=randomString
FRAMEWORK_PATH=/usr/local/ophidia/oph-cluster/oph-analytics-framework
LAUNCHER=/usr/local/ophidia/extra/bin/srun

# Body
mkdir -p ${HOME}/.ophidia
> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
echo "#!/bin/bash" >> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
echo "${FRAMEWORK_PATH}/bin/oph_analytics_framework \"${submissionstring}\"" >> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
chmod +x ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
${LAUNCHER} --mpi=pmi2 --input=none -n ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
if [ $? -ne 0 ]; then
      echo "Unable to submit ${HOME}/.ophidia/${serverid}${taskid}.submit.sh"
      exit -1
fi
rm ${HOME}/.ophidia/${serverid}${taskid}.submit.sh

exit 0

Here it is assumed that the $HOME folder of the submitter is shared among all the nodes of the cluster (e.g. a shared file system is adopted), so that the executable file ${HOME}/.ophidia/${serverid}${taskid}.submit.sh can be accessed from any node. In this way, all the MPI processes started by Slurm execute the same program ${HOME}/.ophidia/${serverid}${taskid}.submit.sh.

Note

In case the $HOME folder is not shared, you should specify a different shared folder where the .ophidia directory and the scripts will be created.
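
As a minimal sketch, assuming a shared mount point such as /shared is available (the path is purely illustrative), the scripts could be adapted by replacing ${HOME} with a variable pointing to that location:

# hypothetical adaptation: use a shared base directory instead of ${HOME}
SHARED_DIR=/shared/${USER}          # assumed shared mount point, adapt to the environment
mkdir -p ${SHARED_DIR}/.ophidia
> ${SHARED_DIR}/.ophidia/${serverid}${taskid}.submit.sh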

Additional details:

  • fixString is an optional prefix used to recognize Ophidia tasks; set it to a value unlikely to be used for other job names (e.g. a random string) and use the same value in all the scripts of this section
  • FRAMEWORK_PATH points to the Ophidia Analytics Framework executable
  • LAUNCHER points to the command used to submit tasks to the resource manager: srun applies to Slurm; change the value according to the adopted resource manager
  • the actual task name, from the resource manager point of view, is ${fixString}${serverid}${taskid}
  • a hidden folder named .ophidia is created to avoid overwriting files with the same name in the $HOME folder

An example of a script that can be adopted with LSF is the following.

#!/bin/bash

export LSF_SERVERDIR=...
export LSF_LIBDIR=...
export LSF_BINDIR=...
export LSF_ENVDIR=...
export PATH=...
export LD_LIBRARY_PATH=...

# Input arguments
taskid=${1}
ncores=${2}
log=${3}
submissionstring=${4}
queue=${5}
serverid=${6}

# Const
fixString=randomString
FRAMEWORK_PATH=/usr/local/ophidia/oph-cluster/oph-analytics-framework
LAUNCHER=<path-to-bsub>/bsub

# Body
mkdir -p ${HOME}/.ophidia
> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
echo "${FRAMEWORK_PATH}/bin/oph_analytics_framework \"$submissionstring\"" >> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
chmod 711 ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
${LAUNCHER} -q ${queue} -n ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} -Ep "rm ${HOME}/.ophidia/${serverid}${taskid}.submit.sh" mpirun.lsf ${HOME}/.ophidia/${serverid}${taskid}.submit.sh

exit 0

Note the reference to bsub as launcher and the related changes to the options of ${LAUNCHER}. The environment variables listed at the top of the script need to be set appropriately to run Ophidia operators.
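
As an alternative to exporting each variable manually, the LSF environment is typically set by sourcing the profile script shipped with the LSF installation (the path below is just an example and depends on the actual installation):

# source the LSF environment (adapt the path to the actual LSF installation)
. /usr/share/lsf/conf/profile.lsf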

Deploy new cluster

The command adopted to start cluster deployment is set in the parameter SUBM_CMD_TO_START and, by default, it corresponds to the script $prefix/etc/script/oph_start.sh shown below. This command can be used only if dynamic cluster deployment is enabled (ENABLE_CLUSTER_DEPLOYMENT set to “yes” in the server configuration).
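
For reference, dynamic cluster deployment is enabled with a line like the following in the server configuration file (the parameter name is the one mentioned above; the file location depends on the installation):

ENABLE_CLUSTER_DEPLOYMENT=yes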

Input parameters are:

  • identifier of the task to be submitted, from the Ophidia Server point of view (a number generated by Ophidia Server)
  • number of cores to be used in executing the task
  • file to be used as stdout and stderr for the task
  • identifier of the host partition; it is numeric and corresponds to the input argument of OPH_CLUSTER: the operator inserts the entry related to the host partition into OphidiaDB (before running the script) and passes the identifier as input argument to the script.
  • name of the queue where the task has to be enqueued for execution
  • identifier of the server instance

The script builds an executable file (accessible from all the nodes of the cluster) that, in turn, starts an instance of the I/O server: as the script is executed as an MPI program, multiple instances are started (one per node). The executable file is saved within the $HOME folder of the submitter, executed and, at the end, deleted.

The following script is provided as an example.

#!/bin/bash

# Input parameters
taskid=${1}
ncores=${2}
log=${3}
hostpartition=${4}
queue=${5}
serverid=${6}

# Const
fixString=randomString
LAUNCHER=/usr/local/ophidia/extra/bin/srun
IO_SERVER_LAUNCHER=/usr/local/ophidia/oph-cluster/oph-io-server/etc/start_ioserver.sh

# Body
mkdir -p ${HOME}/.ophidia
> ${HOME}/.ophidia/${serverid}${taskid}.start.sh
echo "#!/bin/bash" >> ${HOME}/.ophidia/${serverid}${taskid}.start.sh
echo "${IO_SERVER_LAUNCHER} ${hostpartition}" >> ${HOME}/.ophidia/${serverid}${taskid}.start.sh
chmod +x ${HOME}/.ophidia/${serverid}${taskid}.start.sh
${LAUNCHER} --mpi=pmi2 --input=none --exclusive --ntasks-per-node=1 -N ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} ${HOME}/.ophidia/${serverid}${taskid}.start.sh
if [ $? -ne 0 ]; then
      echo "Unable to submit ${HOME}/.ophidia/${serverid}${taskid}.start.sh"
      exit -1
fi
rm ${HOME}/.ophidia/${serverid}${taskid}.start.sh

exit 0

Here it is assumed that the $HOME folder of the submitter is shared among all the nodes of the cluster (e.g. a shared file system is adopted), so that the executable file ${HOME}/.ophidia/${serverid}${taskid}.start.sh can be accessed from any node. In this way, all the MPI processes started by Slurm execute the same program ${HOME}/.ophidia/${serverid}${taskid}.start.sh.

Note

In case the $HOME folder is not shared, you should specify a different shared folder where the .ophidia directory and the scripts will be created.

The use of the options --exclusive and --ntasks-per-node=1 allows starting only one MPI process per node, so that the number of I/O server instances is equal to ncores.

Additional details:

  • fixString is an optional prefix used to recognize Ophidia tasks; set it to a value unlikely to be used for other job names (e.g. a random string) and use the same value in all the scripts of this section
  • LAUNCHER points to the command used to submit tasks to the resource manager: srun applies to Slurm; change the value according to the adopted resource manager
  • IO_SERVER_LAUNCHER points to the actual script used to activate the I/O server instances (see the next sub-section for further details)
  • the actual task name, from the resource manager point of view, is ${fixString}${serverid}${taskid}
  • a hidden folder named .ophidia is created to avoid overwriting files with the same name in the $HOME folder

The following version of the script is adequate for LSF, provided that the environment variables are set accordingly.

#!/bin/bash

export LSF_SERVERDIR=...
export LSF_LIBDIR=...
export LSF_BINDIR=...
export LSF_ENVDIR=...
export PATH=...
export LD_LIBRARY_PATH=...

# Input arguments
taskid=${1}
ncores=${2}
log=${3}
hostpartition=${4}
queue=${5}
serverid=${6}

# Const
fixString=randomString
LAUNCHER=<path-to-bsub>/bsub
IO_SERVER_LAUNCHER=/usr/local/ophidia/oph-cluster/oph-io-server/etc/start_ioserver.sh

# Body
echo "${IO_SERVER_LAUNCHER} ${hostpartition}" >> $HOME/.ophidia/${serverid}${taskid}.submit.sh
chmod 711 ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
{LAUNCHER} -q ${queue} -x -R "span[ptile=1]" -I -n ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} -Ep "rm ${HOME}/.ophidia/${serverid}${taskid}.submit.sh" blaunch ${HOME}/.ophidia/${taskid}.submit.sh

exit 0

$prefix/etc/script/start_ioserver.sh

The script runs on each node where a new instance of the I/O server needs to be activated. It is called by $prefix/etc/script/oph_start.sh and executed as an MPI process. The input argument is the numeric identifier of the host partition associated with the dynamic cluster.

The script consists of three parts:

  • host reservation in OphidiaDB
  • actual deployment (execution of an instance of the I/O server)
  • host release in OphidiaDB

The first step is needed to update the host status in OphidiaDB: after this update the node where the script is executed is considered “reserved” and cannot be assigned to other user-defined partitions (or other clusters). As a consequence, only one instance of the I/O server will run on the node.

The second step is the actual execution of the I/O server instance. The script blocks on this execution until the I/O server is shut down.

The third step resets the host status in OphidiaDB: after this update the node where the script is executed is released and can be used for other dynamic clusters.

The following version of the script is adequate for any resource manager.

#!/bin/bash

# Input parameters
hpid=${1}

# Const
OPHDB_NAME=ophidiadb
OPHDB_HOST=127.0.0.1
OPHDB_PORT=3306
OPHDB_LOGIN=root
OPHDB_PWD=abcd
IO_SERVER_PATH=/usr/local/ophidia/oph-cluster/oph-io-server/bin/oph_io_server
IO_SERVER_TEMPLATE=/usr/local/ophidia/oph-cluster/oph-io-server/etc/oph_ioserver.conf.template

# Body
# Retrieve the hostname and the numeric identifier of the current node
# (these lines must be adapted to the hostname scheme of the specific environment)
hostlist=`hostname --all-fqdns`
myhost=`echo ${hostlist} | tail -c 7`
myid=`echo ${hostlist} | tail -c 2 | bc`

echo "Add host ${myhost} to partition ${hpid}"
mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} -e "START TRANSACTION; UPDATE host SET status = 'up' WHERE hostname = '${myhost}'; INSERT INTO hashost(idhostpartition, idhost) VALUES (${hpid}, (SELECT idhost FROM host WHERE hostname = '${myhost}')); COMMIT;"
if [ $? -ne 0 ]; then
      echo "Query failed"
      exit -1
fi
echo "OphidiaDB updated"

rm -rf ${HOME}/.ophidia/data${myid}/*
mkdir -p ${HOME}/.ophidia/data${myid}/{var,log}

cp -f ${IO_SERVER_TEMPLATE} ${HOME}/.ophidia/data${myid}/oph_ioserver.conf
sed -i "s|\$HOME|${HOME}|g" ${HOME}/.ophidia/data${myid}/oph_ioserver.conf

echo "Starting I/O server ${myid}"
${IO_SERVER_PATH} -i ${myid} -c ${HOME}/.ophidia/data${myid}/oph_ioserver.conf > ${HOME}/.ophidia/data${myid}/log/server.log 2>&1 < /dev/null
echo "Exit from IO server ${myid}"

echo "Remove host ${myhost} from partition ${hpid}"
mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} -e "START TRANSACTION; UPDATE host SET status = 'down', importcount = 0 WHERE hostname='${myhost}'; DELETE FROM hashost WHERE idhostpartition = ${hpid} AND idhost IN (SELECT idhost FROM host WHERE hostname = '${myhost}'); COMMIT;"
if [ $? -ne 0 ]; then
      echo "Query failed"
      exit -1
fi
echo "OphidiaDB updated"

rm -rf ${HOME}/.ophidia/data${myid}/*

exit 0

Of course, the access parameters to OphidiaDB need to be updated consistently with the server configuration reported in $prefix/etc/ophidiadb.conf.

Note that the lines used for retrieving the hostname and/or the host identifier (myhost and myid) must be adapted according to the specific HPC environment (an alternative sketch is shown after the table below). In the example provided it is assumed that the hostnames are

oph-01
oph-02
oph-03
...

hence the variables myhost and myid are set as follows

*myhost*      *myid*
oph-01        1
oph-02        2
oph-03        3
...
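
If the naming scheme of the cluster differs, the two variables can be derived differently; a possible sketch for hostnames ending with a numeric suffix (e.g. node001, node002, ...) is the following:

# hypothetical parsing for hostnames like node001, node002, ...
myhost=`hostname -s`                           # short hostname of the current node
myid=`echo ${myhost} | sed 's/[^0-9]*0*//'`    # strip the alphabetic prefix and leading zeros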

Before running the I/O server instance, the script builds a working area in the $HOME folder of the submitter (the Linux user). In the example proposed above, it corresponds to the folder ${HOME}/.ophidia/data${myid} and includes:

  • I/O server configuration file
  • two folders, named var and log, used by I/O server to save internal metadata.

The former is a copy of IO_SERVER_TEMPLATE, customized for the specific submitter at run-time. In particular, the template (shown below) includes the list of configuration parameters for all the possible I/O server instances (one for each analytics node) as reported in the multiple-instance configuration. Contrary to a static configuration of the I/O server instances, the parameter SERVER_DIR has to be automatically set at run-time to a sub-folder of the $HOME folder of the specific submitter, thus avoiding writing metadata in a folder accessible by other users. In the proposed example, the parameter is set to $HOME/.ophidia/data1 for the first instance, to $HOME/.ophidia/data2 for the second instance and so on. The script start_ioserver.sh replaces any occurrence of the string “$HOME” found in the copy of IO_SERVER_TEMPLATE with the path to the $HOME folder of the submitter, so that metadata will be saved in a folder accessible only by the submitter.

[instance1]
SERVER_HOSTNAME=192.168.0.1
SERVER_PORT=65000
SERVER_DIR=$HOME/.ophidia/data1
MAX_PACKET_LEN=4000000
CLIENT_TTL=300
MEMORY_BUFFER=1024

[instance2]
SERVER_HOSTNAME=192.168.0.2
SERVER_PORT=65000
SERVER_DIR=$HOME/.ophidia/data2
MAX_PACKET_LEN=4000000
CLIENT_TTL=300
MEMORY_BUFFER=1024

...

Additional details:

  • OPHDB_NAME is the name of the OphidiaDB database in the MySQL server
  • OPHDB_HOST is the hostname or IP address of the node where the MySQL server is running
  • OPHDB_PORT is the port number of the MySQL server
  • OPHDB_LOGIN is the username used to access OphidiaDB
  • OPHDB_PWD is the password used to access OphidiaDB
  • IO_SERVER_PATH points to the Ophidia I/O Server executable
  • a hidden folder named .ophidia is created to avoid overwriting files with the same name in the $HOME folder
  • the host partition identified by ${hpid} is created by OPH_CLUSTER before running the script.

Stop Ophidia operators

The command adopted to stop the execution of an Ophidia operator is specified through the parameter SUBM_CMD_TO_CANCEL and, by default, it corresponds to the script $prefix/etc/script/oph_cancel.sh shown below.

The following version is adequate for Slurm (it should be updated according to the proper resource manager).

#!/bin/bash

# Input parameters
taskid=${1}
serverid=${2}

# Const
fixString=randomString
KILLER=/usr/local/ophidia/extra/bin/scancel

# Body
${KILLER} -n ${fixString}${serverid}${taskid}

exit $?

The following version is adequate for LSF (adapting path and environmental variables accordingly).

#!/bin/bash

export LSF_SERVERDIR=...
export LSF_LIBDIR=...
export LSF_BINDIR=...
export LSF_ENVDIR=...
export PATH=...

# Input arguments
taskid=${1}
serverid=${2}

# Const
fixString=randomString
KILLER=<path-to-bkill>/bkill

# Body
${KILLER} -J ${fixString}${serverid}${taskid}

exit 0

Undeploy a cluster

The command adopted to stop I/O servers and, then, undeploy the cluster is set in the parameter SUBM_CMD_TO_STOP and, by default, it corresponds to the script $prefix/etc/script/oph_cancel.sh (the same used to stop any other Ophidia operator). See previous section for further information.

Retrieve the number of available hosts (to deploy a new cluster)

The command adopted to retrieve the number of available hosts is set in the parameter SUBM_CMD_TO_COUNT and, by default, it corresponds to the script $prefix/etc/script/oph_count.sh shown below.

#!/bin/bash

# Input parameters
WORK_FILE=${1}

# Const
OPHDB_NAME=ophidiadb
OPHDB_HOST=127.0.0.1
OPHDB_PORT=3306
OPHDB_LOGIN=root
OPHDB_PWD=abcd

# Body
COUNT=`mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} -s -N -e "SELECT COUNT(*) FROM host WHERE status = 'down' AND idhost NOT IN (SELECT idhost FROM hashost);" 2> ${WORK_FILE}`
ERROR=`wc -l < ${WORK_FILE}`
if [ $ERROR -gt 1 ]; then
      echo "Query failed"
      exit -1
fi
echo $COUNT > ${WORK_FILE}

exit 0

This script simply retrieves the number of hosts reserved for Ophidia dynamic clusters by sending a query to OphidiaDB. In an HPC environment the resource manager could also be used to submit tasks other than Ophidia operators: in this case the actual number of available hosts may be lower than the number reported in OphidiaDB, hence the script has to be adapted to retrieve the real number in a different way.

For instance, considering an environment based on LSF where host names are

oph-01
oph-02
oph-03
...

The following command

bhosts | awk '$1 ~ /oph-[0-9]/ && $2 == "ok" && $5 == "0" { print $2 }' | wc -l

returns the actual number of available hosts.
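
Under the same assumptions (host names and bhosts column layout), a complete oph_count.sh variant for LSF could be sketched as follows:

#!/bin/bash

# Input parameters
WORK_FILE=${1}

# Body: count the LSF hosts matching oph-NN that are in "ok" state and run no jobs
COUNT=`bhosts 2> ${WORK_FILE} | awk '$1 ~ /oph-[0-9]/ && $2 == "ok" && $5 == "0" { print $2 }' | wc -l`
ERROR=`wc -l < ${WORK_FILE}`
if [ ${ERROR} -gt 0 ]; then
      echo "Unable to retrieve the host list"
      exit -1
fi
echo ${COUNT} > ${WORK_FILE}

exit 0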

Check status of resource manager queue

The command adopted to retrieve the status of resource manager queue is set in the parameter SUBM_CMD_TO_CHECK and, by default, it corresponds to the script $prefix/etc/script/oph_check.sh shown below.

The following version can be adopted in case the resource manager is Slurm (it should be updated according to the proper resource manager).

#!/bin/bash

# Input parameters

# Const
CHECKER=/usr/local/ophidia/extra/bin/squeue

# Body
${CHECKER} -o "%j %u"

exit $?

The following version could be adequate in case the resource manager is LSF.

#!/bin/bash

# Input parameters

# Const
CHECKER=<path-to-bjobs>/bjobs

# Body
${CHECKER} | awk '{ print $7, $2 }'

exit $?