Submit Ophidia operators
The command used to submit an Ophidia operator is defined by the parameter SUBM_CMD_TO_SUBMIT and, by default, it corresponds to the script $prefix/etc/script/oph_submit.sh shown below.
The input parameters of this script are (an example invocation is shown after this list):
- identifier of the task to be submitted, from the Ophidia Server point of view (a number generated by the Ophidia Server)
- number of cores to be used in executing the task
- file to be used as stdout and stderr for the task
- argument list of the operator (key-value pairs separated by ;)
- name of the queue where the task has to be enqueued for execution
- identifier of the server instance
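For illustration only, the following sketch shows how the script might be invoked; all values are hypothetical (the real ones are generated by the Ophidia Server) and the submission string is truncated.
# Hypothetical invocation of the submission script: illustrative values only
TASKID=42                                  # task identifier assigned by the Ophidia Server
NCORES=4                                   # number of cores
LOG=/var/log/ophidia/task42.log            # file used as stdout and stderr
SUBSTR="operator=oph_list;level=2;..."     # operator argument list (key-value pairs separated by ;)
QUEUE=q1                                   # queue name
SERVERID=1                                 # server instance identifier
$prefix/etc/script/oph_submit.sh "$TASKID" "$NCORES" "$LOG" "$SUBSTR" "$QUEUE" "$SERVERID"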
The script builds an executable file from the input submission string, saves it in the $HOME folder of the submitter (i.e. the Linux user that submits the operator) on each node of the cluster and starts its execution. At the end of the operation the executable file is deleted from each node of the cluster.
A script that can be used effectively with Slurm is the following.
#!/bin/bash
# Input parameters
taskid=${1}
ncores=${2}
log=${3}
submissionstring=${4}
queue=${5}
serverid=${6}
# Const
fixString=randomString
FRAMEWORK_PATH=/usr/local/ophidia/oph-cluster/oph-analytics-framework
LAUNCHER=/usr/local/ophidia/extra/bin/srun
# Body
mkdir -p ${HOME}/.ophidia
> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
echo "#!/bin/bash" >> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
echo "${FRAMEWORK_PATH}/bin/oph_analytics_framework \"${submissionstring}\"" >> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
chmod +x ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
${LAUNCHER} --mpi=pmi2 --input=none -n ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
if [ $? -ne 0 ]; then
echo "Unable to submit ${HOME}/.ophidia/${serverid}${taskid}.submit.sh"
exit -1
fi
rm ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
exit 0
Here it is assumed that the $HOME folder of the submitter is shared among all the nodes of the cluster (e.g. a shared file system is adopted), so that the executable file ${HOME}/.ophidia/${serverid}${taskid}.submit.sh can be accessed by any node. Thus, all the MPI processes started by Slurm run the same program ${HOME}/.ophidia/${serverid}${taskid}.submit.sh.
Note
In case the $HOME folder is not shared, you should specify a different shared folder where the .ophidia directory and the scripts will be created, as sketched below.
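A minimal sketch of this change, assuming a hypothetical shared path /data/shared mounted on all nodes: define a variable at the top of the script and use it in place of ${HOME} in the remaining lines.
# Hypothetical shared location used instead of ${HOME} (adjust to the actual mount point)
SHARED_DIR=/data/shared/${USER}
mkdir -p ${SHARED_DIR}/.ophidia
> ${SHARED_DIR}/.ophidia/${serverid}${taskid}.submit.sh
# ... the rest of the script is unchanged, with ${HOME} replaced by ${SHARED_DIR}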
Additional details:
- fixString is an optional prefix used to recognize Ophidia tasks; set it to a value unlikely to be used for other job names (e.g. a random string) and use the same value in all the scripts of this section (see the example after this list)
- FRAMEWORK_PATH points to the folder containing the Ophidia analytics framework executable
- LAUNCHER points to the command used to submit tasks to the resource manager: srun applies to Slurm; change the value according to the resource manager in use
- the actual task name, from the resource manager point of view, is ${fixString}${serverid}${taskid}
- a hidden folder named .ophidia is created to avoid overwriting files with the same name in the $HOME folder
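As an illustration of the naming convention (this command is not part of the default scripts), the Ophidia tasks currently known to Slurm could be listed by filtering on the job name prefix; the fixString value is hypothetical.
# List the Slurm jobs whose name starts with the chosen fixString (illustrative value)
fixString=randomString
squeue -o "%i %j %T" | grep "${fixString}"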
An example of a script that can be used with LSF is the following.
#!/bin/bash
export LSF_SERVERDIR=...
export LSF_LIBDIR=...
export LSF_BINDIR=...
export LSF_ENVDIR=...
export PATH=...
export LD_LIBRARY_PATH=...
# Input arguments
taskid=${1}
ncores=${2}
log=${3}
submissionstring=${4}
queue=${5}
serverid=${6}
# Const
fixString=randomString
FRAMEWORK_PATH=/usr/local/ophidia/oph-cluster/oph-analytics-framework
LAUNCHER=<path-to-bsub>/bsub
# Body
mkdir -p ${HOME}/.ophidia
> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
echo "${FRAMEWORK_PATH}/bin/oph_analytics_framework \"$submissionstring\"" >> ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
chmod 711 ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
${LAUNCHER} -q ${queue} -n ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} -Ep "rm ${HOME}/.ophidia/${serverid}${taskid}.submit.sh" mpirun.lsf ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
exit 0
Note the reference to bsub as the launcher and the related changes to the options of ${LAUNCHER}. The environment variables listed at the top of the script need to be set appropriately to run Ophidia operators.
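A common way to set these variables, assuming a standard LSF installation, is to source the LSF profile script (the path below is hypothetical and depends on the site installation); profile.lsf exports the LSF_* variables and updates PATH, while LD_LIBRARY_PATH may still need to be extended manually.
# Hypothetical location of the LSF profile: adjust to the actual installation path
source /usr/share/lsf/conf/profile.lsf
export LD_LIBRARY_PATH=${LSF_LIBDIR}:${LD_LIBRARY_PATH}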
Deploy new cluster
The command used to start the cluster deployment is set by the parameter SUBM_CMD_TO_START and, by default, it corresponds to the script $prefix/etc/script/oph_start.sh shown below. This command can be used only if dynamic cluster deployment is enabled (ENABLE_CLUSTER_DEPLOYMENT set to “yes” in the server configuration).
Input parameters are:
- identifier of the task to be submitted, from the Ophidia Server point of view (a number generated by the Ophidia Server)
- number of cores to be used in executing the task
- file to be used as stdout and stderr for the task
- identifier of the host partition; it is numeric and corresponds to the input argument of OPH_CLUSTER: the operator inserts the entry related to the host partition into the OphidiaDB (before running the script) and passes the identifier as input argument of the script
- name of the queue where the task has to be enqueued for execution
- identifier of the server instance
The script builds an executable file (accessible from all the nodes of the cluster) that, in turn, starts an instance of the I/O server: as the script is executed as an MPI program, multiple instances are started (one per node). The executable file is saved in the $HOME folder of the submitter, executed and, at the end, deleted.
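For context, a dynamic cluster deployment is typically requested by an end user through the OPH_CLUSTER operator; a sketch of a possible request from the Ophidia terminal follows (values are illustrative and the exact parameter names should be checked against the operator documentation for the installed version).
# Sketch of a deployment request submitted from the Ophidia terminal (illustrative values)
oph_cluster action=deploy;host_partition=test_partition;nhost=2;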
The following script is provided as an example.
#!/bin/bash
# Input parameters
taskid=${1}
ncores=${2}
log=${3}
hostpartition=${4}
queue=${5}
serverid=${6}
# Const
fixString=randomString
LAUNCHER=/usr/local/ophidia/extra/bin/srun
IO_SERVER_LAUNCHER=/usr/local/ophidia/oph-cluster/oph-io-server/etc/start_ioserver.sh
# Body
mkdir -p ${HOME}/.ophidia
> ${HOME}/.ophidia/${serverid}${taskid}.start.sh
echo "#!/bin/bash" >> ${HOME}/.ophidia/${serverid}${taskid}.start.sh
echo "${IO_SERVER_LAUNCHER} ${hostpartition}" >> ${HOME}/.ophidia/${serverid}${taskid}.start.sh
chmod +x ${HOME}/.ophidia/${serverid}${taskid}.start.sh
${LAUNCHER} --mpi=pmi2 --input=none --exclusive --ntasks-per-node=1 -N ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} ${HOME}/.ophidia/${serverid}${taskid}.start.sh
if [ $? -ne 0 ]; then
echo "Unable to submit ${HOME}/.ophidia/${serverid}${taskid}.start.sh"
exit -1
fi
rm ${HOME}/.ophidia/${serverid}${taskid}.start.sh
exit 0
Here it is assumed that the $HOME folder of the submitter is shared among all the nodes of the cluster (e.g. a shared file system is adopted), so that the executable file ${HOME}/.ophidia/${serverid}${taskid}.start.sh can be accessed by any node. Thus, all the MPI processes started by Slurm run the same program ${HOME}/.ophidia/${serverid}${taskid}.start.sh.
Note
In case the $HOME folder is not shared, you should specify a different shared folder where the .ophidia directory and the scripts will be created.
The use of the options --exclusive and --ntasks-per-node=1 allows starting only one MPI process per node, so that the number of I/O server instances is equal to ncores.
Additional details:
- fixString is an optional prefix used to recognize Ophidia tasks; set it to a value unlikely to be used for other job names (e.g. a random string) and use the same value in all the scripts of this section
- LAUNCHER points to the command used to submit tasks to the resource manager: srun applies to Slurm; change the value according to the resource manager in use
- IO_SERVER_LAUNCHER points to the actual script used to activate the I/O server instances (see the next sub-section for further details)
- the actual task name, from the resource manager point of view, is ${fixString}${serverid}${taskid}
- a hidden folder named .ophidia is created to avoid overwriting files with the same name in the $HOME folder
The following version of the script is adequate for LSF, provided that the environment variables are set accordingly.
#!/bin/bash
export LSF_SERVERDIR=...
export LSF_LIBDIR=...
export LSF_BINDIR=...
export LSF_ENVDIR=...
export PATH=...
export LD_LIBRARY_PATH=...
# Input arguments
taskid=${1}
ncores=${2}
log=${3}
hostpartition=${4}
queue=${5}
serverid=${6}
# Const
fixString=randomString
LAUNCHER=<path-to-bsub>/bsub
IO_SERVER_LAUNCHER=/usr/local/ophidia/oph-cluster/oph-io-server/etc/start_ioserver.sh
# Body
echo "${IO_SERVER_LAUNCHER} ${hostpartition}" >> $HOME/.ophidia/${serverid}${taskid}.submit.sh
chmod 711 ${HOME}/.ophidia/${serverid}${taskid}.submit.sh
{LAUNCHER} -q ${queue} -x -R "span[ptile=1]" -I -n ${ncores} -o ${log} -e ${log} -J ${fixString}${serverid}${taskid} -Ep "rm ${HOME}/.ophidia/${serverid}${taskid}.submit.sh" blaunch ${HOME}/.ophidia/${taskid}.submit.sh
exit 0
$prefix/etc/script/start_ioserver.sh
The script runs on each node where a new instance of the I/O server needs to be activated. It is called by $prefix/etc/script/oph_start.sh and executed as an MPI process. The input argument is the numeric identifier of the host partition associated with the dynamic cluster.
The script consists of three parts:
- host reservation in OphidiaDB
- actual deployment (execution of an instance of the I/O server)
- host release in OphidiaDB
The first step is needed to update the host status in the OphidiaDB: after this update the node where the script is executed is considered “reserved” and cannot be assigned to other user-defined partitions (or other clusters). As a consequence, only one instance of the I/O server will run on the node.
The second step is the actual execution of the I/O server instance. The script blocks on this execution until the I/O server is shut down.
The third step resets the host status in the OphidiaDB: after this update the node where the script is executed is released and can be used for other dynamic clusters.
The following version of the script is adequate for any resource manager.
#!/bin/bash
# Input parameters
hpid=${1}
# Const
OPHDB_NAME=ophidiadb
OPHDB_HOST=127.0.0.1
OPHDB_PORT=3306
OPHDB_LOGIN=root
OPHDB_PWD=abcd
IO_SERVER_PATH=/usr/local/ophidia/oph-cluster/oph-io-server/bin/oph_io_server
IO_SERVER_TEMPLATE=/usr/local/ophidia/oph-cluster/oph-io-server/etc/oph_ioserver.conf.template
# Body
hostlist=`hostname --all-fqdns`
myhost=`echo ${hostlist} | tail -c 7`
myid=`echo ${hostlist} | tail -c 2 | bc`
echo "Add host ${myhost} to partition ${hpid}"
mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} -e "START TRANSACTION; UPDATE host SET status = 'up' WHERE hostname = '${myhost}'; INSERT INTO hashost(idhostpartition, idhost) VALUES (${hpid}, (SELECT idhost FROM host WHERE hostname = '${myhost}')); COMMIT;"
if [ $? -ne 0 ]; then
echo "Query failed"
exit -1
fi
echo "OphidiaDB updated"
rm -rf ${HOME}/.ophidia/data${myid}/*
mkdir -p ${HOME}/.ophidia/data${myid}/{var,log}
cp -f ${IO_SERVER_TEMPLATE} ${HOME}/.ophidia/data${myid}/oph_ioserver.conf
sed -i "s|\$HOME|${HOME}|g" ${HOME}/.ophidia/data${myid}/oph_ioserver.conf
echo "Starting I/O server ${myid}"
${IO_SERVER_PATH} -i ${myid} -c ${HOME}/.ophidia/data${myid}/oph_ioserver.conf > ${HOME}/.ophidia/data${myid}/log/server.log 2>&1 < /dev/null
echo "Exit from IO server ${myid}"
echo "Remove host ${myhost} from partition ${hpid}"
mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} -e "START TRANSACTION; UPDATE host SET status = 'down', importcount = 0 WHERE hostname='${myhost}'; DELETE FROM hashost WHERE idhostpartition = ${hpid} AND idhost IN (SELECT idhost FROM host WHERE hostname = '${myhost}'); COMMIT;"
if [ $? -ne 0 ]; then
echo "Query failed"
exit -1
fi
echo "OphidiaDB updated"
rm -rf ${HOME}/.ophidia/data${myid}/*
exit 0
Of course, the OphidiaDB access parameters need to be updated consistently with the server configuration file $prefix/etc/ophidiadb.conf.
Note that the lines used for retrieving the hostname and/or the host identifier (myhost and myid) must be adapted to the specific HPC environment. In the example provided it is assumed that the hostnames are oph-01, oph-02, oph-03 and so on, hence the variables myhost and myid are set as follows (an alternative sketch is shown after the table).
myhost    myid
oph-01 1
oph-02 2
oph-03 3
...
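For environments with a different naming scheme, an alternative way to derive the two variables is sketched below; it assumes that the short hostname ends with a numeric index and that the host table of the OphidiaDB stores short hostnames (both assumptions should be verified for the specific site).
# Sketch: derive myhost and myid from the short hostname (assumes names like oph-01, oph-02, ...)
myhost=$(hostname -s)
myid=$((10#$(hostname -s | grep -o '[0-9]*$')))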
Before running the I/O server instance, the script builds a working area in the $HOME folder of the submitter (the Linux user). In the example proposed above, it corresponds to the folder ${HOME}/.ophidia/data${myid} and includes:
- I/O server configuration file
- two folders, named var and log, used by I/O server to save internal metadata.
The former is a copy of IO_SERVER_TEMPLATE, customized for the specific submitter at run-time. In particular, the template (shown below) includes the list of configuration parameters for all the possible I/O server instances (one for each analytics node), as reported in the multiple-instance configuration. Unlike a static configuration of the I/O server instances, the parameter SERVER_DIR has to be set automatically at run-time to a sub-folder of the $HOME folder of the specific submitter, thus avoiding writing metadata in a folder accessible by other users. In the proposed example, the parameter is set to $HOME/.ophidia/data1 for the first instance, to $HOME/.ophidia/data2 for the second instance and so on. The script start_ioserver.sh replaces any occurrence of the string “$HOME” found in the copy of IO_SERVER_TEMPLATE with the path to the $HOME folder of the submitter and, hence, metadata will be saved in a folder accessible only by the submitter (an example of the substitution is shown after the template).
[instance1]
SERVER_HOSTNAME=192.168.0.1
SERVER_PORT=65000
SERVER_DIR=$HOME/.ophidia/data1
MAX_PACKET_LEN=4000000
CLIENT_TTL=300
MEMORY_BUFFER=1024
[instance2]
SERVER_HOSTNAME=192.168.0.2
SERVER_PORT=65000
SERVER_DIR=$HOME/.ophidia/data2
MAX_PACKET_LEN=4000000
CLIENT_TTL=300
MEMORY_BUFFER=1024
...
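As an illustration of the run-time customization, assuming the submitter's home folder is /home/user1 (a hypothetical path), the sed command in start_ioserver.sh rewrites the SERVER_DIR entry of the copied template as follows.
# Entry in the copy of IO_SERVER_TEMPLATE before the substitution
SERVER_DIR=$HOME/.ophidia/data1
# Same entry after sed -i "s|\$HOME|/home/user1|g" is applied to the copy
SERVER_DIR=/home/user1/.ophidia/data1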
Additional details:
- OPHDB_NAME is the name of the OphidiaDB in the MySQL management system
- OPHDB_HOST is the hostname or IP address of the node where the MySQL management system is running
- OPHDB_PORT is the port number of the MySQL management system
- OPHDB_LOGIN is the username used to access OphidiaDB
- OPHDB_PWD is the password used to access OphidiaDB
- IO_SERVER_PATH points to the Ophidia I/O server executable
- a hidden folder named .ophidia is created to avoid overwriting files with the same name in the $HOME folder
- the host partition identified by ${hpid} is created by OPH_CLUSTER before running the script; a sketch to inspect the related OphidiaDB tables follows
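To check that the reservation and release steps work as expected, the host and hashost tables can be inspected directly with the same access parameters used in the script; a minimal sketch:
# Sketch: show host status and current partition assignments in the OphidiaDB
mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} \
    -e "SELECT hostname, status FROM host; SELECT idhostpartition, idhost FROM hashost;"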
Retrieve the number of available hosts (to deploy a new cluster)
The command used to retrieve the number of available hosts is set by the parameter SUBM_CMD_TO_COUNT and, by default, it corresponds to the script $prefix/etc/script/oph_count.sh shown below.
#!/bin/bash
# Input parameters
WORK_FILE=${1}
# Const
OPHDB_NAME=ophidiadb
OPHDB_HOST=127.0.0.1
OPHDB_PORT=3306
OPHDB_LOGIN=root
OPHDB_PWD=abcd
# Body
COUNT=`mysql -u ${OPHDB_LOGIN} -p${OPHDB_PWD} -h ${OPHDB_HOST} -P ${OPHDB_PORT} ${OPHDB_NAME} -s -N -e "SELECT COUNT(*) FROM host WHERE status = 'down' AND idhost NOT IN (SELECT idhost FROM hashost);" 2> ${WORK_FILE}`
ERROR=`wc -l < ${WORK_FILE}`
if [ $ERROR -gt 1 ]; then
echo "Query failed"
exit -1
fi
echo $COUNT > ${WORK_FILE}
exit 0
This script simply retrieves the number of hosts that are registered in the OphidiaDB and currently available for a new dynamic cluster by sending a query to the OphidiaDB. In an HPC environment the resource manager could also be used to submit tasks other than Ophidia operators: in this case the number of hosts actually available may be lower than the number reported in the OphidiaDB; hence the script has to be adapted in order to retrieve the real number in a different way.
For instance, considering an environment based on LSF where the host names are oph-01, oph-02 and so on, the following command
bhosts | awk '$1 ~ /oph-[0-9]/ && $2 == "ok" && $5 == "0" { print $2 }' | wc -l
returns the actual number of available hosts.
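A similar approach can be sketched for Slurm under the same assumption on host names; the resulting value would replace the COUNT written to ${WORK_FILE} (the node-state filter may need tuning for the specific site).
# Sketch: count the idle Slurm nodes whose name matches the oph-XX pattern
sinfo -h -N -t idle -o "%n" | sort -u | grep -c '^oph-[0-9]'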