Workflows: Basic usage

The user can define a workflow by means of a JSON object that includes: the task list as a JSON array, information about possible task dependencies, and a set of key-value pairs acting as global attributes shared by all the tasks.

The first example

The following JSON object is an example of a workflow with three tasks: two independent tasks and one task that depends on both of them.

{
        "name": "Example1",
        "author": "Foo",
        "abstract": "Simple workflow with three tasks",
        "exec_mode": "sync",
        "ncores": "1",
        "cube": "http://hostname/1/1",
        "tasks":
        [
                {
                        "name": "Extract maximum value",
                        "operator": "oph_reduce",
                        "arguments": [ "operation=max" ]
                },
                {
                        "name": "Extract minimum value",
                        "operator": "oph_reduce",
                        "arguments": [ "operation=min" ]
                },
                {
                        "name": "Evaluate max-min range",
                        "operator": "oph_intercube",
                        "arguments": [ "operation=sub" ],
                        "dependencies":
                        [
                                { "task": "Extract maximum value", "type": "single", "argument": "cube" },
                                { "task": "Extract minimum value", "type": "single", "argument": "cube2" }
                        ]
                }
        ]
}

Each task is uniquely identified within the workflow by its name and is bound to a specific Ophidia operator through the keyword operator. Depending on that operator, the user can optionally provide a JSON array of key-value pairs (arguments) in order to call the operator with the appropriate arguments.

When the above request is served, the first two tasks (the extraction of the maximum and minimum values over the implicit dimensions of the input cube http://hostname/1/1) are started right away and executed in parallel. Then, Ophidia evaluates the difference between the results as soon as both data reductions are completed. Of course, the third task can start only after the other tasks have completed, since it depends on them.

Dependencies are specified by a JSON array in the dependencies section of the dependent task (the child). Each item of this array is a JSON object referring to a specific parent task on which the child depends. In general, it reports: the name of the parent task, the dependency type and the input-output argument relationship.

In the above example the output cubes of the parent tasks are set as inputs of the third task: the parameter cube of operator OPH_INTERCUBE is set to the PID of the cube that contains the maximum values, whereas the parameter cube2 is set to the PID of the cube with the minimum values. The dependency type is single since one (and only one) cube is transferred from a parent task to the child. See below for additional information about dependency types.

Global attributes

The global attributes include some metadata and the default values of parameters common to all the tasks. Some of these keywords, such as the title (name), the author's name (author) and a short description (abstract), are mandatory, whereas the other attributes are optional.

The parameter exec_mode specifies the execution mode, synchronous or asynchronous, and refers to the entire workflow, not to single tasks. In synchronous mode the workflow is executed in a blocking way, so the submitter has to wait until it finishes before the results are displayed. In asynchronous mode the workflow is processed in a non-blocking way (like a batch job), allowing the submitter to regain control immediately and possibly submit other requests. After sending an asynchronous request, the user can retrieve the workflow output (if available) by exploiting the command view.
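As a minimal sketch, switching the first example to asynchronous execution only requires changing this attribute in the workflow header (all the other attributes stay unchanged):

{
        "name": "Example1",
        "author": "Foo",
        "abstract": "Simple workflow with three tasks",
        "exec_mode": "async",
        "..."
}

After submission the terminal returns immediately; the workflow status can later be inspected with the command view (see Workflow output below).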

Since many tasks could be launched in parallel, an important parameter is the number of cores per task (ncores), which specifies the default value associated with all the workflow's tasks. This value can be overridden on a per-task basis whenever a task requires a different setting.
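For instance, since ncores is itself a regular operator argument, a single compute-intensive task could request more cores than the workflow default directly in its arguments list; in the following sketch the value 4 is just an illustrative choice:

{
        "name": "Extract maximum value",
        "operator": "oph_reduce",
        "arguments": [ "operation=max", "ncores=4" ]
}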

Workflow submission

To submit a workflow within Ophidia Terminal the user simply types the path to the file containing the JSON Request, browsing the local file system. Here are two examples:

/path/to/workflow.json [param1 [param2 [param3...]]]

./workflow.json [param1 [param2 [param3 ...]]]

In these examples workflow.json is the name of the JSON Request file, whereas param1, param2, param3... are optional parameters of the workflow. These space-separated strings are automatically substituted by Ophidia Terminal for the parameters $1, $2, $3... in the JSON object before the request is sent to Ophidia Server, without changing the original file. For example, consider the following workflow:

{
        "name": "Example2",
        "author": "Foo",
        "abstract": "Simple workflow with two parameters: subset filter '$1' and data reduction operation '$2'",
        "exec_mode": "sync",
        "ncores": "1",
        "cube": "http://hostname/1/1",
        "tasks":
        [
                {
                        "name": "Data selection",
                        "operator": "oph_subset",
                        "arguments": [ "subset_dims=time", "subset_filter=$1" ]
                },
                {
                        "name": "Data reduction",
                        "operator": "oph_reduce",
                        "arguments": [ "operation=$2" ],
                        "dependencies":
                        [
                                { "task": "Data selection", "type": "single" }
                        ]
                }
        ]
}

It has two parameters: the subsetting string used to select data from the input cube and the data reduction operation to be applied to the selected data. To evaluate the average of the first 10 elements of each time series, the workflow shall be submitted in this way

/path/to/workflow.json 1:10 avg

and the workflow will be automatically converted by Ophidia Terminal as follows

{
        "name": "Example3",
        "author": "Foo",
        "abstract": "Simple workflow with two parameters: subset filter '1:10' and data reduction operation 'avg'",
        "exec_mode": "sync",
        "ncores": "1",
        "cube": "http://hostname/1/1",
        "command": "/path/to/workflow.json 1:10 avg",
        "tasks":
        [
                {
                        "name": "Data selection",
                        "operator": "oph_subset",
                        "arguments": [ "subset_dims=time", "subset_filter=1:10" ]
                },
                {
                        "name": "Data reduction",
                        "operator": "oph_reduce",
                        "arguments": [ "operation=avg" ],
                        "dependencies":
                        [
                                { "task": "Data selection", "type": "single" }
                        ]
                }
        ]
}

and sent to Ophidia Server without changing the original file. The user can therefore execute other reductions by submitting the same workflow with the appropriate values for the parameters. For instance, to evaluate the maximum value of the terms in odd positions, the submission string would be

/path/to/workflow.json 1:2:end max

Note that parameter substitution can be applied even to global attributes, task names, dependencies, etc., as sketched below.
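For instance, a hypothetical variant of the previous workflow could expose the input cube itself as a third parameter of the header:

{
        "name": "Example2-variant",
        "author": "Foo",
        "abstract": "Hypothetical variant with subset filter '$1', operation '$2' and input cube '$3'",
        "exec_mode": "sync",
        "ncores": "1",
        "cube": "$3",
        "..."
}

so that a submission like

/path/to/workflow.json 1:10 avg http://hostname/1/1

would select both the operation and the cube to work on.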

Before sending a workflow, Ophidia Terminal automatically checks its validity. A workflow is valid only if

  1. mandatory global attributes are set
  2. the name of each task is unique
  3. operator names are set correctly
  4. the names in section dependencies are actual task names
  5. the task graph is a Directed Acyclic Graph (DAG)

An invalid workflow will not be submitted.
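As an illustration of rule 5, a workflow whose task section looks like the following sketch would be rejected, since the two tasks depend on each other and the task graph contains a cycle (operators and arguments here are just placeholders):

"tasks":
[
        {
                "name": "Task A",
                "operator": "oph_reduce",
                "arguments": [ "operation=max" ],
                "dependencies": [ { "task": "Task B", "type": "single" } ]
        },
        {
                "name": "Task B",
                "operator": "oph_reduce",
                "arguments": [ "operation=min" ],
                "dependencies": [ { "task": "Task A", "type": "single" } ]
        }
]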

The user can manually check a workflow by typing

check /path/to/workflow.json

If the workflow is valid, the terminal builds an SVG image that displays a DAG with the tasks and their dependencies. The following image shows the task graph associated with workflow Example1.

Example of task graph associated with a workflow

Workflow output

In general the output of a workflow is a report with the final status of each task. It is encoded in a JSON file (JSON Response) whose schema is shown here.

The output consists of several objects. By default three objects are included:

  1. a text object Workflow Status, which reports the workflow status;
  2. a table object Workflow Progress, which reports the total number of tasks and the number of completed tasks; this table is useful in case of errors;
  3. a table object Workflow Task List, which reports a list with some information about each task: job identifier, Marker ID, Task Name, Task Type (simple or massive) and Task Status.

The job identifier is a URL that can be used to access the session web resources related to the task. See the session section for additional information on these web resources.

The third object, which also includes the Workflow ID and a Parent Marker ID (both associated with the workflow as a whole and used to refer to the workflow output), can be rendered by Ophidia Terminal as a graph like the one in the following image.

Example of task graph associated with a completed workflow

In this graph, the colour of each disk represents the status of the related task according to the following legend:

  1. a grey disk represents an unscheduled task or an unselected task (see Selection interface)
  2. a pink disk represents a pending task
  3. an orange disk represents a running task
  4. a cyan disk represents a waiting task (see Interactive workflows)
  5. a green disk represents a completed task
  6. a yellow disk represents a skipped task (failed but ignored)
  7. a red disk represents a failed task

Some fields of Workflow Task List, such as Workflow ID or Parent Marker ID, are repeated for each job even though their values are constant. To reduce verbosity and improve readability, consider adopting a different output_format by setting this global attribute to compact as follows:

{
        "name": "Example4",
        "author": "Foo",
        "abstract": "...",
        "output_format": "compact",
        "..."
}

In this case the workflow output includes an additional table object, Workflow Basic Information, which reports the job identifier, the Workflow ID and the Parent Marker ID; these fields are then omitted from Workflow Task List.

The command view returns the current status of a workflow using the same format described above. Assuming that the Workflow ID is 10, the following images show two intermediate snapshots of the status of workflow Example1, obtained by calling the command

view 10

during the execution. Of course, the argument of the command is the value of the Workflow ID. The former depicts the status during the execution of the data reductions, whereas the latter shows the status during the execution of the third task.

Example of task graph associated with a running workflow
Example of task graph associated with a running workflow

The following image shows the graph of a failed workflow for completeness.

Example of task graph associated with a failed workflow

Note that, in these cases, the object Workflow Progress can be used to evaluate the degree of workflow completion, i.e. the ratio of the number of completed tasks to the total number of tasks (for instance, 2 completed tasks out of 3 correspond to about 67% completion). The command view is particularly useful to get this information, especially for workflows submitted in asynchronous mode.

The user can obtain the output of each task by exploiting the command view again and referring to the task by its Marker ID. As an example, the following command

view 10#20

returns the output of the task identified by Marker ID 20 of the workflow identified by Workflow ID 10.