In Ophidia, data are multidimensional and organized in cubes (or datacubes).
A cube consists of a measure and several dimensions: the measure corresponds to the numerical values that can be analyzed over the available dimensions. For instance, the measure could be temperature and a dimension could be time; in this case the cube contains a temperature value for each time instant in a given period.
The internal storage model is an evolution of the Dimensional Fact Model, the classic star schema adopted in data warehouses. Unlike that model, in Ophidia data can be represented as binary arrays, and the platform includes support for array-based data types. Most primitives process binary arrays and return a binary array as output. In addition, a key mapping is adopted to save memory space: the fact table includes a single key column in place of multiple foreign-key columns, one for each dimension table.
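As a minimal sketch of the layout just described (the table and column names are illustrative, not Ophidia's actual schema): each fact-table row carries one integer key, encoding the combination of explicit-dimension values, plus one binary array holding the measure.

```python
import struct

# Illustrative sketch of the storage model: instead of one foreign-key
# column per dimension table, each fact-table row stores a single
# integer key encoding the combination of explicit-dimension values,
# plus a binary array holding the measure values.

lat_values = [40.0, 41.0]          # explicit dimension "lat"
lon_values = [10.0, 11.0, 12.0]    # explicit dimension "lon"
time_len = 4                       # implicit dimension "time" (array length)

fact_table = []
for i, lat in enumerate(lat_values):
    for j, lon in enumerate(lon_values):
        key = i * len(lon_values) + j                          # one key replaces (lat_fk, lon_fk)
        measures = [20.0 + key + t for t in range(time_len)]   # one value per time step
        fact_table.append((key, struct.pack(f"{time_len}d", *measures)))

# The single key can be decoded back into the per-dimension foreign keys.
def decode(key):
    return key // len(lon_values), key % len(lon_values)

print(len(fact_table))           # 6 rows: one per (lat, lon) combination
print(decode(fact_table[5][0]))  # (1, 2) -> lat=41.0, lon=12.0
```

The point of the sketch is the trade-off: the row count is driven only by the explicit dimensions, while the implicit dimension lives entirely inside the packed array.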
Ophidia assumes that the dimension set is divided into two parts: explicit dimensions and implicit dimensions. The former are handled similarly to the star schema, while the latter refer to the binary arrays.
In simple terms, each row of the fact table corresponds to a combination of explicit-dimension values, whereas each element of a row's binary array corresponds to a combination of implicit-dimension values.
It is worth pointing out that, although Ophidia can handle cubes with many implicit dimensions, the platform is optimized to process data with only one implicit dimension (see the primitives). In this case, the dimension corresponds to the variability of the data along the binary array.
As a rule of thumb, dimension types (explicit or implicit) should be chosen based on the workflow to be applied to the input datasets: operators working on arrays give the best performance, and advanced features are provided for array manipulation.
For instance, if you wish to analyse a number of time series, one for each cell of the spatial domain, a reasonable implicit dimension would be time, whereas explicit dimensions (e.g. latitude and longitude) would represent the spatial domain.
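The example above can be sketched with NumPy (the shapes and values are illustrative): latitude and longitude act as explicit dimensions, giving one row per grid cell, while time is the implicit dimension stored as one array per row, so array-oriented operations run over whole time series at once.

```python
import numpy as np

# Sketch of the lat/lon/time example: one fact-table row per (lat, lon)
# grid cell, each row holding a full time series as its binary array.

lats, lons, times = 3, 4, 365
rng = np.random.default_rng(0)
rows = rng.normal(15.0, 5.0, size=(lats * lons, times))  # 12 rows, 365-element arrays

# An array-oriented operation, e.g. the time average of each cell's series,
# works row by row over entire arrays.
cell_means = rows.mean(axis=1)

print(rows.shape)        # (12, 365): 12 explicit combinations, 365 array elements
print(cell_means.shape)  # (12,): one aggregate per grid cell
```

This is why time is the natural implicit dimension here: the per-cell analysis touches every element of one row's array, not scattered rows.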
In order to manage large volumes of data and improve scalability, in the Ophidia storage system datasets are partitioned and (possibly) distributed across multiple analytics nodes.
Data partitioning consists of splitting the central fact table of the model described above into multiple smaller tables, called fragments. This enables parallel data processing: the fragments can be assigned to different Analytics Framework nodes and processed concurrently.
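A minimal sketch of this idea (the row counts, fragment size, and round-robin placement are illustrative assumptions, not Ophidia's actual distribution policy): the fact-table rows are split into fixed-size fragments, and the fragments are spread over the available hosts.

```python
# Illustrative sketch of fact-table partitioning: split the rows into
# fragments, then assign fragments round-robin to analytics nodes so
# each node can process its share concurrently.

fact_rows = list(range(30))      # 30 rows of a hypothetical fact table
rows_per_fragment = 5
n_hosts = 3

fragments = [fact_rows[i:i + rows_per_fragment]
             for i in range(0, len(fact_rows), rows_per_fragment)]

# round-robin assignment of fragment ids to hosts (an assumed policy)
placement = {h: [] for h in range(n_hosts)}
for frag_id, _ in enumerate(fragments):
    placement[frag_id % n_hosts].append(frag_id)

print(len(fragments))   # 6 fragments of 5 rows each
print(placement)        # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```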
The fragments produced by the partitioning scheme are mapped onto a hierarchical structure that allows the user to optionally define:

- the number of hosts over which the cube is distributed;
- the number of databases per host;
- the number of fragments per database.
How to import a NetCDF file using a partitioning schema with 3 hosts and 5 fragments per database (15 fragments in total)?
How to print the partitioning schema adopted for a cube?
How to transform the innermost explicit dimension into the outermost implicit dimension?
How to transform the outermost implicit dimension into the innermost explicit dimension?
How to exchange the order of two implicit dimensions (assuming that there are only two implicit dimensions)?
Note that the order of explicit dimensions cannot be exchanged.
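As a rough sketch, the questions above map onto Ophidia terminal commands along the following lines. The operator names (OPH_IMPORTNC, OPH_CUBESCHEMA, OPH_ROLLUP, OPH_DRILLDOWN, OPH_PERMUTE) come from the Ophidia operator set, but the exact parameter names, separators, and defaults shown here are assumptions and should be checked against the operator reference before use.

```
# import a NetCDF file with 3 hosts and 5 fragments per database
# (src_path, measure, nhost, nfrag are assumed parameter names)
oph_importnc src_path=/path/to/file.nc;measure=temperature;nhost=3;nfrag=5;

# print the schema (dimensions and partitioning) of the current cube
oph_cubeschema

# innermost explicit dimension -> outermost implicit dimension
oph_rollup

# outermost implicit dimension -> innermost explicit dimension
oph_drilldown

# exchange the order of the two implicit dimensions
# (dim_pos is an assumed parameter name for the new ordering)
oph_permute dim_pos=2,1;
```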