Modern applications acquire and consume ever-larger amounts of data coming from distributed, heterogeneous devices and sources. The spread of Edge Computing answers the needs of these applications by enabling access to and use of data close to the data source, thus reducing latency and improving security and privacy. However, edge resources are subject to strong limitations in terms of storage space and computational power. Consequently, both data and computation have to be managed by moving them dynamically between cloud and edge resources.
The DITAS Cloud Platform allows developers to design data-intensive applications, deploy them on a mixed cloud/edge environment and execute the resulting distributed application in an optimal way by exploiting data and computation movement strategies.
Data and computation movement in DITAS
Data movement is a key concept in DITAS, since it defines all the policies and rules for moving data from one data store to another, enabling us to deal with a heterogeneous environment. Moving data implies moving portions of the offered data set from one data store to another in a different location, either in the edge or in the cloud. Although the generic term is data movement, two actions have to be distinguished: the actual movement (M), which consists of deleting the data from the original data store and moving them to a different one, and the duplication (D), where data are copied from one data store to another while keeping them in the original one [Plebani et al. 2018]. As there might be differences and constraints on the way in which data can be stored, data movement could also require some data transformation when data are moved or duplicated, e.g., for security or privacy reasons. Examples of transformations include aggregation, pseudonymization, anonymization, and encryption.
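To make the difference between movement (M) and duplication (D) concrete, here is a minimal sketch. It is not DITAS code: the in-memory `Store`, the `duplicate` and `move` functions, the `pseudonymize` transformation, and the `patient` field are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Store = Dict[str, List[Record]]  # hypothetical in-memory store: dataset name -> records
Transform = Callable[[Record], Record]


def pseudonymize(record: Record) -> Record:
    """Example transformation: replace the assumed 'patient' field with a hash token."""
    out = dict(record)
    out["patient"] = hashlib.sha256(record["patient"].encode("utf-8")).hexdigest()[:8]
    return out


def duplicate(src: Store, dst: Store, name: str,
              transform: Optional[Transform] = None) -> None:
    """Duplication (D): copy the dataset to dst while keeping it in src."""
    dst[name] = [transform(r) if transform else dict(r) for r in src[name]]


def move(src: Store, dst: Store, name: str,
         transform: Optional[Transform] = None) -> None:
    """Movement (M): copy the dataset to dst, then delete it from src."""
    duplicate(src, dst, name, transform)
    del src[name]


# Illustrative data only: duplicate from edge to cloud with pseudonymization,
# then move the pseudonymized copy from the cloud to an archive store.
edge: Store = {"vitals": [{"patient": "alice", "hr": "72"}]}
cloud: Store = {}
archive: Store = {}
duplicate(edge, cloud, "vitals", pseudonymize)  # edge keeps the raw data
move(cloud, archive, "vitals")                  # cloud no longer holds the dataset
```

In this sketch a movement is simply a duplication followed by a deletion at the source, which mirrors the definitions above; a real platform would also have to handle partial datasets and failures between the two steps.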
Along with data movement, computation movement can be performed to satisfy the application developer’s requirements. The DITAS platform is not in charge of managing the application itself, but only of facilitating its access to the needed data. Therefore, computation movement concerns the computation performed by the platform to prepare the data requested by the customers. In DITAS, the module in charge of executing the computation is the Virtual Data Container (VDC). Possible tasks performed in the VDC are data retrieval, data integration, data aggregation, and data transformation. Computation movement aims at improving customer satisfaction by moving the VDC or one of its components from one node to another, either in the edge or in the cloud.
Generally speaking, a movement action might define:
- what to move: the object of a movement can be the whole VDC or a specific component of the VDC for computation movement, or a whole dataset or a part of it for data movement;
- where to move: the destination node of the movement has to be specified and can be located either in the edge or in the cloud;
- when to move: the triggers for enacting movement actions might be defined.
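The three dimensions above can be captured in a small data structure. The following is only a sketch under our own assumptions: the `MovementAction` class, its field names, the node identifier, and the response-time threshold are hypothetical and do not reflect the actual DITAS data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict

Metrics = Dict[str, float]


class Kind(Enum):
    DATA = "data"
    COMPUTATION = "computation"


class Location(Enum):
    EDGE = "edge"
    CLOUD = "cloud"


@dataclass(frozen=True)
class MovementAction:
    kind: Kind                          # data or computation movement
    target: str                         # what: a dataset, a VDC, or a part of either
    destination_node: str               # where: identifier of the destination node
    destination: Location               # where: edge or cloud
    trigger: Callable[[Metrics], bool]  # when: condition evaluated on monitored metrics

    def should_fire(self, metrics: Metrics) -> bool:
        """Return True when the monitored metrics satisfy the trigger condition."""
        return self.trigger(metrics)


# Hypothetical example: move a whole VDC to a cloud node when responses get slow.
action = MovementAction(
    kind=Kind.COMPUTATION,
    target="vdc-1",
    destination_node="cloud-node-a",
    destination=Location.CLOUD,
    trigger=lambda m: m["response_time_ms"] > 200.0,  # assumed threshold
)
```

Calling `action.should_fire({"response_time_ms": 250.0})` returns `True`, while a value of `100.0` returns `False`.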
A representation of general movement actions between edge (E) and cloud (C) resources, subject to transformations, is shown in Figure 1, where the costs associated with enacting each action are also represented as metadata.
Figure 1: Data movement actions and transformations example in a fog environment
Representing customers’ requirements in DITAS
Movement actions are used to execute distributed applications in an optimal way. Here, optimal means satisfying the requirements of the DITAS customers in terms of accessing data as they need. When selecting a data source, customers specify both functional and non-functional requirements, the former stating the features of the data they need, the latter stating the desired level of service. In DITAS, non-functional requirements are expressed through the concept of Data Utility; see our previous blog post: Data intensive applications in Fog Computing: the role of Data Utility.
To support the modelling and management of customers’ non-functional requirements, we use a goal-oriented representation, where goals represent user requirements organised in a hierarchical way [Plebani et al. 2018]. Each goal is associated with one or more metrics used to assess its satisfaction.
Figure 2 shows an example of a goal model for Quality of Service non-functional requirements. There is one top goal, Quality of Service, that represents the main objective to be achieved. This goal is AND-decomposed into two sub-goals, meaning that all sub-goals must be achieved in order to achieve the top goal. Each sub-goal is in turn OR-decomposed into other sub-goals. The OR-decomposition specifies that at least one of the sub-goals must be achieved in order to consider the parent goal achieved. For example, Reliable Service is OR-decomposed into Service available and Service scalable: in order to offer a reliable service, the data provider must offer a service with a defined level of availability or a defined level of scalability.
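The AND/OR semantics can be illustrated with a short recursive evaluation. This is a sketch, not the DITAS implementation: the `Goal` class, the metric names, and the thresholds are assumptions, and the second sub-goal (called "Fast service" here) and its children are our guess at the structure of the figure, using goal names that appear later in this post.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Metrics = Dict[str, float]


@dataclass
class Goal:
    name: str
    decomposition: str = "LEAF"  # "AND", "OR", or "LEAF"
    children: List["Goal"] = field(default_factory=list)
    check: Optional[Callable[[Metrics], bool]] = None  # metric constraint for leaves

    def satisfied(self, metrics: Metrics) -> bool:
        """Evaluate the goal: leaves check a metric, inner goals combine children."""
        if self.decomposition == "LEAF":
            if self.check is None:
                raise ValueError(f"leaf goal {self.name!r} has no metric constraint")
            return self.check(metrics)
        results = [child.satisfied(metrics) for child in self.children]
        return all(results) if self.decomposition == "AND" else any(results)


# Metric names and thresholds below are illustrative assumptions.
qos = Goal("Quality of Service", "AND", [
    Goal("Reliable Service", "OR", [
        Goal("Service available", check=lambda m: m["availability_pct"] >= 99.0),
        Goal("Service scalable", check=lambda m: m["max_clients"] >= 1000),
    ]),
    Goal("Fast service", "OR", [  # hypothetical name for the second sub-goal
        Goal("Fast data process", check=lambda m: m["response_time_ms"] <= 200),
        Goal("Fast data streaming", check=lambda m: m["stream_delay_ms"] <= 50),
    ]),
])

healthy = {"availability_pct": 99.5, "max_clients": 200,
           "response_time_ms": 150, "stream_delay_ms": 80}
degraded = dict(healthy, response_time_ms=400)
```

With `healthy`, each OR-decomposed sub-goal has one satisfied child, so `qos.satisfied(healthy)` is `True`. With `degraded`, both children of the second sub-goal fail, so the AND at the top fails and `qos.satisfied(degraded)` is `False`.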
Figure 2: Goal model template for QoS requirements
Similar models can be designed for expressing Data Quality and Privacy and Security non-functional requirements.
The goal model in Figure 2 is a general template that can be used as a reference by the customer. When expressing the requirements, the customer selects the goals and sub-goals of interest from the template and indicates the metric constraints used to assess goal satisfaction. An example of a goal model instance for QoS non-functional requirements is shown in Figure 3.
Figure 3: Customised goal model template for QoS requirements
The requirements expressed through the goal model are the core of the Service Level Agreement (SLA) between the DITAS platform and the customers, and a monitoring system will collect the metrics needed to verify goal satisfaction. The violation of a goal in the goal model means a violation of the SLA with the customer, and thus it is the trigger for a movement action.
Triggering movement through customers’ requirements
The goal model only specifies the agreement between the provider and the customer in terms of Data Utility. We extend this model by also taking into account the effect of actions on goals [Plebani et al. 2018]. To model the relations between movement actions (or transformations) and goals, we use contribution links. A contribution link specifies that the execution of the action (or transformation) has an impact on the achievement of the goal. Contribution links can have a positive effect (the execution of an action or transformation helps the achievement of the linked goal) or a negative effect (the execution of an action or transformation hurts the achievement of the linked goal).
Figure 4 shows an example of the goal model enriched with movement actions. For example, the MEC action (move from edge to cloud) is connected with a positive contribution link to Fast data process, since its adoption will improve the metric Response time and, therefore, help the achievement of the goal. Conversely, the same action negatively impacts Fast data streaming, since moving a data set to the cloud, in this example, places it farther from the sensors that are creating the data.
Figure 4: Goal model enrichment for movement actions triggering
The goal model supports the detection of goal violations and the selection of a proper movement action according to the contributions of actions to goal satisfaction.
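As a rough illustration of how contribution links could drive the selection, consider the sketch below. The scoring rule is our own simplification, not the strategy used by DITAS: an action is a candidate only if its net contribution to the violated goals is positive, and ties are broken by preferring the action that hurts fewer currently satisfied goals. The MEC links follow the example above; the MCE action (move from cloud to edge) and its links are hypothetical.

```python
from typing import Dict, List, Optional

# action name -> {goal name: contribution}; +1 helps the goal, -1 hurts it
Contributions = Dict[str, Dict[str, int]]

CONTRIBUTIONS: Contributions = {
    "MEC": {"Fast data process": +1, "Fast data streaming": -1},
    "MCE": {"Fast data process": -1, "Fast data streaming": +1},  # hypothetical
}


def select_action(violated: List[str], satisfied: List[str],
                  contributions: Contributions = CONTRIBUTIONS) -> Optional[str]:
    """Pick the action that best helps the violated goals, or None if none helps."""
    best_name: Optional[str] = None
    best_key = None
    for name in sorted(contributions):  # sorted for a deterministic result
        links = contributions[name]
        help_score = sum(links.get(goal, 0) for goal in violated)
        if help_score <= 0:
            continue  # the action does not improve the violated goals overall
        harm = sum(1 for goal in satisfied if links.get(goal, 0) < 0)
        key = (help_score, -harm)  # more help first, then less harm
        if best_key is None or key > best_key:
            best_name, best_key = name, key
    return best_name


choice = select_action(violated=["Fast data process"],
                       satisfied=["Fast data streaming"])
```

Here `choice` is `"MEC"`: it is the only action with a positive link to the violated goal, even though it hurts a goal that is currently satisfied. A more realistic selection would also weigh the enactment costs attached to each action.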
[Plebani et al. 2018] Pierluigi Plebani, Mattia Salnitri, and Monica Vitali. Fog Computing and Data as a Service: a Goal-Based Modeling Approach to Enable Effective Data Movements. In Proceedings of the 30th International Conference on Advanced Information Systems Engineering (CAiSE), 2018.