The VDC developer is responsible for developing the data processing layer that composes the VDC. In particular, starting from the CAF defined by the Data Administrator, which specifies the data to be exposed to the user, the VDC developer first has to write the code needed to retrieve the data from the attached data sources (available through the DAL developed by the Data Administrator). Secondly, using the most suitable programming language to support the required data analytics (e.g., Scala, Python, Java, Node-RED), one or more containers are created to execute this analysis. Some information about the generated software architecture is then embedded in the Abstract VDC Blueprint to automate the deployment when required.

A Virtual Data Container (VDC) is, by definition, programming-platform and language agnostic, in order to ease the work of the application developer, who is in charge of creating the VDC. Indeed, the developer has the flexibility to implement the VDC with the platform and language he/she is most familiar with. For instance, in the context of the DITAS project (as required by the two use cases considered in the project), one of the implemented VDCs uses the Spark platform, relying on the Spark SQL module for structured data processing, while another one uses the Node-RED platform, whose lightweight runtime is built on Node.js. Moreover, the VDC is architecture agnostic and is therefore able to run at the edge of the network on low-cost hardware such as the Raspberry Pi, as well as on more powerful cloud resources.

This part of the SDK contains the guidelines on how to run the VDC image with the Deployment Engine, as well as how to configure the Spark and Node-RED versions of the VDC.

Guide for VDC Creation and Configuration

In order to tell the Deployment Engine where to pull the VDC image from and how to serve it, we have to fill in the VDC_Images section of the blueprint. The template of this section is as follows:

"VDC_Images": {
     "caf": {
          "internalPort": [INTERNAL_PORT_OF_CONTAINER],
          "externalPort": [EXTERNAL_PORT_OF_CONTAINER],
          "image": "IMAGE_NAME:TAG_NAME"
     }
}

The DITAS installation includes a private Docker registry (served on port 5050) which allows you to share custom base images within your organization, keeping a consistent, private, and centralized source of truth for the building blocks of the application architecture. It gives better performance for big clusters and high-frequency roll-outs, plus added features such as access authentication. If we push the VDC image to this private registry, the configuration will be:

"VDC_Images": {
     "caf": {
          "internalPort": [INTERNAL_PORT_OF_CONTAINER],
          "externalPort": [EXTERNAL_PORT_OF_CONTAINER],
          "image": "[DITAS_SDK_IP]:5050/container-name:tag-name"
     }
}

Using the private registry is strongly encouraged for the reasons explained above, but if we prefer to use the DockerHub repository, that is also possible in DITAS:

"VDC_Images": {
     "caf": {
          "internalPort": [INTERNAL_PORT_OF_CONTAINER],
          "externalPort": [EXTERNAL_PORT_OF_CONTAINER],
          "image": "repository-name/container-name:tag-name"
     }
}

For example, if we are using a Node-RED based VDC (port 1880, as will be explained in the following section), the DITAS SDK is deployed at 198.25.25.6, the VDC image is named vdc-test-image and its tag is latest, the configuration will be as follows:

"VDC_Images": {
     "caf": {
          "internalPort": 1880,
          "externalPort": 1880,
          "image": "198.25.25.6:5050/vdc-test-image:latest"
     }
}

Guidelines for Node-RED and Spark Based VDCs

The Node-RED based VDC is served, like the rest of the DITAS components, as a Docker container, so to start developing the Node-RED flow we need a Docker-based Node-RED installation. We can launch one with the following command:

docker run -it -p 1880:1880 --name mynodered nodered/node-red-docker

This command will download the nodered/node-red-docker image from DockerHub and run an instance of it named mynodered with port 1880 exposed. In the terminal window you will see Node-RED start. Once it has started, you can browse to http://{host-ip}:1880 to access the editor and start working with the flow.

Furthermore, we can use a Dockerfile in order to install or add extra elements to the Node-RED VDC. The following code, for example, uses the public Node-RED image as the base image and installs the mysql node so that it is available by default.

FROM nodered/node-red-docker
RUN npm install node-red-node-mysql

The Docker-based Node-RED installation uses the path /data as the user configuration directory, so if we need to copy the settings file (settings.js) or the flows file (flows.json) from the host to the container, we can use the following Dockerfile:

FROM nodered/node-red-docker
COPY settings.js /data
COPY flows.json /data
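
For reference, a minimal flows.json exposing an HTTP endpoint could look like the sketch below. The endpoint path, node names and wiring are purely illustrative; a real VDC flow would query the DAL and apply the processing required by the CAF.

[
     { "id": "tab1", "type": "tab", "label": "VDC flow" },
     { "id": "in1", "z": "tab1", "type": "http in", "name": "CAF endpoint", "url": "/patient-data", "method": "get", "wires": [["fn1"]] },
     { "id": "fn1", "z": "tab1", "type": "function", "name": "query DAL and process data", "func": "msg.payload = { status: \"ok\" };\nreturn msg;", "wires": [["out1"]] },
     { "id": "out1", "z": "tab1", "type": "http response", "name": "reply", "wires": [] }
]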

Once we have the Dockerfile ready, we can build the image and push it to the DITAS private Docker registry. This registry is included by default in the DITAS installation and serves on port 5050. We can build the image and push it using the following commands:

docker build -t example-vdc .
docker tag example-vdc [DITAS_SDK_IP]:5050/example-vdc:latest
docker login [DITAS_SDK_IP]:5050
(enter user and pass)
docker push [DITAS_SDK_IP]:5050/example-vdc:latest

If analytics is to be run by the VDC on the data returned by the DAL, then the VDC can be implemented using Spark. The data from the DAL is returned as protocol buffers over gRPC, so the VDC consumes this data and uses Spark to run analytics on it. The VDC can be implemented in Scala, which has a gRPC client and in which Spark SQL code can be easily written following the Spark documentation (https://spark.apache.org/docs/latest/sql-programming-guide.html). The VDC can either use a local standalone Spark or an existing Spark installation. The VDC can also be implemented in Python, Java, or any other language that has a gRPC client and Spark support and with which the VDC developer is most familiar. An example of a VDC implemented in Scala can be found at https://github.com/DITAS-Project/ehealth-spark-vdc-with-dal/tree/master/VDC .
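
As a rough illustration, the skeleton of such a Scala VDC could look like the following sketch. The gRPC stub and message types (DalServiceGrpc, QueryRequest and the record fields) are hypothetical placeholders for the classes generated from the actual DAL .proto definitions, and the DAL address and query are illustrative only.

import io.grpc.ManagedChannelBuilder
import org.apache.spark.sql.SparkSession

object ExampleVdc {
  def main(args: Array[String]): Unit = {
    // Open a gRPC channel towards the DAL (host and port are deployment specific)
    val channel = ManagedChannelBuilder.forAddress("dal-host", 50055).usePlaintext().build()
    val dal = DalServiceGrpc.blockingStub(channel)                // hypothetical generated stub

    // Ask the DAL for the protobuf-encoded records exposed to this VDC
    val reply = dal.getData(QueryRequest(query = "blood-tests"))  // hypothetical request type

    // Local standalone Spark; point the master to spark://[SPARK_HOST]:7077
    // when an external Spark installation is configured in the blueprint
    val spark = SparkSession.builder().appName("example-vdc").master("local[*]").getOrCreate()
    import spark.implicits._

    // Turn the protobuf records into a DataFrame and run the analytics with Spark SQL
    val df = reply.records.map(r => (r.patientId, r.value)).toDF("patientId", "value")
    df.createOrReplaceTempView("blood_tests")
    spark.sql("SELECT patientId, AVG(value) AS avg_value FROM blood_tests GROUP BY patientId").show()

    spark.stop()
    channel.shutdown()
  }
}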

If an external Spark installation is to be used, the details of the chosen Spark environment have to be specified in the blueprint:

"Flow": {
    "platform": "Spark",
      "parameters": {
        "spark_master": "spark://[SPARK_HOST]:7077",
        "spark_app_name": "[SPARK_APP_NAME]",
        "spark_jars": "[Any additional jars to load by spark, if needed]"
      }
}
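
For example, assuming the Spark master is reachable at 198.25.25.7, the application is called vdc-analytics and no additional jars are needed (all values here are purely illustrative), the section would be filled in as follows:

"Flow": {
     "platform": "Spark",
     "parameters": {
          "spark_master": "spark://198.25.25.7:7077",
          "spark_app_name": "vdc-analytics",
          "spark_jars": ""
     }
}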