Modern hospitals accumulate huge amounts of medical data that, if readily available to researchers, could advance medical research by years. As any researcher knows, getting secure access to high-quality data, and being able to run analytics on it to prove or reject her hypotheses, is a difficult challenge.

As part of our work on the European project DITAS (grant agreement RIA 731945), together with researchers from Ospedale S. Raffaele and other partners, we investigated the e-Health use case, in which a hospital aims to share the medical information it collects in its daily work with the scientific community around the world. What could be more natural than putting such data in the cloud? However, the hospital requires that this valuable data be encrypted, so that access can be restricted to authorized research facilities only. In addition, different parts of the data should be made available to different research facilities, or to different personas within those facilities, without the burden of managing multiple subsets of the same data. Finally, the data format should support efficient analytics.

Our solution works in two stages. First, the data is cleansed and anonymized on its way to the cloud, to conform with regulations such as GDPR and HIPAA. Then it is persisted in the cloud as Parquet [4] files protected with Parquet Modular Encryption [1], [2], [3], which enables both secure access and efficient analytics on the encrypted data. See the full article by Maya Anderson, Gidon Gershinsky and Ety Khaitzin from IBM Research, where we describe the experiment we performed using MinIO [5] object storage to store the encrypted Parquet files and HashiCorp Vault [6] to store the encryption keys.

[1] https://github.com/apache/parquet-format/blob/encryption/Encryption.md
[2] https://www.slideshare.net/databricks/efficient-spark-analytics-on-encrypted-data-with-gidon-gershinsky
[3] https://dl.acm.org/citation.cfm?id=3211907
[4] https://parquet.apache.org/
[5] https://min.io/
[6] https://www.vaultproject.io/