Large data sets stored in the cloud or in data lakes have made scalable, governed access control essential for any organization that needs to stay compliant. Our goal is to provide fine-grained governance that addresses the challenges of GDPR, such as consent that differs from user to user and policies that depend on users’ personal information. That information may be stored in remote systems (e.g., Active Directory) and must be accessed with minimal impact on performance, which is especially difficult with large-scale data.
Our implementation combines pre-processing of policies, profiles, and consent with in-line query rewriting. The rewrite pushes the filtering down into the data store, thereby leveraging the data store's query optimization engine.
We compile the policies and the relevant attributes (e.g., content, profiles, consents) for governance actions into an intermediate representation. This representation encodes the governance rules in a form that can be easily applied to an access query. Then, at query run time, just before the data store executes the query, we perform a simple rewrite of the query that applies the governance rules from the intermediate representation. The data store executes the rewritten query, so only compliant data is returned as query output.
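The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: SQLite stands in for the data store, and the names `compile_governance`, `rewrite_query`, the `governance_consent` table, and the `customers` schema are all hypothetical. The sketch assumes consent records have already been fetched from wherever they live, and that policy values come from trusted configuration rather than user input.

```python
import re
import sqlite3


def compile_governance(conn, consents, policies):
    """Pre-processing: materialize granted consent next to the data and
    turn each policy into a SQL predicate (the intermediate representation)."""
    conn.execute("CREATE TABLE governance_consent (user_id INTEGER, purpose TEXT)")
    conn.executemany(
        "INSERT INTO governance_consent VALUES (?, ?)",
        [(uid, purpose) for uid, purpose, granted in consents if granted],
    )
    ir = {}
    for table, purpose, key_column in policies:
        # Assumption: table, purpose, and key_column are trusted policy config.
        ir[(table, purpose)] = (
            f"{key_column} IN (SELECT user_id FROM governance_consent "
            f"WHERE purpose = '{purpose}')"
        )
    return ir


def rewrite_query(query, purpose, ir):
    """Run-time step: replace each governed table with a filtered subquery.
    A regex is used for brevity; a real rewriter would work on a parsed query."""
    for table in {t for t, _ in ir}:
        # Deny by default when no policy covers this table for this purpose.
        predicate = ir.get((table, purpose), "0 = 1")
        governed = f"(SELECT * FROM {table} WHERE {predicate}) AS {table}"
        pattern = rf"\bFROM\s+{re.escape(table)}\b"
        query = re.sub(pattern, lambda m: f"FROM {governed}", query,
                       flags=re.IGNORECASE)
    return query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "DE"), (2, "Ben", "DE"), (3, "Cleo", "FR")],
)

# Hypothetical inputs: (user_id, purpose, granted) and (table, purpose, key column).
consents = [(1, "marketing", True), (2, "marketing", False), (3, "marketing", True)]
policies = [("customers", "marketing", "id")]
ir = compile_governance(conn, consents, policies)

original = "SELECT name FROM customers WHERE country = 'DE'"
governed = rewrite_query(original, "marketing", ir)
rows = conn.execute(governed).fetchall()
print(governed)
print(rows)  # [('Ana',)]
```

Because the consent filter becomes part of the SQL itself, the data store can optimize it together with the user's own `WHERE` clause, and rows without consent never leave the store. In this sketch a purpose with no matching policy returns nothing, a deny-by-default choice made here for illustration.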
The full article by Ety Khaitzin, Maya Anderson and Roee Shlomo from IBM Research continues here, with the full details of this approach and with a comparison to other approaches.