An effective response to CoViD19 requires a privacy-aware distributed data-sharing infrastructure

post by the SNE COVID-19 Response Team

The public health crisis caused by the spread of the coronavirus is pushing governments all over the world towards exceptional solutions. We often hear the argument that the coronavirus forces a trade-off between privacy and public health. We believe this claim has a weak basis from a technological point of view. We are convinced that it is possible to develop technological solutions that are not detrimental to privacy or to other fundamental human rights.

Several development and research initiatives have started and gone public in the last two weeks, and we expect the pace of new proposals to increase rapidly in the near future. In this context, it is important to set principles that help evaluate whether a solution is sustainable, and what is still missing or needs to be considered.

At SNE, the Systems and Networking Lab research cluster of the University of Amsterdam, we have specific expertise in designing and implementing data-sharing infrastructures, and we run dedicated research on their regulatory aspects together with several academic and private partners across diverse economic domains. For instance, we lead the DL4LD project (on dynamic policies in the logistics sector, with use cases related to emergency situations), the VWDATA and EPI projects (on privacy and patient empowerment in healthcare research), and SSPDDP (on data-sovereignty issues in competitive socio-economic settings such as financial services). We are currently investigating the specific technological challenges of the CoViD19 response, tracking the solutions being proposed around the world and drafting a plan of action. We strive for a technological solution that is privacy-preserving, user-empowering and effective for health protection.

At the moment our attention is focused in particular on privacy-aware contact tracing. Contact tracing is the problem of identifying people who may have been infected once a person has been declared positive. (Other applications are more problematic. For instance, trip planning and navigation have raised concerns about producing stigmatizing social behaviour; self-isolation monitoring would support enforcement and policing, but it might reduce the acceptance and use rate of the app and could also result in avoidance behaviour.)

Contact tracing requires collecting data to calculate the proximity of two persons. Knowing where a person is over time and whom they meet is personal, sensitive information; hence the method used to collect and process this data has a drastic impact on that person's privacy.
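To make "proximity" concrete, here is a minimal Python sketch of one possible definition, not the one used by any particular app: two location traces, sampled at the same timestamps in a local planar frame measured in metres, count as a contact when they stay within a radius for a minimum cumulative time. The function names and the 2 m / 15 min defaults are illustrative assumptions, not epidemiological guidance.

```python
import math
from typing import List, Tuple

# A sample is (timestamp in seconds, x in metres, y in metres) in a local planar frame.
Sample = Tuple[float, float, float]


def contact_seconds(a: List[Sample], b: List[Sample], radius_m: float = 2.0) -> float:
    """Total time during which two traces, sampled at the same timestamps, are within radius_m.

    Each sampling interval is credited to the contact if the two persons were within
    radius_m at the start of that interval.
    """
    if len(a) != len(b):
        raise ValueError("traces must have the same number of samples")
    total = 0.0
    for (t0, xa, ya), (t0b, xb, yb), (t1, _, _) in zip(a, b, a[1:]):
        if t0 != t0b:
            raise ValueError("traces must share sampling timestamps")
        if math.hypot(xa - xb, ya - yb) <= radius_m:
            total += t1 - t0
    return total


def is_contact(a: List[Sample], b: List[Sample],
               radius_m: float = 2.0, min_seconds: float = 15 * 60) -> bool:
    # Hypothetical thresholds: a contact needs enough cumulative time within the radius.
    return contact_seconds(a, b, radius_m) >= min_seconds
```

Note that this function needs both traces in one place; in a privacy-aware design such a computation would have to run locally or under privacy-preserving processing rather than on a central server holding everyone's locations.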

Most existing solutions for app-based contact tracing assume a client-server relationship: an individual app provides location data (with, and in some cases without, consent) to a central server maintained by some authority. First, this means that data subjects have little or no control over their data and its use, all the more so because the data is copied to other premises for processing. Second, centralization assumes a monolithic structure of social control, and therefore conflicts with the principle of the balance of powers.

The solution we envision starts from the opposite premise. Technically, we are studying a combination of peer-to-peer (p2p) and cloud approaches with privacy-preserving aggregated processing. Rather than having their data pushed to a server, users pull information from the infrastructure. We consider three requirements necessary for data-subject empowerment:

  • data sovereignty: personal data resides only on local premises and is never shared directly, only in encrypted or otherwise privacy-preserving form. This gives primacy to p2p processing: any computation is carried out among peers as far as possible.
  • dynamic policies: the data-sharing infrastructure is governed by dynamic policies, including the user's consent, which can be modified and enforced in real time.
  • aggregated indexing: to guarantee feasibility, some aggregated indexing is needed to select, among all devices in the infrastructure, the relevant groups of devices that should run the test. The aggregated index data resides on servers selected by the user according to principles of decentralized authority.
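The first two requirements can be illustrated with a minimal Python sketch. It assumes a scheme chosen here purely for illustration: devices exchange random tokens with nearby peers, keep everything they hear on the device, and, after a positive diagnosis and only with the user's consent, publish their own tokens; every other device pulls the published tokens and matches them locally. `ConsentPolicy`, `Device` and their fields are hypothetical names, and the policy object stands in for the dynamic, user-editable policies described above.

```python
import secrets
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ConsentPolicy:
    """Hypothetical user policy; it can be edited at any time and is checked before every action."""
    allow_matching: bool = True
    allow_publish_own_tokens: bool = False


@dataclass
class Device:
    policy: ConsentPolicy = field(default_factory=ConsentPolicy)
    own_tokens: Set[str] = field(default_factory=set)    # tokens this device has broadcast
    heard_tokens: Set[str] = field(default_factory=set)  # tokens received from nearby peers; never uploaded

    def new_token(self) -> str:
        # Random identifier that does not reveal who broadcast it.
        token = secrets.token_hex(16)
        self.own_tokens.add(token)
        return token

    def hear(self, token: str) -> None:
        # Called when a nearby peer's token is received; stored on the device only.
        self.heard_tokens.add(token)

    def tokens_to_publish(self) -> Set[str]:
        # Used after a positive diagnosis, and only if the user's current policy allows it.
        if not self.policy.allow_publish_own_tokens:
            raise PermissionError("user has not consented to publishing")
        return set(self.own_tokens)

    def check_exposure(self, published: Set[str]) -> bool:
        # Pull model: the device downloads the published tokens and matches them locally.
        if not self.policy.allow_matching:
            return False
        return bool(self.heard_tokens & published)
```

Because the policy is consulted at the moment of each action, a change of consent by the user takes effect immediately, without any server being involved.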

The last point is particularly innovative: instead of being tied to a single server (the standard solution at the moment), users can decide which server indexes them. Such an infrastructure, just like the internet, scales globally and adapts to the trust fabric of the specific communities or countries in which users reside. Servers might, for instance, be maintained by universities, hospitals, GPs, governments or non-governmental agencies.
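The user-selected index can be sketched in the same style. Here each device registers only coarse, aggregated keys (a grid cell of roughly 1 km plus a day, both hypothetical choices) under a pseudonymous handle with a server of its own choosing; a query is sent to every server, and only the devices returned form the group that runs the detailed peer-to-peer test. `IndexServer`, `to_cell` and `candidate_group` are illustrative names, not an existing API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical coarse key: (grid x, grid y, day number).
Cell = Tuple[int, int, int]


def to_cell(x_m: float, y_m: float, t_s: float, grid_m: float = 1000.0) -> Cell:
    # Aggregate a precise position and time into a coarse cell before it leaves the device.
    return (int(x_m // grid_m), int(y_m // grid_m), int(t_s // 86400))


class IndexServer:
    """Hypothetical index run by, e.g., a university, hospital, GP, government or NGO."""

    def __init__(self, operator: str) -> None:
        self.operator = operator
        self._index: Dict[Cell, Set[str]] = defaultdict(set)

    def register(self, handle: str, cells: Iterable[Cell]) -> None:
        for cell in cells:
            self._index[cell].add(handle)

    def lookup(self, cells: Iterable[Cell]) -> Set[str]:
        found: Set[str] = set()
        for cell in cells:
            found |= self._index.get(cell, set())
        return found


def candidate_group(servers: List[IndexServer], positive_cells: Set[Cell]) -> Set[str]:
    # Federated query: every server is asked; only the returned devices run the p2p test.
    group: Set[str] = set()
    for server in servers:
        group |= server.lookup(positive_cells)
    return group
```

No server needs to see the whole population: each one holds only the coarse keys of the users who chose it, and the precise data stays on the devices.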

We are also working on enabling scenario-based applications that go beyond simple position matching. Our model is to decouple the person from the virus. Recent studies on the coronavirus have shown that the virus may persist in enclosed spaces and on certain types of surfaces. The movements of a positive person may therefore have different impacts depending on the places she passed through: if she stayed on a bus for some time, contagion may be possible even for passengers who never met her. We do not have the expertise to specify such scenarios ourselves, but we are working to make such inputs usable, for instance those provided by epidemiology experts or clinical studies.
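A scenario-based check could look like the following Python sketch, which treats a place as potentially contaminated for a persistence window after a positive person leaves it. The per-place-type windows are inputs that would have to come from epidemiologists or clinical studies; the function, the field names and any window values used with them are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Visit:
    place_id: str
    place_type: str  # e.g. "bus" or "shop"
    start_s: float
    end_s: float


def exposures(user: List[Visit], positive: List[Visit],
              persistence_s: Dict[str, float]) -> List[Visit]:
    """Return the user's visits that overlap a positive person's visit extended by a persistence window.

    persistence_s maps a place type to how long (in seconds) the place is treated as risky
    after the positive person leaves; these values are expert input, not defined here.
    Unknown place types get no extra window.
    """
    hits: List[Visit] = []
    for u in user:
        for p in positive:
            if u.place_id != p.place_id:
                continue
            risky_until = p.end_s + persistence_s.get(p.place_type, 0.0)
            # Interval overlap of [u.start_s, u.end_s] with [p.start_s, risky_until].
            if u.start_s <= risky_until and p.start_s <= u.end_s:
                hits.append(u)
                break
    return hits
```

With a zero window the function reduces to ordinary co-location matching; a non-zero window also flags, for example, a passenger who boarded the same bus after the positive person had left.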

To face the CoViD19 emergency, we cannot settle for ad-hoc solutions that assume top-down power structures and reproduce them in the information-processing infrastructure. On the contrary, since what we need is a sustained, collective response, the only way to increase acceptance is to adequately empower users and intermediate bodies.