Inherent privacy limitations of decentralized automatic contact tracing systems

Yoshua Bengio, Daphne Ippolito, Richard Janda, Max Jarvie, Benjamin Prud'homme, Jean-Francois Rousseau, Abhinav Sharma, Yun William Yu

Recently, there have been many proposals to use mobile apps as an aid in contact tracing to control the spread of SARS-CoV-2, the virus responsible for the COVID-19 pandemic. However, although many apps aim to protect individual privacy, the very nature of contact tracing must reveal some otherwise protected personal information. There are inherent privacy risks that cannot be removed by technological means, and which may require legal or economic solutions. In this letter, we discuss a few of these unavoidable privacy limitations of any decentralized automatic contact tracing system.

The advent of the COVID-19 pandemic has seen widespread interest in the potential utility of automatic tracing apps [Ferretti et al., Science, 2020] and concern over their potential negative effects on individual privacy [Sharma & Bashir, Nature Medicine, 2020]. Individual privacy is broadly recognized as important for different reasons and by different constituencies. For some, it is an end goal in and of itself; others regard it as a fundamental desideratum for democratic institutions and for the proper functioning of civil society. It is also widely recognized, however, that many social institutions, public and private services, and other systems beneficial to individuals, democratic institutions, and civil society cannot function without some degree of access to personal information. To satisfy these competing objectives, legal frameworks have been deployed in many jurisdictions to set ground rules for how personal information is to be handled (e.g., the GDPR in Europe, HIPAA (for personal health information) in the U.S., and PIPEDA and similar laws in Canada). One aim that is consistent across these various regimes is the emphasis on adequate security safeguards: when organizations and institutions collect, use, and disclose personal information, the systems put in place to facilitate these activities should minimize the potential for unauthorized access, thereby reducing the possibility of unintended use [Ienca & Vayena, Nature Medicine, 2020].

For automatic tracing apps that are focused on individual privacy, fulfilling these obligations is a basic first step. Automatic contact tracing apps generally depend on both sides of a contact (diagnosed and exposed persons) having the app installed, so user adoption is critical for contact tracing to work [Hinch et al., 2020]. In countries where installation of such apps is voluntary, users may choose not to install an app if it leaks too much personal information [Simko et al., arXiv, 2020]. Enforceable privacy and data protection laws can provide some level of assurance to individual users that their personal information will not be unduly exposed.

Yet compliance with such laws is still only a first step: in the context of automatic tracing apps intended for general deployment across large populations, it is not clear that simple adherence will be sufficient. Accordingly, many apps have adopted privacy protocols that aim to decentralize processing, storage, and system control, and more generally to decrease the amount of trust users need to invest in the system.

Even with such controls, there remain residual privacy risks in all decentralized contact tracing systems. We believe it is of paramount importance to acknowledge and analyze these inherent risks, so that both end-users and policy makers can make voluntary and informed decisions about the privacy trade-offs they are willing to tolerate for the purposes of fighting the COVID-19 pandemic. For end-users in particular, where the legal basis for using personal information in contact tracing is founded on consent, it is important that information about inherent risks is made generally available in order to support the meaningfulness of the consent obtained.

Let us consider the most basic properties that any automatic decentralized contact tracing app must have: (1) when two phones are within a few metres of each other, a "contact" is recorded, and (2) when a user (Bob) has a change in COVID-19 status, all of his contacts from the past 14 days (whom we will collectively call Alice) are notified of the exposure and the day it happened. In practice, apps may use some combination of GPS, Bluetooth, and ultrasound to achieve these aims. The privacy leakages we describe here are inherent to any system with these two properties, regardless of which technologies are used.
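
To make these two properties concrete, the following is a minimal Python sketch of the on-device data model. The names (Contact, ContactLog, record, contacts_to_notify) are our own illustrative inventions and are not part of any real app:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Contact:
    """One recorded proximity event between two phones."""
    peer_id: str  # ephemeral identifier broadcast by the other phone
    day: date     # day on which the contact occurred

class ContactLog:
    """Local, on-device log of recent contacts (illustrative only)."""
    WINDOW = timedelta(days=14)

    def __init__(self) -> None:
        self.contacts: list[Contact] = []

    def record(self, peer_id: str, day: date) -> None:
        # Property (1): phones within a few metres record a contact.
        self.contacts.append(Contact(peer_id, day))

    def contacts_to_notify(self, today: date) -> list[Contact]:
        # Property (2): on a change in COVID-19 status, every contact
        # from the past 14 days is notified, along with the day.
        return [c for c in self.contacts if today - c.day <= self.WINDOW]
```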

The inherent privacy leakages arise because, implicitly, Bob is sending information about his COVID-19 infection status to contacts based on collocation. While many apps do not directly use location information, contacts are still determined by Bob being in close proximity with another user, so the very existence of a contact event reveals some small amount of location information. An attacker who has sufficient information about or control over Bob's location history can perform a linkage attack (i.e., linking together external information with the messages Bob sends) to learn Bob's estimated infection status. Alternatively, sending notifications to Bob's contacts may also reveal information about Bob's location history if those notifications are too specific to him. An extreme example would be if he is the only individual testing positive for COVID-19 in a region.

Businesses that have access to any part of Bob's location history can learn his diagnostic status by placing a contact tracing device in his path. One concrete example of such a business is a hotel. In the simplest version of the attack, the hotel places a phone running the contact tracing app in every hotel room. If Bob stays in Room 314 on June 1 and later sends his COVID-19 status, then the phone in Room 314 will receive that message. Because the hotel knows the guest register, it can trivially link that message to Bob, breaching his medical privacy.
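
A sketch of this simple attack in Python, assuming hypothetical hotel-side records (guest_register, deanonymize) that are not part of any real system:

```python
from datetime import date

# Hypothetical hotel-side records: one tracing phone per room, plus the
# ordinary guest register mapping (room, night) to the guest's name.
guest_register = {("314", date(2020, 6, 1)): "Bob"}

def deanonymize(room: str, exposure_day: date) -> str | None:
    """The phone in `room` received an exposure notice dated
    `exposure_day`; the guest register links it back to a person."""
    return guest_register.get((room, exposure_day))

print(deanonymize("314", date(2020, 6, 1)))  # -> Bob
```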

This simple version of the attack can be thwarted by denying the hotel 1000 phones, perhaps by requiring every copy of the app to be validated against a real person or a real phone number. However, that does not block a more sophisticated binary-search version of the attack. Suppose the hotel has 1000 rooms and only 10 phones running the app (e.g., 10 employees each run a copy). Then, at night when all the guests are in bed, each employee walks past half of the doors, turning on their phone for 15 minutes only at specific doors, creating a 10-bit code for each room. If employees 1, 3, and 5 walked past Room 314, then the code would be 1010100000. Since a 10-bit code has 2^10 = 1024 possibilities, every room can get a unique code. Later, if exactly employees 1, 3, and 5 receive messages pertaining to June 1, the hotel can conclude the message was from Bob. This may seem logistically challenging to coordinate, but it can be simulated synthetically with a hacked device in each room. Such devices are no longer running the app as normal, so they might be considered illegal, but because they simulate the behaviour of a real person walking past rooms in an unusual pattern, they cannot be technologically prevented. All a hotel needs is access to 10 accounts.
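
The binary-search attack amounts to assigning each room a binary code and reading that code back from the set of notified employees. Below is a hedged sketch, with 0-indexed bit positions standing in for the 10 employees (so the 1st, 3rd, and 5th employees correspond to bits 0, 2, and 4):

```python
NUM_BITS = 10  # 2^10 = 1024 codes are enough for 1000 rooms

def employees_for_room(room_index: int) -> set[int]:
    """Employees (bit positions) who turn their phone on at this room's
    door: bit b of the room's code is 1 iff employee b visits it."""
    return {b for b in range(NUM_BITS) if (room_index >> b) & 1}

def room_from_notifications(notified_employees: set[int]) -> int:
    """Recover the room index from the set of employees whose phones
    later receive an exposure notification for that night."""
    return sum(1 << b for b in notified_employees)

# Room index 21 has binary code 10101, so the 1st, 3rd, and 5th
# employees (bits 0, 2, 4) visit it; if exactly those employees are
# notified, the guest register reveals who slept in that room.
assert employees_for_room(21) == {0, 2, 4}
assert room_from_notifications({0, 2, 4}) == 21
```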

Although we have described this attack in the context of a hotel and a fixed location, this style of attack allows any malicious vigilante, let us call her Mallory, to determine when and where she was exposed. There are 720 two-minute time intervals in a day. Since this is fewer than 1024, Mallory can assign a 10-bit label to each two-minute period, much as the hotel assigned a 10-bit label to each room, and thereby determine exactly when she was exposed. If Mallory knows who was in close proximity to her during that two-minute period, she may be able to infer Bob's COVID-19 status.
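
The same encoding works over time. A short sketch, again with invented helper names, that labels each two-minute interval with a 10-bit code and decodes the exposure window from the set of Mallory's accounts that are notified:

```python
# 720 two-minute intervals per day fit in 10 bits. Mallory runs one
# account per bit and enables account b's app only during intervals
# whose index has bit b set.
def accounts_active(minute_of_day: int) -> set[int]:
    interval = minute_of_day // 2  # interval index in [0, 720)
    return {b for b in range(10) if (interval >> b) & 1}

def interval_from_notifications(notified_accounts: set[int]) -> int:
    return sum(1 << b for b in notified_accounts)

# An exposure at 14:31 (minute 871) falls in interval 435; the set of
# notified accounts pins down exactly that two-minute window.
assert interval_from_notifications(accounts_active(871)) == 435
```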

In practice, many proposed contact tracing protocols do not even require multiple identities for this attack, because they do not validate users who are merely determining their own exposure status. For example, in several of the decentralized proposals, all of the contact matching happens locally on the phone. This is extremely powerful for protecting the privacy of non-diagnosed users, since those users never transmit any information off their phones, but it also means that there is no straightforward way to prevent an attacker from locally matching on multiple phones.
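
A rough sketch of such local matching, loosely modelled on decentralized proposals like DP-3T; the identifiers and function names here are simplified placeholders, not any real protocol's design:

```python
# Each phone keeps the ephemeral IDs it has heard over Bluetooth and
# matches them locally against IDs published for diagnosed users.
observed_ids: set[bytes] = set()

def on_bluetooth_sighting(ephemeral_id: bytes) -> None:
    observed_ids.add(ephemeral_id)

def check_exposure(published_ids: list[bytes]) -> bool:
    """Matching happens on-device and uploads nothing, which protects
    non-diagnosed users, but nothing here stops an attacker from running
    this same check on many colluding devices at once."""
    return any(eid in observed_ids for eid in published_ids)
```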

Suppose instead that Mallory wants to learn Bob's location history, rather than his COVID-19 medical status. When Bob sends his COVID-19 status to his contacts, Mallory receives a notification for every one of her encounters with Bob, because there is no way for Bob to know that he crossed paths with Mallory multiple times. If all of Mallory's notifications from Bob arrive around the same time, she may be able to infer that they all pertain to the same person, giving her a partial record of Bob's movements. Of course, this can be made more difficult by not having Bob send all the notifications at once, but if exposure notifications are rare (e.g., Bob is the only COVID-19-positive individual in a city), Mallory might still gain partial information.
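
One way Mallory might exploit this is sketched below with an invented clustering heuristic: notifications that arrive in one short burst are tentatively attributed to a single sender.

```python
from datetime import datetime, timedelta

def cluster_by_arrival(arrival_times: list[datetime],
                       gap: timedelta = timedelta(minutes=5)) -> list[list[datetime]]:
    """Group notifications whose arrival times are within `gap` of each
    other; each cluster plausibly came from one diagnosed contact."""
    clusters: list[list[datetime]] = []
    for t in sorted(arrival_times):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)  # same burst -> likely same sender
        else:
            clusters.append([t])
    return clusters
```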

The danger of location history tracking is heightened if the adversary is a large institution, which we will call Grace. If Grace deploys phones around a city, she may be able to correlate the location histories of many diagnosed individuals. The reason Grace can do this despite receiving notifications from many individuals simultaneously is that she can sometimes acquire spatially and temporally contiguous messages. If Bob walked from Main and 1st Avenue to Main and 2nd Avenue, and Grace has phones at both intersections receiving Bob's notifications, she can infer that Bob sent both messages, unless sufficient temporal noise is added to the times at which risk messages are sent to past contacts. This information is comparable to what can be achieved through CCTV recording and face recognition, but is of particular concern because phone-like devices can be deployed more surreptitiously than cameras.
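
A sketch of the chaining heuristic Grace might use; the adjacency map and the 10-minute walking threshold are invented for illustration:

```python
from datetime import datetime, timedelta

WALK_TIME = timedelta(minutes=10)
ADJACENT = {frozenset({"Main & 1st", "Main & 2nd"})}  # toy street map

def likely_same_person(loc_a: str, t_a: datetime,
                       loc_b: str, t_b: datetime) -> bool:
    """Two exposure messages received at adjacent intersections within a
    plausible walking time are tentatively linked to one person."""
    return (frozenset({loc_a, loc_b}) in ADJACENT
            and abs(t_b - t_a) <= WALK_TIME)

print(likely_same_person("Main & 1st", datetime(2020, 6, 1, 9, 0),
                         "Main & 2nd", datetime(2020, 6, 1, 9, 4)))  # True
```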

On the other hand, while we have discussed some of the inherent privacy limitations of decentralized automatic contact tracing, we should keep in mind that there are also basic privacy leakages from traditional contact tracing. Bob's location history is exposed when a human contact tracer asks Bob where he has been. Manual contact tracing can also reveal a lot more personal information about exposed contacts (such as their names, phone numbers and history of contact locations with Bob). With a decentralized contact tracing app, Alice's privacy is better protected because only she learns about her exposure. Of course, with manual contact tracing, Bob can selectively adjust his memory and not mention the places he would rather the authorities not know he has been, but similar privacy for Bob can be achieved if he simply turns off the app or phone.

In conclusion, automatic contact tracing holds the potential for greatly assisting in the fight against COVID-19. However, even with the best-designed systems, there are inherent limitations in how private a system can be technologically made because identifying contacts' COVID-19 status is the entire point of contact tracing. A privacy maximalist would rightly consider these attacks to be a reason to not use any decentralized automated contact tracing system. However, even privacy pragmatists may be concerned about the trade-off of revealing sensitive medical information such as COVID-19 status to businesses they frequent and strangers they encounter.

Because technological solutions can only go so far, mitigating many of these attacks is thus a matter of policy and law. To the extent that existing laws are not robust enough to address the automatic tracing app context, augmentations to existing legal frameworks may help to protect user privacy against legitimate central authorities, such as public health agencies, and deter private sector organizations, such as hotels, that might be tempted to mount such attacks. Another potential mitigation is to change the economic incentive structure for legitimate actors. If a public health app deliberately provides a hotel with partial hot spot information, properly de-identified and spatially coarsened, that information may be useful enough to remove the incentive to attack. Coupled with legal restrictions, this could defend against businesses attempting to re-identify individuals.
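
As one illustration of what "spatially coarsened" might mean, the sketch below snaps coordinates to a coarse grid before sharing; the 0.01-degree cell size (roughly 1 km) is an arbitrary choice for illustration, not a recommendation:

```python
def coarsen(lat: float, lon: float, cell: float = 0.01) -> tuple[float, float]:
    """Snap a location to the nearest point of a coarse grid so that hot
    spot data cannot be traced back to an individual's exact path."""
    return (round(lat / cell) * cell, round(lon / cell) * cell)

print(coarsen(45.50884, -73.58781))  # roughly (45.51, -73.59)
```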

Regardless, we believe it is essential that designers and purveyors of contact tracing apps be transparent about the types of privacy guarantees they can offer. The authors of this correspondence are ourselves involved in designing a decentralized automated contact tracing app [Alsdurf et al., arXiv, 2020], and this letter is not an analysis of the trade-offs necessary for the real system we are designing. However, we hope that this letter is helpful in clarifying the baseline privacy trade-offs that decentralized automatic contact tracing systems ask users to make. It is only with informed consent and transparency that automatic contact tracing efforts will be successful in helping fight the COVID-19 pandemic.

  1. Alsdurf H, et al. COVI White Paper. arXiv preprint arXiv:2005.08502. 2020 May 18.
  2. Sharma T, Bashir M. Use of apps in the COVID-19 response and the loss of privacy protection. Nature Medicine. 2020. https://doi.org/10.1038/s41591-020-0928-y
  3. Ferretti L, Wymant C, Kendall M, Zhao L, Nurtay A, Abeler-Dörner L, Parker M, Bonsall D, Fraser C. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science. 2020 May 8;368(6491).
  4. Ienca M, Vayena E. On the responsible use of digital data to tackle the COVID-19 pandemic. Nature Medicine. 2020 Apr;26(4):463-4.
  5. Hinch R, Probert W, Nurtay A, Kendall M, Wymant C, Hall M, Fraser C. Effective Configurations of a Digital Contact Tracing App: A report to NHSX. 2020 Apr.
  6. Simko L, Calo R, Roesner F, Kohno T. COVID-19 Contact Tracing and Privacy: Studying Opinion and Preferences. arXiv preprint arXiv:2005.06056. 2020 May 12.