Deep learning has pervaded the systems which dominate our daily life at an unfathomable rate, with little sign of slowing down. Neural networks are now in our phones, categorising our photos; they're in our e-mail clients, predicting what we intend to say next; soon, they'll be in our cars, chauffeuring us for all our journeys.
While much effort is spent on developing the latest models to achieve ever greater results, this is often done with great insouciance towards their security and safety. Protecting a WiFi network or data store is expected; protecting a model, less so.
Neural networks are, like all systems, fallible. As the global volume of data routed through neural networks increases, the bounty models offer to nefarious actors grows with it. It is only a matter of time before sensitive information is extracted from a high-profile model. Whether the cost will be only in reputation to the victim, or whether there will be greater human costs, is anybody’s guess.
Because of privacy concerns, diverse datasets remain difficult to source and utilise in machine learning despite the dramatic growth of personal data streams. In some domains, such as health, legal restrictions on patient privacy limit rights to data access. By developing robust, privacy-preserving machine learning, we could use the totality of the world’s health data to solve some of the biggest problems facing humanity without sacrificing individual freedoms.
As part of Tessella’s AI Accelerator program, Cortex, I spent 3 months researching issues of privacy in machine learning models. In particular, I aimed to empirically shed light on the trade-offs one can expect between model performance and privacy when applying differential privacy to machine learning by training, attacking and defending classifiers of common datasets. This project includes, to the best of my knowledge, the first application of differential privacy to defend against machine learning privacy attacks.
Attacking a model
Before defending the models, it was first necessary to establish that machine learning models and their data are susceptible to exposure. Existing literature points to several diverse ways machine learning models can leak information. For example, if the training data for a text autocompletion model contains “My PIN is 1234”, prompting the model with “My PIN is” will eagerly suggest the sensitive completion “1234”. Another technique “paints” data to allow identification of a model’s training dataset after the fact. Attacks exist which target a model before, during and after training, in solo and collaborative environments, to extract training data, inference data and even the model architecture. In this project, we applied two types of attack: model inversion and membership inference.
Model inversion attacks attempt to recreate data using some information output by the model. The consequences of a successful attack could be extraordinarily damaging. For example, consider a well-intentioned (hypothetical) engineer who worked tirelessly for weeks to develop a model for rapid identification of COVID-19, only for it to inadvertently leak the health information of anybody who used it. Not only would public trust in the system diminish rapidly, leading people to avoid a tool which may be vital to protecting public health, but the researcher could also be open to legal challenge.
Membership inference determines if some data was part of a model’s training dataset. Using this attack, a business could probe a competitor’s model to glean information about what data they hold. This would be particularly damaging in the pharmaceutical industry, in which information on the types of drugs receiving investment and research focus is extremely secretive and valuable.
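The intuition behind the simplest form of membership inference can be sketched in a few lines. The idea is that overfitted models tend to be more confident on records they were trained on, so an attacker can threshold the model's top predicted probability. This is a minimal, illustrative sketch (the function name and the threshold value are my own, not the project's actual implementation):

```python
import numpy as np

def infer_membership(confidences, threshold=0.9):
    """Flag a record as a likely training-set member if the model's
    top predicted probability exceeds a threshold; overfitted models
    tend to be more confident on data they have memorised."""
    return bool(np.max(confidences) > threshold)

# Illustrative probability vectors: high confidence on a memorised
# record, lower confidence on an unseen one.
seen = np.array([0.97, 0.02, 0.01])
unseen = np.array([0.50, 0.30, 0.20])
print(infer_membership(seen), infer_membership(unseen))  # True False
```

Real attacks replace the fixed threshold with a "shadow model" trained to distinguish member from non-member confidence vectors, but the underlying signal is the same.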
Extracting sensitive data
During model training, data seen by a model imprints a sort of pathway through the model. It has previously been demonstrated that these pathways can be exploited to recreate actual training data seen by the model (Fig. 1).
Fig. 1. Left: Data recreated with a Fredrikson model inversion attack. Right: actual training data.
We applied this technique in Cortex to a more complex dataset of images and a far deeper model; however, the recreated images appeared as little more than random noise. The issue is that complex models can distinguish very simple structures: a barely perceptible (to a human) outline of an object with four legs and a tail is so evidently a horse (to the model), rather than a frog or a truck or anything else. In practice, it is far easier for the attack to converge on one of these noisy structures than on a complex, realistic image.
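The mechanics of this kind of inversion are easiest to see on a toy model. The attack treats the input as the thing being optimised: starting from a blank input, ascend the gradient of the target class score until the model "sees" that class. The sketch below uses a random linear classifier as a stand-in (all names and sizes are illustrative; a real Fredrikson-style attack back-propagates through a trained deep network):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))  # stand-in weights of a "trained" linear classifier

def invert(target_class, steps=100, lr=0.1):
    """Recreate an input by gradient ascent on the target class logit.
    For a linear model, d(logit_c)/dx is simply the weight row W[c]."""
    x = np.zeros(64)
    for _ in range(steps):
        x += lr * W[target_class]
    return x

recreated = invert(3)
logits = W @ recreated
print(logits.argmax())  # the model now confidently predicts class 3
```

For a deep network the gradient step is computed by autodiff rather than read off a weight row, and it is exactly this unconstrained ascent that produces the noisy, adversarial-looking structures described above.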
Generative Adversarial Networks (GANs) are a class of neural networks which can generate realistic images. We applied a GAN to our image generation process to better guarantee that we converge on a usable image. In Fig. 2 you can see an example of a “dog” created using this attack and the most similar image in the training dataset. Clearly, far from generating a generic dog-like object, the attack has exposed an actual piece of data used to train the model.
Fig. 2. Left: data recreated using a GAN attack on a CIFAR-10 classifier. Right: closest training image.
To be clear, this technique is not specific to images; in fact, it is the complexity of images that makes this attack so difficult. With a sufficiently powerful generative model, we could recreate audio captured by home assistants or video used by action recognition models.
With the susceptibility of machine learning models and their data established, we investigated the application of privacy-preserving techniques as a defensive mechanism against these attacks. The primary focus of this research was differential privacy, a mathematical formulation of privacy which aims to make model outputs independent of the data used to generate them. In the context of machine learning, this means that having, or not having, data from any one individual in a dataset would not change the output of the model. Differential privacy literature has, to the best of my knowledge, not directly explored the impact on privacy attacks in machine learning; however, it is widely accepted that differential privacy should protect data against many attacks.
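For reference, the standard formal statement reads: a randomised mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for all pairs of datasets $D$ and $D'$ differing in a single record, and for all sets of outputs $S$,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

A smaller $\varepsilon$ means the two output distributions are harder to tell apart, so the presence or absence of any one record is better hidden; $\delta$ permits a small probability of the $\varepsilon$ bound failing.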
Differential privacy can be applied directly to the training process (differentially private stochastic gradient descent, or DP-SGD) or in a distributed training scheme called Private Aggregation of Teacher Ensembles (PATE). It is not necessary to go into the specifics of these methods; just know that they inject noise into training to make a model's predictions independent of the specific training data used. Like all differentially private mechanisms, these methods rely on abstract privacy parameters. These parameters are difficult to estimate precisely, which makes it difficult to evaluate whether a model is “private” or not, especially across domains and datasets.
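For the curious, the core of a DP-SGD update is small enough to sketch: clip each example's gradient to a fixed L2 bound, sum, and add Gaussian noise scaled to that bound. This is a numpy illustration of the idea only (function names and default values are mine); production code would use a library such as TensorFlow Privacy or FAIR's PyTorch implementation, which also track the accumulated privacy budget:

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Core of a DP-SGD update: clip each example's gradient to an L2
    bound, sum the clipped gradients, then add Gaussian noise scaled
    to the clipping bound before averaging."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

grads = [rng.normal(size=5) for _ in range(8)]  # one gradient per example
noisy_grad = dp_sgd_step(grads)
```

The clipping bounds any single example's influence on the update, and the noise hides what influence remains; together these two steps are what give the method its formal guarantee.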
Implementing these techniques for the purposes of this research project was a simple process, and open-source code from Google and Facebook AI Research (FAIR) was used to estimate privacy levels. Unfortunately, both methods add significant computational requirements, which can make it time-consuming to train complex models on datasets such as CIFAR-10, seen above in the model inversion attack.
In practice, these computational overheads may introduce critical constraints to some projects, however many valuable machine learning projects use models and data which are sufficiently simple to be easily computed in a privacy-preserving way. Where complexity is unavoidable, the costs of privacy can be minimised by first iterating and perfecting a model in a non-private way before introducing privacy-preserving techniques to produce the final model.
Fig. 3 shows the results of a model inversion attack on models which classify FashionMNIST images (a dataset of different types of clothing), with varying levels of differential privacy applied. Evidently, the addition of greater noise to the training process destroys the ability of an attacker to expose private information via the model inversion attack.
Fig. 3. Data recreated using a GAN attack on a differentially private FashionMNIST classifier with 0.1σ (top), 0.5σ (middle) and 1σ (bottom) noise
Fig. 4 shows a graph of model performance using DP-SGD across several values of privacy-related hyperparameters. There are two crucial points to draw from this figure. Firstly, the training/test accuracy gap is smaller when even a small amount of differential privacy is added to the model; differential privacy acts as a regulariser that could aid model generalisation in a similar way to techniques like Dropout and Batch Normalisation. Secondly, the differentially private models perform better than the non-private model (although at a certain level of noise, not shown here, the model fails to learn anything).
Fig 4. Performance of differentially private FashionMNIST classifiers during training.
Admittedly, the model used was not state-of-the-art, so this result is not guaranteed in a production environment. However, it demonstrates that it is not necessarily the case that any amount of privacy comes at the cost of performance. The relationship between privacy and performance is clearly nuanced, and the response of a model to privacy-preserving techniques will depend heavily on model architecture and data complexity.
First and foremost, this project demonstrated that differential privacy can be applied to protect neural networks from unintended data exposure. We trained a model which classifies the FashionMNIST dataset with an accuracy on par with non-private models, and demonstrated that it is fully protected from model inversion and membership inference attacks. In applying differential privacy to defend against privacy attacks, we demonstrated a method to evaluate the privacy of a model empirically, rather than relying on the abstract privacy parameters favoured in the differential privacy literature. However, differential privacy is not a one-stop solution to all threats, so other methods such as homomorphic encryption are also required, as well as non-technical solutions to validate the intent of those looking to provide or use private data.
Implementing differential privacy in machine learning model training for the purposes of this research assignment was not a difficult task, but it was aided nonetheless by open-source libraries. Outside of machine learning, one implementation of differential privacy was compromised by an incorrectly implemented random noise generator; this demonstrates the need for robust, validated privacy tools to use in our work. Some privacy libraries are in active development, such as FAIR's differential privacy library, TF Encrypted, TenSEAL and OpenMined. None of these are production-ready as of April 2020, although this should soon change. Remaining up to date with the landscape of privacy tools is vital if we are to implement safe and secure models.
In machine learning, security concerns are not yet fully explored or widely known, which makes this a dangerous time for those looking to increase the role of machine learning in their operations. This project has shown that differential privacy should be in the armoury of not just the machine learning engineer and data scientist, but of anyone handling sensitive data, to protect against the most well-known of these security flaws. Privacy-preserving machine learning has the potential to unlock the world’s data, and through programs like Cortex, data becomes a little bit safer.