Case study: differentially private mixed noise in financial services
Developed in collaboration with Privacy Analytics
Context: insights from credit information
BankCo, a financial services company, collects and processes personal information from a variety of sources (eg debt collection agencies, credit card companies, and public records). The company uses this information to help them evaluate risk factors and repayment options when providing loans to people.
Credit scores are based on a variety of personal financial information, including debts and repayment history. This information is particularly sensitive, so BankCo wants to anonymise the information it uses for modelling purposes. However, if anonymisation introduces excessive deviation into the information, this could have significant repercussions when BankCo uses it for credit risk modelling and risk management.
In order to reuse or share the collected information or insights for secondary purposes, they need to use appropriate technical and organisational measures. This will ensure that the information is effectively anonymised while preserving its accuracy and consistency. BankCo could share this information with other financial institutions to provide valuable intelligence on geographic trends and overall risk profiles that informs business development and outreach. The information can also augment BankCo’s predictive models to create a more complete picture of different markets.
Objective: increase collaboration with statistically useful information
BankCo implements randomisation using a differentially private noise addition method to anonymise the information for secondary purposes. BankCo wants to keep record-level information rather than aggregating it, as this preserves its statistical patterns.
BankCo’s anonymisation process allows its internal researchers and other financial institutions to use the anonymised information and derive new insights.
Technical measures: mixed noise and risk metrics
In this scenario, differential privacy protects the information elements by introducing a level of uncertainty through randomisation (noise injection). This approach limits the amount of personal information that can be extracted by integrating the privacy budget (an information limit on how much can be inferred or learned about someone) into the differentially private dataset itself. This way, BankCo or its collaborators can carry out analytical processing without BankCo having to put any restrictions on the nature of the information queries. This is because BankCo has limited what can be inferred or learned before making the information available.
BankCo uses the risk of singling out, or uniqueness, as a risk threshold to determine the privacy budget for the dataset. This alleviates a known concern with differential privacy: that the same privacy budget can provide variable levels of protection across different datasets. This approach also ensures that people are not identifiable, by providing plausible deniability about any one person’s contribution. The anonymisation process uses several differentially private noise addition schemes (eg a Laplace distribution, because it satisfies the mathematical properties used to define differential privacy).
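As a general illustration of how such a noise addition scheme works (not BankCo’s actual implementation), the sketch below adds Laplace noise to a single value, with the noise scale set by the field’s sensitivity divided by the privacy budget (epsilon). The field, sensitivity and epsilon values are hypothetical.

```python
# Minimal sketch of a Laplace noise addition mechanism.
# The sensitivity and epsilon values below are hypothetical, chosen only to
# illustrate how the privacy budget controls the amount of noise added.
import numpy as np

rng = np.random.default_rng(42)

def laplace_noise(value, sensitivity, epsilon):
    """Return the value plus Laplace noise with scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: perturb one person's debt figure (in thousands).
# A smaller epsilon (a tighter privacy budget) produces more noise.
noisy_debt = laplace_noise(45.0, sensitivity=5.0, epsilon=1.0)
print(round(noisy_debt, 1))
```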
Table 1 shows a sample of the personal financial information of five people from a larger dataset to demonstrate the techniques used. BankCo’s challenge was to set the privacy budget to ensure they add sufficient noise to anonymise the information while maintaining sufficient utility for the analysis.
Table 1: Example of personal financial information
Person | Age | Income (1,000s) | Assets (1,000s) | Debts (1,000s) | Credit Utilisation |
1 | 24.3 | 65.0 | 245 | 45.0 | 0.460 |
2 | 25.5 | 63.0 | 270 | 48.0 | 0.450 |
3 | 27.6 | 75.0 | 324 | 60.0 | 0.490 |
4 | 29.8 | 85.0 | 375 | 74.0 | 0.520 |
5 | 30.1 | 90.0 | 395 | 71.0 | 0.520 |
To preserve the statistical properties of computed outputs while introducing a measurable level of uncertainty, BankCo uses a mixed noise mechanism. This combines a normal distribution (bell curve) with a Laplace distribution. Laplace noise is used to manage particularly sensitive information points that would otherwise be at risk of singling someone out. This approach allows models to be calibrated to account for the (predominantly normal) noise while using a consistent set of analytical methods and tools.
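One possible sketch of such a mixed noise mechanism is shown below, assuming that records flagged as outliers receive wider Laplace noise and all other records receive normally distributed (Gaussian) noise. The field names, noise scales and outlier flag are illustrative assumptions rather than BankCo’s implementation.

```python
# Sketch of a mixed noise mechanism: Gaussian noise by default, wider Laplace
# noise for records flagged as outliers. All scales here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

FIELDS = ("age", "income", "assets", "debts")

def add_mixed_noise(records, gaussian_sigma, laplace_scale, outlier_ids):
    """Perturb each record's numeric fields with Gaussian noise, or with
    wider Laplace noise if the record is flagged as an outlier."""
    noisy = []
    for rec in records:
        perturbed = dict(rec)
        for field in FIELDS:
            if rec["person"] in outlier_ids:
                perturbed[field] = rec[field] + rng.laplace(0.0, laplace_scale[field])
            else:
                perturbed[field] = rec[field] + rng.normal(0.0, gaussian_sigma[field])
        noisy.append(perturbed)
    return noisy

records = [
    {"person": 3, "age": 27.6, "income": 75.0, "assets": 324, "debts": 60.0},
    {"person": 4, "age": 29.8, "income": 85.0, "assets": 375, "debts": 74.0},
]
gaussian_sigma = {"age": 0.5, "income": 2.5, "assets": 12.5, "debts": 2.5}
laplace_scale = {"age": 1.0, "income": 5.0, "assets": 25.0, "debts": 5.0}
print(add_mixed_noise(records, gaussian_sigma, laplace_scale, outlier_ids={3}))
```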
Table 2 shows how BankCo randomised the information by adding noise. The ranges shown indicate the precision around each value once noise is added to the entries in the table. Because the ranges act as confidence intervals, overlap between individual records can be seen. For example, person 1 and person 2 share similar profiles, as do person 4 and person 5. Person 3 (an outlier), even after normally distributed noise is added, still has outlier values that do not overlap with any other person’s ranges, which could allow them to be singled out. To remove this risk, BankCo introduces additional Laplace noise to person 3’s record, as shown in Table 2.
Table 2: Example with confidence intervals for randomisation
Person | Age | Income (1,000s) | Assets (1,000s) | Debts (1,000s) | Credit Utilisation | Noise |
1 | (23.2-25.3) | (60-70) | (220-270) | (40-50) | (0.45-0.47) | Normal |
2 | (24.5-26.5) | (58-68) | (245-295) | (43-53) | (0.44-0.46) | Normal |
3 (outlier) | (26.6-28.6) | (70-80) | (299-349) | (55-65) | (0.48-0.50) | Normal |
3 (tuned) | (25.6-29.6) | (65-85) | (274-374) | (50-70) | (0.47-0.51) | Laplace |
4 | (28.8-30.8) | (80-90) | (350-400) | (69-79) | (0.51-0.53) | Normal |
5 | (29.1-31.1) | (85-95) | (390-620) | (66-76) | (0.51-0.53) | Normal |
For person 3 (tuned), they used Laplace noise to ensure:
- the person’s record cannot be singled out (as its widened ranges now overlap with other people’s information); and
- the resulting dataset is differentially private.
In practice, BankCo found that few records require this treatment when using this mixed noise mechanism.
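A simple way to identify such records might be to check whether a record’s noised ranges overlap any other record’s, as in the sketch below. The overlap rule used here (flag a record if there is no other record whose intervals it overlaps on every field) is an assumption for illustration; the intervals are taken from Table 2.

```python
# Sketch of an overlap check for singling-out risk, using intervals from
# Table 2. The rule itself is an illustrative assumption: a record is flagged
# for extra Laplace noise if no other record overlaps it on every field.
def intervals_overlap(a, b):
    """True if the intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def needs_laplace(record, others, fields):
    """Flag a record when, for every other record, at least one field
    interval fails to overlap."""
    for other in others:
        if all(intervals_overlap(record[f], other[f]) for f in fields):
            return False  # fully overlaps at least one other record
    return True

# Normal-noise intervals for persons 2, 3 and 4 (from Table 2).
p2 = {"age": (24.5, 26.5), "income": (58, 68), "assets": (245, 295)}
p3 = {"age": (26.6, 28.6), "income": (70, 80), "assets": (299, 349)}
p4 = {"age": (28.8, 30.8), "income": (80, 90), "assets": (350, 400)}

print(needs_laplace(p3, [p2, p4], fields=("age", "income", "assets")))  # True
```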
Table 3 shows the anonymised differentially private dataset. In practice, a larger dataset would show greater variation than the table suggests, as a dataset this small would require significant randomisation to be differentially private.
Table 3: Example with differentially private data
Person | Age | Income (1,000s) | Assets (1,000s) | Debts (1,000s) | Credit Utilisation |
1 | 24.1 | 66.3 | 245 | 44.9 | 0.453 |
2 | 26.4 | 64.7 | 270 | 53.5 | 0.448 |
3 | 27.7 | 72.3 | 344 | 54.3 | 0.499 |
4 | 29.9 | 85.5 | 376 | 77.5 | 0.519 |
5 | 30.1 | 91.2 | 414 | 71.2 | 0.525 |
How do the technical and organisational measures achieve the objective?
BankCo uses a secure data environment to access the information, and gives its collaborators access to this environment when it shares the anonymised information. This approach reduces the risk of re-identification by limiting the sharing (only approved collaborators have access). As the information is shared only with known financial institutions, the risk posed by attackers is lower, so BankCo decides to apply less noise. This increases the utility of the information they share. BankCo uses best practice security measures (ISO 27001) for the data sharing environment. They take the following measures:
- access logging and control;
- monitoring and alerting;
- ensuring that the financial institutions accessing the information understand and follow the terms of use;
- penetration testing and auditing;
- output screening; and
- ensuring that all institutions that access the data sharing environment meet staff training requirements and are regularly reminded of the terms of use of the information.
The infrequent inclusion of Laplace noise to deal with outliers in the mixed noise mechanism reduces the overall degree of randomisation needed. This improves the statistical usefulness of the information and supports valid statistical inference, with statistics that can be calculated with accurate confidence intervals.
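As a simple illustration of this kind of calibration (not a description of BankCo’s models), the sketch below corrects a sample variance for known additive Gaussian noise; the noise standard deviation is a hypothetical value that would be shared alongside the dataset.

```python
# Sketch: debiasing a sample variance for known additive Gaussian noise.
# For independent additive noise, Var(observed) = Var(true) + sigma^2,
# so subtracting sigma^2 recovers an estimate of the true variance.
import numpy as np

def debiased_variance(noisy_values, noise_sigma):
    observed = np.var(noisy_values, ddof=1)
    return max(observed - noise_sigma**2, 0.0)

# Hypothetical use: income values from Table 3 (in thousands) and an assumed
# noise standard deviation of 2.5.
noisy_incomes = [66.3, 64.7, 72.3, 85.5, 91.2]
print(debiased_variance(noisy_incomes, noise_sigma=2.5))
```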
The technical and organisational measures they use reduce the risk of re-identification to a sufficiently remote level by:
- mitigating the risks of singling out, linkability and inference with other available information;
- strictly controlling information access and use; and
- performing periodic identifiability assessments to determine if the technical or organisational measures provide effective anonymisation.