Case study: homomorphic encryption for data sharing

Developed in collaboration with Duality Technologies

Context

A group of law enforcement agencies and financial services organisations have formed a consortium to share personal information to detect and prevent financial crimes and related harms (eg fraud, money laundering, and cybercrimes). For these purposes, the members operate as independent data controllers. A “hub”, which operates independently of the other parties, acts as an intermediary. It receives queries and forwards them to the other members, and then collects, aggregates, and forwards the responses.

When a member of the consortium (controller A) conducts a financial crime investigation, it can submit a homomorphically encrypted query about a person to other members within the consortium (controllers B, C and D). The query will ask other members if they hold information for a particular person that is linked to financial crime activity.

The hub then sends the query to controllers B, C and D. These controllers send their homomorphically encrypted responses back to the hub. The hub aggregates them before sharing the aggregated response with controller A, which is the only party that sees the final response.

  1. User creates and encrypts query, which is sent to the hub
  2. Hub distributes query to other participants, masking the original inquirer and query from view
  3. Encrypted query runs on each participant's database
  4. Encrypted results sent back to the Hub
  5. Hub aggregates the encrypted results
  6. Aggregated and encrypted results sent back to user
  7. Results are decrypted and actioned

Personal information that is encrypted in queries includes:

  • customer identifiers and contact information (eg names, email addresses, postal addresses, ID numbers);
  • financial transaction information (eg dates, amounts, counterparties);
  • information about criminal convictions and offences;
  • device information (eg IP address, device ID); and
  • indicators of fraud or other financial crime.

Objective

Financial institutions who are members of the consortium process customer personal information to detect and prevent financial crime. Each member of the consortium wants to be able to share personal information about actual or suspected instances of fraud or other financial crime committed by its customers (or by known criminals, in the case of the law enforcement agencies). They all benefit from this reciprocal sharing of similar personal information by the other members.

Technical measures

Each member of the consortium uses fully homomorphic encryption (FHE) techniques to ensure an appropriate level of security when sharing personal information between members. A SQL-like query language is used to construct the queries.

The personal information in the query is homomorphically encrypted. This renders it pseudonymised for controller A, because controller A holds the private key required for decryption. Only controller A can convert the information back into personal information. For example, the identifiers for a customer are encrypted as below (where xxxxxxx represents an encrypted field in the query):

“Do any accounts owned by [John Smith; NI Number: AB1234C; date of birth: 01/01/1980] have confirmed fraud flags?”

“Do any accounts owned by [xxxxxxx; NI Number: xxxxxxx; date of birth: xxxxxxx] have confirmed fraud flags?”

The homomorphically encrypted query is then sent to members (controllers B, C and D) via the hub. Using homomorphic encryption techniques, controllers B, C and D can perform data matching between the encrypted query and their own customer information. However, they do not ‘see’ the original personal information in the query parameters, and they cannot learn which records in their data may have matched the query. This means controllers B, C and D can respond to the query automatically without needing to decrypt the personal information. The hub is used to route the encrypted query and results. It can see neither the encrypted query parameters nor the encrypted results, and it cannot decrypt them because it does not hold the decryption key.
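The matching step can be illustrated with a toy example. The sketch below is not the consortium's system and does not use FHE: it swaps in the Paillier scheme, which is only additively homomorphic, with demonstration-sized primes that offer no real security. The function names (`keygen`, `encode`, `blinded_difference`) and the sample record identifiers are assumptions made for illustration. It shows a responding controller comparing an encrypted identifier against its own records and returning, for each record, a ciphertext that decrypts to zero only on a match. The responder never sees the query or the outcome.

```python
import hashlib
import math
import secrets

# Toy Paillier parameters: two Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1

def keygen():
    """Return (public modulus n, private key (phi, mu))."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    return n, (phi, pow(phi, -1, n))

def random_unit(n):
    """Pick a random value in [1, n) that is coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def encrypt(n, m):
    """Paillier encryption with generator g = n + 1."""
    n2 = n * n
    return (1 + (m % n) * n) * pow(random_unit(n), n, n2) % n2

def decrypt(n, priv, c):
    """Recover the plaintext (mod n) using the private key."""
    phi, mu = priv
    return (pow(c, phi, n * n) - 1) // n * mu % n

def encode(identifier, n):
    """Map an identifier string to an integer below n via SHA-256."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % n

def blinded_difference(n, enc_query, record_id):
    """Responder: compute Enc(r * (query - record)) without decrypting."""
    n2 = n * n
    enc_diff = enc_query * encrypt(n, -encode(record_id, n)) % n2
    return pow(enc_diff, random_unit(n), n2)  # random r hides the difference

# Controller A encrypts the identifier it is asking about.
n, priv = keygen()
enc_query = encrypt(n, encode("AB1234C", n))

# Controller B processes the ciphertext against its (hypothetical) records.
records = ["ZZ9999Z", "AB1234C"]
responses = [blinded_difference(n, enc_query, rec) for rec in records]

# Only the private-key holder can tell which responses encode zero (a match).
matches = [decrypt(n, priv, c) == 0 for c in responses]
print(matches)  # [False, True]
```

In this simplified version the key holder would see one result per record. In the arrangement described here, the hub aggregates responses before anything is returned, so controller A sees only the combined result.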

The hub aggregates the individual homomorphically encrypted query responses, so that controller A does not see the specific responses provided by the other members. Therefore, controller A does not know whether the person is a customer of controller B, C or D. The hub is also unable to infer this. The data flows are set out in the numbered steps in the Context section above.

For example, if a bank wants to better understand the risk profile of one of its customers, it may want to know whether accounts owned by the customer are receiving transfers from high-risk jurisdictions.

  1. The bank sends an encrypted query to the network asking, “Have accounts owned by [this person] received transfers from high-risk jurisdictions in the last 30 days? If so, how many transactions from how many jurisdictions?”
  2. Each member provides an encrypted response to the request. The underlying response from each member may look like “Yes; 20 transactions from 5 high risk jurisdictions,” or “No”.
  3. The hub receives the encrypted responses (which it cannot decrypt). It then calculates an encrypted risk score based on the inputs from each respondent. The hub can see neither the underlying information nor the risk score it has generated.
  4. The encrypted risk score is sent back to the bank, which can decrypt it. The bank can then understand the risk posed by the customer, without knowing where else the customer might be banking, and without obtaining any information about specific transactions.
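The four steps above can be sketched with a toy additively homomorphic scheme. This is a minimal illustration, not the consortium's implementation: it uses Paillier rather than FHE, toy primes with no real security, and a made-up scoring rule. `JURISDICTION_WEIGHT`, the function names and the counts for members C and D are assumptions; only member B's "20 transactions from 5 jurisdictions" comes from the example above.

```python
import math
import secrets

# Toy Paillier parameters: two Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
JURISDICTION_WEIGHT = 10  # hypothetical weight in the illustrative score

def keygen():
    """Return (public modulus n, private key (phi, mu))."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    return n, (phi, pow(phi, -1, n))

def random_unit(n):
    """Pick a random value in [1, n) that is coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def encrypt(n, m):
    """Paillier encryption with generator g = n + 1."""
    n2 = n * n
    return (1 + (m % n) * n) * pow(random_unit(n), n, n2) % n2

def decrypt(n, priv, c):
    """Recover the plaintext (mod n) using the private key."""
    phi, mu = priv
    return (pow(c, phi, n * n) - 1) // n * mu % n

def hub_risk_score(n, encrypted_responses):
    """Hub: build Enc(sum(transactions + weight * jurisdictions)).

    The hub only multiplies and exponentiates ciphertexts; it holds no
    private key and never sees a plaintext value.
    """
    n2 = n * n
    score = encrypt(n, 0)  # fresh encryption of zero under the bank's key
    for enc_tx, enc_jur in encrypted_responses:
        weighted = pow(enc_jur, JURISDICTION_WEIGHT, n2)  # Enc(weight * jur)
        score = score * enc_tx * weighted % n2            # homomorphic addition
    return score

# Step 1: the bank generates a key pair and publishes n with its query.
n, priv = keygen()

# Step 2: each member encrypts (transactions, jurisdictions) under n.
plain = {"B": (20, 5), "C": (0, 0), "D": (3, 1)}
encrypted = [(encrypt(n, tx), encrypt(n, jur)) for tx, jur in plain.values()]

# Step 3: the hub aggregates ciphertexts into one encrypted score.
enc_score = hub_risk_score(n, encrypted)

# Step 4: only the bank can decrypt the aggregate.
score = decrypt(n, priv, enc_score)
print(score)  # 83
```

The bank learns the single value 83 but not which member contributed what, which mirrors the aim of step 4.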

Organisational measures

The members underpin this processing by a contractual arrangement and information governance controls which include:

  • data protection impact assessments (DPIAs);
  • pre-defined types of queries and information they will share;
  • processes for raising and correcting issues with inaccurate information. This includes a complete audit trail to address people’s information rights requests;
  • multi-factor authentication to ensure only authorised end users can access the system;
  • enforcing permissions for individual users of the system to determine who has the right to deploy queries; and
  • training for end users on the system, including how to submit queries and how to configure rules about which queries the member will participate in.

How do the technical and organisational measures achieve the objective?

The system supports UK GDPR compliance in three main ways:

  • It helps to fulfil the requirements of the security principle by providing appropriate technical and organisational measures.
  • It supports the accuracy principle, because computations on the encrypted information produce results that are equivalent to those computed in the clear. There is therefore no negative impact on data utility or on the accuracy of the results.
  • Its use of homomorphic encryption helps the members to comply with their data protection by design obligations.

The technical and organisational measures significantly reduce the risk to people as:

  • parties can only decrypt queries and results with permission. This ensures information is protected even while computations are performed on it. Additionally, due to the aggregation performed by the hub, no party in the consortium knows which party made the enquiry or provided a response. This benefits people because the enquiry raises no unnecessary suspicion if the queried person is innocent of any financial crime; and
  • the system provides higher levels of security compared to methods that do not employ homomorphic encryption. If the members used ‘traditional’ methods of encryption, the controllers receiving the query would need to decrypt the personal information in order to provide a response. This creates additional risk, as the decrypted information could be exposed to an attacker. With homomorphic encryption, the receiving controllers never decrypt the information, which reduces the risk from attacks.

Risk and mitigation

  • Risk of HE key compromise:
    • A new key pair is generated for each computation session and removed immediately afterwards. This means that an attacker cannot use old keys or old ciphertexts to find patterns and reverse engineer newly generated ciphertexts.
  • Risk of hub compromise:
    • The hub never has access to unencrypted personal information. This means that if the hub is compromised, a hacker will not be able to access any personal information.
    • The members can also protect connectivity to the hub by other methods, such as a VPN or firewall.
  • Handling results:
    • Decrypted results are kept in the consortium members’ dedicated platforms. Within each platform, access to the results is available only to authorised users.
  • Risk of collusion between members:
    • There are both contractual and technical controls against collusion. From a technical perspective, multiple parties have to cryptographically “agree” to decrypt a response, so a single party cannot decrypt and share a result on its own. From a contractual perspective, the parties have agreements with one another to prevent collusion.
  • Risk of system attacks:
    • The rate at which users can make queries, the types of queries and the people with permission to make them are all monitored and restricted. This mitigates the risk of an attacker making repeated queries to extract as much information as possible and reconstruct the dataset.
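
The query controls in the last bullet can be sketched as a simple gate placed in front of the query interface. This is an illustrative sketch only: the class name `QueryGate`, the limits, the user names and the query type names are all assumptions, not details of the consortium's system. It checks the user's permission, the query type and a per-user sliding-window rate limit, and records every decision for monitoring.

```python
from collections import defaultdict, deque

class QueryGate:
    """Allow a query only for an authorised user, a pre-defined query type,
    and a per-user rate within the configured sliding window."""

    def __init__(self, authorised_users, allowed_types, max_queries, window_seconds):
        self.authorised_users = set(authorised_users)
        self.allowed_types = set(allowed_types)
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user -> times of accepted queries
        self.audit_log = []                 # every decision, for monitoring

    def check(self, user, query_type, now):
        """Decide whether to accept a query and record the decision."""
        decision = self._decide(user, query_type, now)
        self.audit_log.append((now, user, query_type, decision))
        return decision

    def _decide(self, user, query_type, now):
        if user not in self.authorised_users:
            return "denied: user not authorised"
        if query_type not in self.allowed_types:
            return "denied: query type not permitted"
        recent = self._history[user]
        # Drop accepted queries that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return "denied: rate limit exceeded"
        recent.append(now)
        return "allowed"

# Hypothetical policy: one authorised user, one query type, 2 queries per 60 s.
gate = QueryGate({"alice"}, {"fraud_flag"}, max_queries=2, window_seconds=60)
decisions = [
    gate.check("alice", "fraud_flag", now=0),     # allowed
    gate.check("alice", "fraud_flag", now=10),    # allowed
    gate.check("alice", "fraud_flag", now=20),    # denied: rate limit exceeded
    gate.check("alice", "export_all", now=30),    # denied: query type not permitted
    gate.check("mallory", "fraud_flag", now=40),  # denied: user not authorised
    gate.check("alice", "fraud_flag", now=61),    # allowed (first query has expired)
]
```

Timestamps are passed in explicitly so that the behaviour is deterministic; a deployment would more likely read a monotonic clock.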