Insurance Privacy Preservation in Federated Learning Collaboration

Due to privacy and data confidentiality concerns, insurers today guard their proprietary data closely, which has become a major roadblock to the free flow of data and to collaboration among data scientists and analysts. Because data cannot cross the boundaries of insurance firms, or even of business divisions within a single corporation, it is difficult to carry out comparative analysis and to uncover business insights that can only be learned from data aggregated across those boundaries.

Federated learning has been proposed in recent years as a privacy-preserving approach to collaborative machine learning. It allows data owners to build a model collectively without sharing sensitive data with each other or with a centralized server. The technique has seen success in areas such as healthcare, content recommendation, and smart transportation, and it therefore has the potential to address the data concerns of the insurance industry. However, applying federated learning in insurance is further complicated by the need to keep feature names and personally identifying information in structured data private. Hence, in this project, we will explore privacy preservation methods built on top of the federated learning framework and targeted at insurance use cases (e.g., feature alignment, entity resolution, and differential privacy), so that we can build a specialized real-world data collaboration platform for the insurance industry.
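
To make the core idea concrete, the following is a minimal sketch of federated averaging, one common federated learning scheme, and not necessarily the method this project will adopt. It assumes three hypothetical parties that each hold private (x, y) pairs drawn from the same linear relationship; the function names, the synthetic data, and the learning-rate settings are all illustrative assumptions. Only model parameters leave each party, never the raw records.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps for y = w*x + b on one party's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine local parameters, weighting each party by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)

def train(parties, rounds=300):
    """Alternate local training and server-side averaging for a fixed number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each party trains locally; only the resulting (w, b) is sent back.
        updates = [local_update(global_weights, data) for data in parties]
        global_weights = federated_average(updates, [len(d) for d in parties])
    return global_weights

def make_party(n, seed):
    """Hypothetical private dataset: noise-free samples of y = 3x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 2) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]

# Three illustrative data owners of different sizes.
parties = [make_party(20, 1), make_party(50, 2), make_party(30, 3)]
w, b = train(parties)
print(f"learned w={w:.3f}, b={b:.3f}")
```

This sketch leaves out the insurance-specific concerns named above: it assumes the parties already agree on feature definitions (no feature alignment or entity resolution) and it adds no differential-privacy noise to the shared parameters.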

Supervisors: Runhuan Feng, Zhiyu (Frank) Quan

Graduate Supervisor: Panyi Dong