Algorithm Bias and Interpretable AI
Artificial Intelligence (AI) and Machine Learning (ML) algorithms have been widely deployed in industrial applications in the past few years. Sometimes these algorithms exhibit significant bias, referred to as Algorithm Bias, against certain groups. By definition, Algorithm Bias refers to the inequality introduced by the application of algorithms with respect to personal attributes such as socioeconomic status, race, and gender. For example, in health care systems, predictions by AI algorithms for White and Black patients at equal risk levels can be racially discriminative, with Black patients allocated fewer care resources.
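As a minimal sketch of how such bias might be quantified, the following computes the demographic parity gap: the difference in positive-decision rates between groups. The toy decisions and group labels are purely illustrative, and this is only one of several common fairness metrics.

```python
def demographic_parity_gap(preds, groups):
    """Gap in positive-prediction rates across groups. A gap near 0
    suggests the model makes positive decisions at similar rates for
    each group (it does not capture every form of bias)."""
    rates = {}
    for p, g in zip(preds, groups):
        n_pos, n = rates.get(g, (0, 0))
        rates[g] = (n_pos + p, n + 1)
    ratios = [n_pos / n for n_pos, n in rates.values()]
    return max(ratios) - min(ratios)

# Hypothetical care-allocation decisions (1 = assigned extra care resources).
preds  = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_gap(preds, groups)  # group A: 0.75, group B: 0.25
```

A gap of 0.5 here would flag the model for further audit, even before asking why the disparity arises.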
One of the most challenging problems in combating Algorithm Bias is that some non-parametric ML models, such as Support Vector Machines (SVMs), and models with enormous numbers of parameters (hundreds of thousands or even hundreds of billions), such as deep Neural Networks (DNNs), are usually treated as black boxes that are difficult to interpret. Fair decision-making is hard when humans do not understand the decision process at all. One way researchers address Algorithm Bias is through Explainable AI (XAI) and Interpretable AI, which make the decision-making process transparent and easily understandable by humans.
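One widely used model-agnostic explanation technique is permutation importance: shuffle one feature and measure how much the black-box model's accuracy drops. A large drop means the model relies heavily on that feature, which can reveal dependence on a protected attribute or its proxy. The toy model and data below are hypothetical; only the technique itself is standard.

```python
import random

def permutation_importance(predict, X, y, feature_idx, n_repeats=10, seed=0):
    """Average accuracy drop when one feature's column is shuffled,
    breaking its relationship with the target."""
    rng = random.Random(seed)
    def accuracy(rows):
        return sum(int(predict(r) == t) for r, t in zip(rows, y)) / len(y)
    baseline = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature_idx] for row in X]
        rng.shuffle(col)
        shuffled = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                    for row, v in zip(X, col)]
        drops.append(baseline - accuracy(shuffled))
    return sum(drops) / n_repeats

# Toy "black box" that secretly depends only on feature 0.
model = lambda x: int(x[0] > 0.5)
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7], [0.1, 0.3]]
y = [1, 0, 1, 0]
imp0 = permutation_importance(model, X, y, 0)
imp1 = permutation_importance(model, X, y, 1)  # 0.0: feature 1 is ignored
```

If feature 1 were a protected attribute, a zero importance would be reassuring; a large importance would demand scrutiny of why the model uses it.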
The insurance industry is embracing the wave of AI, and given that insurance companies have an extensive impact on personal lives, it is essential for them to justify their decision-making and impose fairness on ML algorithms. As an educational project, we will conduct a literature review on Algorithm Bias and understand its importance in the business environment. We will also survey current developments in solutions to Algorithm Bias, especially in Interpretable AI, and consider how to incorporate those methods in the context of insurance applications.
Supervisors: Zhiyu (Frank) Quan
Graduate Supervisor: Panyi Dong
Federated Learning for Facilitating Privacy Preserving Collaboration
Due to privacy and data confidentiality concerns, today's insurance industry is rife with protectionism of proprietary data, which has become a major roadblock preventing the free flow of data and collaboration between data scientists and analysts. The inaccessibility of data across the boundaries of insurance firms, or even business divisions within a corporation, makes it difficult to conduct comparative analyses and to uncover business insights that can only be learned from the aggregation of data across the board.
Federated learning has been proposed in recent years as a privacy-preserving solution to collaborative machine learning tasks, and it allows data owners to collectively build a model without sharing sensitive data with each other. This technique has seen success in a variety of scenarios, such as healthcare, content recommendation, and smart transportation, and therefore, it has the potential to make an impact on addressing the data concerns in the insurance industry. In this project, we will learn and implement some popular federated learning algorithms and explore their potential insurance applications.
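The best-known of these algorithms is Federated Averaging (FedAvg): each round, every client trains on its own private data and sends back only model weights, which the server averages weighted by local dataset size. The sketch below uses a one-parameter linear model so the mechanics are visible; the two clients and their data are invented for illustration.

```python
def local_update(weights, data, lr=0.1, epochs=1):
    """One client's local gradient steps on a 1-D linear model y = w * x.
    Raw (x, y) pairs never leave the client; only the weight is shared."""
    w = weights
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of squared error
            w -= lr * grad
    return w

def fed_avg(init_w, clients, rounds=50):
    """Federated Averaging: train locally on each client, then average
    the returned weights, weighted by local dataset size."""
    w = init_w
    for _ in range(rounds):
        updates = [(local_update(w, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        w = sum(wi * n for wi, n in updates) / total
    return w

# Two clients whose private data follow the same relation y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = fed_avg(0.0, clients)  # converges toward 2.0 without pooling the data
```

The same pattern scales to neural networks, where entire weight vectors, rather than a single scalar, are averaged each round.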
Supervisors: Runhuan Feng, Zhiyu (Frank) Quan
Graduate Supervisor: Linfeng Zhang, Panyi Dong
Representation Learning for Insurance Products
The insurance industry has long known the importance of data, and the success of its business heavily relies on data collection and analysis. With the fast growth in computing power and the development of machine learning techniques, more and more variables/features are used in predictive analysis in various aspects of insurance, such as rate making, loss reserving, and risk management. While most numerical or categorical variables can easily be fed into a machine learning model, unstructured text data remains largely under-utilized.
One popular way to use text data is feature engineering, which often involves manually creating algorithms to extract information from the text, such as word counts and sentiment scores. Although this approach provides an easily interpreted "measurement" of the entire text, discovering new features usually requires domain knowledge and is quite time-consuming. Recently, many researchers have started using Natural Language Processing (NLP) to facilitate textual analysis. While many deep learning models succeed in improving prediction accuracy for the response variables, they often provide little tractability and interpretability, which are also important in decision-making.
In this project, we will explore representation learning techniques for understanding unstructured text data, aiming to provide a low-dimensional representation of the entire text with interpretable generated features. We will review the related literature on this topic, understand and implement representation learning models, and experiment on insurance text data.
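A classical baseline for such low-dimensional representations is latent semantic analysis: TF-IDF vectors compressed by a truncated SVD, yielding one dense vector per document. The project itself may use learned embeddings instead; the toy claim descriptions below are made up for illustration.

```python
import math
import numpy as np

def tfidf_embeddings(docs, dim=2):
    """TF-IDF followed by truncated SVD: each document is mapped to a
    dim-dimensional vector; documents with similar vocabulary land
    close together in the embedding space."""
    vocab = sorted({w for d in docs for w in d.split()})
    idf = {w: math.log(len(docs) / sum(w in d.split() for d in docs)) + 1
           for w in vocab}
    tf = np.array([[d.split().count(w) / len(d.split()) for w in vocab]
                   for d in docs])
    X = tf * np.array([idf[w] for w in vocab])
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim] * S[:dim]  # one dim-dimensional row per document

# Invented claim descriptions: the first two are similar, the third is not.
claims = [
    "water damage in kitchen pipe burst",
    "pipe burst caused water damage",
    "vehicle collision on highway at night",
]
emb = tfidf_embeddings(claims)
```

Here the two water-damage claims end up near each other while the collision claim lands far away, which is exactly the behavior a downstream pricing or triage model can exploit.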
Supervisors: Xiaochen Jing
Graduate Supervisor: Yuxuan Li
Spatiotemporal Modeling on Foot Traffic Data to Unlock Auto Insurance Geo-risks
Foot traffic data is captured by various sources, such as smartphone apps and in-vehicle telematics devices, and can help insurers monitor policyholders' behavior. It enables insurance companies to price risk accurately and accelerate the underwriting process; in turn, policyholders are given incentives for good driving behavior. There are various state-of-the-art techniques for extracting useful information from high-dimensional foot traffic data, including spatial, temporal, and geospatial analysis.
In this project, we intend to create spatial and temporal models to identify policyholders' driving behavior at the Census Block Group (CBG) or city/county level and provide guidance on auto insurance geo-risks. Currently, we are investigating the association between accidents and foot traffic based on 2018-2019 vehicle accident reports from the state of Indiana.
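A first-pass version of such an association analysis is a CBG-level correlation between aggregated foot traffic and accident counts. The CBG codes and counts below are entirely fabricated; the real analysis would use actual Indiana accident reports and richer spatiotemporal models.

```python
import math

def pearson_corr(xs, ys):
    """Pearson correlation between two aligned numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Fabricated monthly aggregates per CBG: (foot traffic visits, accidents).
cbg_data = {
    "cbg_0001": (1200, 3),
    "cbg_0002": (4500, 9),
    "cbg_0003": (800, 2),
    "cbg_0004": (3000, 7),
}
visits = [v for v, _ in cbg_data.values()]
accidents = [a for _, a in cbg_data.values()]
r = pearson_corr(visits, accidents)  # strongly positive in this toy data
```

A strong raw correlation like this would motivate, rather than conclude, the analysis: exposure, road type, and time of day all confound the relationship and belong in the full spatiotemporal model.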
Supervisors: Zhiyu (Frank) Quan
Graduate Supervisor: Changyue Hu