Spatiotemporal Modeling on Foot Traffic Data to Unlock Auto Insurance Geo-risks
Project Description
Foot traffic data is captured by various sources, such as smartphone apps and in-vehicle telematics devices, and can help insurers monitor policyholders’ behavior. Such data helps insurance companies price risk accurately and accelerate the underwriting process, while policyholders receive incentives for good driving behavior. Various state-of-the-art techniques, including spatial, temporal, and geospatial analysis, can extract useful information from high-dimensional foot traffic data.
In this project, we intend to build spatial and temporal models that characterize policyholders’ driving behavior at the Census Block Group (CBG) or city/county level and provide guidance on auto insurance geo-risk. Currently, we are investigating the association between accidents and foot traffic based on 2018-2019 vehicle accident reports from the state of Indiana.
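As a rough illustration of what a first look at the accident/foot traffic association could involve, the sketch below totals visits and accidents per CBG and computes a Pearson correlation. The records, the CBG identifiers, and the helper names (total_by_cbg, pearson) are hypothetical and not taken from the project; a real analysis would also need to account for spatial and temporal structure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy records: (CBG identifier, value). Real inputs would come
# from a foot traffic provider and the Indiana accident reports.
visits = [("cbg_a", 120), ("cbg_a", 80), ("cbg_b", 300), ("cbg_c", 50), ("cbg_d", 400)]
accidents = [("cbg_a", 1), ("cbg_b", 2), ("cbg_b", 1), ("cbg_d", 5)]


def total_by_cbg(records):
    """Sum the values of (cbg, value) records for each CBG."""
    totals = defaultdict(float)
    for cbg, value in records:
        totals[cbg] += value
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


visit_totals = total_by_cbg(visits)
accident_totals = total_by_cbg(accidents)

# CBGs with no reported accident get a count of zero.
cbgs = sorted(visit_totals)
x = [visit_totals[c] for c in cbgs]
y = [accident_totals.get(c, 0.0) for c in cbgs]
r = pearson(x, y)
print(f"Pearson correlation across {len(cbgs)} CBGs: {r:.3f}")
```

A plain correlation like this ignores exposure (for example, traffic volume by time of day) and spatial dependence between neighboring CBGs, which is where spatiotemporal models would come in.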
Expectation
- One student researcher is needed for this research project.
- Proficient coding skills in Python and machine learning/data science skill sets are required.
- Strong communication skills, commitment to project success, and the ability to deliver results on time are expected.
Supervisor: Zhiyu Quan
Graduate Supervisor: Changyue Hu
Representation Learning for Insurance Products
The insurance industry has long known the importance of data, and the success of its business heavily relies on data collection and analysis. With the fast growth in computing power and the development of machine learning techniques, more and more variables/features are used in predictive analysis in various aspects of insurance, such as rate making, loss reserving, and risk management. While most of the numerical or categorical variables can be easily thrown into a machine learning model, the unstructured text data remain largely under-utilized.
One popular way to use text data is feature engineering, which often involves manually designing algorithms to extract information from the text, such as word counts and sentiment scores. Although this approach provides a “measurement” for the entire text that can be easily interpreted, discovering new features usually requires domain knowledge and is quite time-consuming. Recently, many researchers have started using Natural Language Processing (NLP) to facilitate textual analysis. While many deep learning models succeed in improving prediction accuracy for supervised learning tasks, they often provide little tractability and interpretability, both of which are important in decision making.
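To make the hand-crafted approach concrete, here is a minimal sketch that computes a word count and a lexicon-based sentiment score for one piece of text. The two mini-lexicons, the function name text_features, and the sample note are hypothetical and exist only for illustration; an actual project would rely on an established lexicon or a trained model.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"prompt", "helpful", "satisfied", "resolved"}
NEGATIVE = {"damage", "delay", "denied", "injury", "complaint"}


def text_features(text):
    """Return hand-crafted features for one piece of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return {
        "word_count": len(tokens),
        # Net sentiment in [-1, 1]; 0.0 when no lexicon word appears.
        "sentiment": (pos - neg) / (pos + neg) if pos + neg else 0.0,
    }


# Hypothetical claim note.
note = "Rear bumper damage after a delay in repairs; customer complaint resolved."
features = text_features(note)
print(features)
```

Each feature is easy to interpret, but every new one (for example, a count of injury-related terms) has to be designed by hand, which is the cost the paragraph above describes.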
In this project, we will explore representation learning techniques for understanding unstructured text data, aiming to provide a low-dimensional and interpretable representation of texts. In the previous semester, we reviewed the literature on both supervised and unsupervised learning tasks and implemented several novel algorithms with long short-term memory (LSTM) neural networks and bidirectional encoder representations from transformers (BERT). For the current semester, we will continue exploring the literature and modify the existing algorithms with an emphasis on the interpretability of the model.
Supervisor: Xiaochen Jing
Graduate Supervisor: Yuxuan Li
Insurance Privacy Preservation in Federated Learning Collaboration
Due to privacy and data confidentiality concerns, today’s insurance industry is rife with the protectionism of proprietary data, which has become a major roadblock that prevents the free flow of data and collaborations between data scientists and analysts. The inaccessibility of data across the boundaries of insurance firms or even business divisions within a corporation makes it difficult to develop comparative analysis and to uncover business insights that can only be learned from the aggregation of data across the board.
Federated learning has been proposed in recent years as a privacy-preserving solution to collaborative machine learning tasks. It allows data owners to collectively build a model without sharing sensitive data with each other or with a centralized server. This technique has seen success in a variety of domains, such as healthcare, content recommendation, and smart transportation, and therefore has the potential to address the data concerns in the insurance industry. However, applying federated learning in the insurance industry is further complicated by the need to keep feature names and personally identifying information in structured data private. Hence, in this project, we will explore privacy preservation methods built on top of the federated learning framework and targeted at insurance use cases (e.g., feature alignment, entity resolution, and differential privacy), so that we can build a specialized real-world data collaboration platform for the insurance industry.
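The basic idea of building a shared model without pooling data can be sketched with federated averaging, a common baseline that is used here only as an illustration and is not necessarily the method of this project. In the sketch, three hypothetical insurers each fit the one-parameter model y = weight * x on their own data, and a server averages the resulting weights in proportion to sample counts. The data, learning rate, and function names are assumptions.

```python
# Each hypothetical insurer holds private (x, y) pairs that never leave it.
clients = [
    [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)],
    [(1.5, 3.0), (2.5, 5.1)],
    [(0.5, 1.0), (4.0, 8.1), (5.0, 9.8), (6.0, 12.1)],
]


def local_update(weight, data, lr=0.01, epochs=5):
    """Run a few gradient steps on one client's data for y = weight * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(global_weight, clients):
    """One round: clients train locally, the server averages by sample count."""
    total = sum(len(data) for data in clients)
    updates = [(local_update(global_weight, data), len(data)) for data in clients]
    return sum(w * n for w, n in updates) / total


weight = 0.0
for _ in range(50):
    weight = federated_round(weight, clients)
print(f"Global slope after 50 rounds: {weight:.3f}")
```

Only model weights cross organizational boundaries in this sketch. It does not cover feature alignment, entity resolution, or differential privacy, which would be layered on top of such a loop.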
Supervisors: Runhuan Feng, Zhiyu Quan
Graduate Supervisor: Panyi Dong
FinTech Regulatory Sandbox
Project Description
A regulatory sandbox is a framework in which businesses test innovative business models, products, and services without being subject to the full set of regulations for those activities. The UK Financial Conduct Authority and the US Consumer Financial Protection Bureau are among the first regulatory agencies to introduce the regulatory sandbox and to spread the concept around the world. This research aims to understand the regulatory framework and compare sandbox programs around the world.
Expectation
Students in this research project are expected to read extensively on this subject, summarize their readings, compare different models, and write a report on the practice of sandboxes.
To test your writing ability, please read the following website and submit a one-page summary: https://www.fca.org.uk/firms/innovation/regulatory-sandbox
Supervisor: Runhuan Feng