IRisk Lab Fall 2020 Projects

Using Text Mining and Natural Language Processing (NLP) to Extract Actuarial-Related Information from Online Customer Reviews for Businesses

For the insurance industry, understanding customers and businesses through the new dimensions offered by social and other online data can yield significant new insights from both a customer-behavior and a risk perspective. These insights can drive insurance automation, underwriting efficiency, and enhanced customer experience.

This project is a real-life actuarial data science project provided by Carpe Data. Carpe Data is an Insurtech company that provides insurance companies with next-generation data solutions to gain a more in-depth insight into risks.

Carpe Data will share online reviews (text data such as Yelp reviews) with IRisk Lab. Using these text data, we will investigate tasks including, but not limited to, the following:

  • Sentiment analysis of customer reviews.
  • Extraction of possible risk characteristics for businesses.
  • Refinement of business segmentation.

Students: Irene Chen, Wennan Huang, Stacy Shen, Boyuan Wang, Sophia Wang,

                      Haoming Yang, Annie Zheng

Supervisors: Zhiyu (Frank) Quan, Eli O’Donohue (Carpe Data)

Graduate Supervisor: Yong Xie

Maintenance and Development

Data visualization is a great tool for you, a wannabe actuary or data scientist, to communicate with your teammates and clients. And interactive visualization is even better. R Shiny is a framework that makes the process of developing an interactive web application easy.

This project is a continuation of a summer 2020 project, in which we developed a website to showcase our data, models, and results for pandemic contingency planning and medical resources allocation, based on the preprint by Chen et al. (2020).

As the vaccine for the novel coronavirus is on its way, we will move forward to the next phase of this research topic and focus on the optimal allocation of vaccines. The second part of this project will therefore integrate new data, models, and results into the existing website.

Students: Qingxuan Kong, Ben Lipka, Jingying Luo, Sihan Meng, Erchi Wang, Jasmine Yi, 

                      Yi Yuan, Cameron Groch, Xuan Lin

Supervisors: Xiaowei Chen, Alfred Chong, Runhuan Feng
Graduate Supervisor: Linfeng Zhang

Implied Value-at-Risk: Model-Free and Forward-Looking Risk Estimates for the US Banks

US bank equity has suffered a rapid and persistent decline in market value, around 50% since January 2020, during the COVID-19 pandemic. While the economy as a whole is under stress, bank stock returns have been worse than those of non-bank financial institutions, such as broker-dealers and insurance companies, and of non-financial firms. Bank equity returns have been shown to contain information about future macroeconomic consequences, such as credit contractions and output gaps, and can serve as an indicator of bank distress. However, given the dynamic behavior of financial markets, the information contained in stock returns deteriorates rapidly with the horizon, and equity-based risk estimates do not always represent the true risk level well. Especially in times like the present, when the market is in distress, the discrepancy between the expectation and the realized outcome is large, resulting in a poor risk estimate exactly when it is needed most.

Therefore, we investigate how option data can be employed to determine the risk level of US banks. Such an estimate is called an implied estimate. Option prices contain the aggregate view of the market on the future stock price level, i.e., not just on its expected value, but on the distribution around that expectation. An option-implied estimate is therefore automatically forward-looking. The selected risk estimate is the Value-at-Risk (VaR) measure, which can be seen as the maximal future loss in a given time frame at a given confidence level. We examine model-free approaches for determining the VaR, such as binomial tree option pricing.
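
As a rough illustration of how option prices carry the distribution of the future stock price, the sketch below uses the Breeden-Litzenberger relation, F(K) = 1 + e^{rT} ∂C/∂K, which recovers the risk-neutral distribution function from call prices across strikes. This is one well-known model-free relation chosen for illustration, not necessarily the approach taken in the project. The call prices are synthetic, generated from the Black-Scholes formula purely as stand-in data, and all function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def implied_cdf(strikes, calls, r, T):
    """Risk-neutral CDF of the terminal price at interior strikes,
    using F(K) = 1 + exp(r*T) * dC/dK with central differences."""
    growth = math.exp(r * T)
    cdf = []
    for i in range(1, len(strikes) - 1):
        slope = (calls[i + 1] - calls[i - 1]) / (strikes[i + 1] - strikes[i - 1])
        cdf.append((strikes[i], 1.0 + growth * slope))
    return cdf


def implied_var(strikes, calls, spot, r, T, alpha):
    """Implied VaR as spot minus the alpha-quantile of the implied CDF,
    with linear interpolation between neighbouring strikes."""
    cdf = implied_cdf(strikes, calls, r, T)
    for (k_lo, f_lo), (k_hi, f_hi) in zip(cdf, cdf[1:]):
        if f_lo <= alpha <= f_hi:
            w = 0.0 if f_hi == f_lo else (alpha - f_lo) / (f_hi - f_lo)
            return spot - (k_lo + w * (k_hi - k_lo))
    raise ValueError("alpha is outside the range covered by the strike grid")


def bs_call(spot, strike, r, sigma, T):
    """Black-Scholes call price, used here only to create synthetic quotes."""
    N = NormalDist().cdf
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return spot * N(d1) - strike * math.exp(-r * T) * N(d2)


# Hypothetical inputs: spot price, risk-free rate, volatility, maturity in years.
spot, r, sigma, T = 100.0, 0.01, 0.40, 0.25
strikes = [40.0 + 0.5 * i for i in range(241)]  # strike grid from 40 to 160
calls = [bs_call(spot, k, r, sigma, T) for k in strikes]

# Loss level exceeded with 5% risk-neutral probability over the horizon T.
var_95 = implied_var(strikes, calls, spot, r, T, alpha=0.05)
print(f"Implied 95% VaR per share: {var_95:.2f}")
```

With real data, the synthetic `calls` list would be replaced by observed option quotes, which are sparser and noisier than this grid and would typically need smoothing before differentiation. Note also that the resulting quantile is taken under the risk-neutral distribution.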

Students: Chen Chen, Haoxuan Fu, Churui Li, Ruiqi Liu

Supervisor: Daniël Linders

Graduate Supervisor: Elizaveta Sizova

AI-Powered Lifecycle Financial Planning

Life is a game with sophisticated rules. With today’s AI technology, we can develop a lifecycle strategy guide for success in such a game. This project aims to build algorithms that optimize the decision-making process for meeting important financial goals in life. We are looking for students who can assist in programming various life-changing scenarios for the system.

Students: Jingbin Cao, Zhehui Chen, Shuyue Deng, Jinglun Gao, Changyue Hu, 

               Rustom Ichhaporia, Jiaxin Jiang, Wenjie Liu, Hanqing Wang, Jiayi Wang, Stephen Xu,

                      Wenxiao Yang, Xi Zeng, Haochen Zhong

Supervisors: Alfred Chong, Runhuan Feng, Zhiyu (Frank) Quan

Graduate Supervisors: Yong Xie, Linfeng Zhang

Cyber Risk Data Analysis

With the ever-increasing reliance on technology for Zoom meetings, online purchases, and even national security for sensitive research, the importance of proper cyber risk management is evident, especially during the time of COVID-19 when everyone works remotely. Learning from the history of cyber incidents is crucial for building risk management and pricing models. Therefore, all cyber risk management should be based on data-driven evidence.

This project is a continuation of the IRisk Lab projects in Fall 2019 and Spring 2020, where a given set of cyber data was used to construct multivariate frequency and tree-based models. This project aims to take a step back to fundamentally understand the cyber data and investigate the potential pitfalls of fitting traditional property and casualty actuarial models to the cyber data.

Students: Ramsha Ahmed, Shaowen Chang, Ishaan Khanna, Evelyn Lai Jia Yi, 

                     Emmelyn Luveta, Carina Su, Yao Xiao

Supervisors: Alfred Chong, Daniël Linders, Zhiyu (Frank) Quan

Graduate Supervisor: Linfeng Zhang

COUNTRY Commercial/Residential Building Age Model

An important variable in assessing the risk of a commercial/residential building is the actual age of the building. While this seems obvious, obtaining the true age of the building can be challenging. COUNTRY has made use of the data available but would like to explore machine learning techniques for assessing the age of a commercial/residential building.

Using available data sources, students will research modeling techniques and deploy the best model for predicting a commercial/residential building’s age. This project builds on existing research done in this space, and COUNTRY is particularly interested in researching how image classification techniques complement modeling approaches on structured data to obtain a robust commercial/residential building age model.

Students: Boting Li, Chengzhuang Zheng, Ziqin Xiong

Supervisors: Lois She-Tom (COUNTRY Financial), Matt Morris (COUNTRY Financial)