Building an NLP-Powered Repository and Search Tool for Cyber Risk Literature Project

Since cyber insurance was first introduced to the market, a rapidly expanding body of literature has examined other aspects of cyber risk, such as the legal and financial consequences of cyber incidents, which are closely related to the development of the cyber insurance industry. Given this large and growing body of cyber risk literature, we see three major challenges facing the actuarial research community:

  • No context-aware tool for finding cyber risk resources
  • No central repository of cyber risk resources
  • Lack of accounting for trends in cyber risk research

To address these challenges, we propose to build a repository of cyber risk literature, equipped with an NLP-powered search tool that researchers can easily use to find relevant materials. The first stage of this project involves identifying sources of literature, creating a program that gathers documents from those sources, and labeling the gathered documents. A database will be built from the collected information. On top of that, a web-based user interface will be built so that researchers can query the database and view the results clearly. In the second stage, once the database is large enough to serve as training and testing data, the labeling of new articles can be automated using natural language processing and machine learning techniques.
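As a rough illustration of the querying step, the sketch below ranks a tiny in-memory collection of labeled documents against a free-text query. It is only a minimal sketch under stated assumptions: the project description does not name a retrieval method, so plain TF-IDF with cosine similarity is used here as a simple stand-in (it is not context-aware), and the documents, labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-repository: titles, labels, and text are invented for illustration.
DOCUMENTS = [
    {"title": "Pricing cyber insurance policies", "label": "insurance",
     "text": "Premium pricing models for cyber insurance policies and coverage limits."},
    {"title": "Legal liability after a data breach", "label": "legal",
     "text": "Legal liability, litigation and regulatory fines following a data breach."},
    {"title": "Financial losses from ransomware", "label": "financial",
     "text": "Financial losses and business interruption costs caused by ransomware attacks."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(tokens, idf):
    """Build an L2-normalized TF-IDF vector, ignoring terms outside the vocabulary."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(documents):
    """Compute smoothed inverse document frequencies and one vector per document."""
    tokenized = [tokenize(d["title"] + " " + d["text"]) for d in documents]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def search(query, documents, idf, vectors, top_k=3):
    """Return up to top_k (score, title, label) tuples, best match first."""
    q = to_vector(tokenize(query), idf)
    scored = []
    for doc, vec in zip(documents, vectors):
        # Cosine similarity reduces to a dot product because both vectors are normalized.
        score = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scored.append((score, doc["title"], doc["label"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    idf, vectors = build_index(DOCUMENTS)
    for score, title, label in search("ransomware losses", DOCUMENTS, idf, vectors):
        print(f"{score:.3f}  [{label}]  {title}")
```

In a real repository the document list would come from the database rather than a hard-coded list, and the manually assigned labels could later serve as training data for the automated labeling planned for the second stage.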

Supervisors: Runhuan Feng, Zhiyu (Frank) Quan

Graduate Supervisor: Linfeng Zhang