Data Discovery and Consolidation

As data-driven decision-making becomes increasingly important in all fields, a comprehensive understanding of the datasets at our disposal is crucial for facilitating research, analysis, and innovation across disciplines. The primary objective of this project is to conduct an extensive data discovery process within UIUC to identify and catalog the datasets that exist across different units, research centers, and government entities. By consolidating these datasets into a centralized repository, we can provide researchers, students, and faculty members with a unified platform for accessing a wide range of data for their projects and initiatives. We anticipate that this project will require dedicated personnel, access to relevant systems, data visualization, and the creation of a relational database. In previous semesters, we prepared an environment to host the database at NCSA and discovered hundreds of datasets. This semester, we will migrate these datasets into the database, allowing students to gain experience with database operations and data cleaning.

Supervisors: Zhiyu (Frank) Quan, Eli O’Donohue

Graduate Supervisor: Jiayi Guo