Research Projects – Research Experiences for Undergraduates (REU)

Project 1: Deep Learning to Enable Multi-messenger Astrophysics Discovery in the LSST Era

Co-mentors

Dr. Eliu Huerta
Argonne National Laboratory

Dr. Roland Haas
University of Illinois – NCSA

Project Description

This project will focus on the design of deep learning (DL) algorithms for the detection and parameter estimation of gravitational wave sources, and on the identification of their electromagnetic and gravitational emission. We will build upon our work in these research areas, and will leverage NCSA’s role as the main data hub for the Dark Energy Survey and the Legacy Survey of Space and Time to create DL tools to explore the processing and identification of astrophysical sources that may be observed concurrently in the gravitational and electromagnetic spectra.

Student contributions

The REU student will learn several skills to contribute to this project, including the use of supercomputers to curate the datasets to train, validate and test neural networks using GPU-based systems at NCSA, Argonne National Lab and Oak Ridge National Lab. The student will become acquainted with open source platforms for DL research, including TensorFlow and PyTorch, as well as the use of distributed learning solutions such as Horovod. The student will learn to process and correctly interpret the predictions of these deep learning models, as well as to use scientific visualizations to understand how these models abstract knowledge from data, and which parts of the deep learning model are involved in the prediction of results.

Project 2: Machine Learning and Geospatial Approach to Targeting Humanitarian Assistance Among Syrian Refugees in Lebanon

Co-mentors

Dr. Angela Lyons
Agricultural & Consumer Economics

Dr. Aiman Soliman
University of Illinois - NCSA

Project Description

An estimated 84 million persons are forcibly displaced worldwide, and at least 70% of these are living in conditions of extreme poverty. More efficient targeting mechanisms are needed to better identify vulnerable families who are most in need of humanitarian assistance. Traditional targeting models rely on a proxy means testing (PMT) approach, where support programs target refugee families whose estimated consumption falls below a certain threshold. Although the method is practical, it provides limited insights and its predictions are not very accurate, which can undermine targeting effectiveness and fairness. Alternatively, multidimensional approaches to assessing poverty are now being applied to the refugee context. Yet, they require extensive information that is often unavailable or costly. This project applies machine learning and geospatial methods to novel data collected from Syrian refugees in Lebanon to develop more effective and operationalizable targeting strategies that reliably complement current PMT and multidimensional methods. The insights from this project have important implications for humanitarian organizations seeking to improve current targeting mechanisms, especially given increasing poverty and displacement and limited humanitarian funding.
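The threshold rule behind PMT can be sketched in a few lines. This is only an illustrative example, assuming a linear proxy model: the proxy variables, weights, intercept, and threshold below are hypothetical and are not taken from the project or from any operational targeting formula.

```python
# Illustrative proxy means test (PMT): estimate consumption from observable
# household proxies, then flag households below an eligibility threshold.
# All variable names, weights, and the threshold are hypothetical.

def estimate_consumption(household, weights, intercept):
    """Linear proxy model: intercept plus the weighted sum of proxy values."""
    return intercept + sum(weights[k] * household[k] for k in weights)


def select_eligible(households, weights, intercept, threshold):
    """Return ids of households whose estimated consumption is below threshold."""
    return [
        h["id"]
        for h in households
        if estimate_consumption(h, weights, intercept) < threshold
    ]


WEIGHTS = {"household_size": -8.0, "has_working_adult": 30.0, "rooms": 10.0}
INTERCEPT = 100.0
THRESHOLD = 90.0

households = [
    {"id": "A", "household_size": 6, "has_working_adult": 0, "rooms": 1},
    {"id": "B", "household_size": 3, "has_working_adult": 1, "rooms": 2},
    {"id": "C", "household_size": 5, "has_working_adult": 1, "rooms": 1},
]

eligible = select_eligible(households, WEIGHTS, INTERCEPT, THRESHOLD)
print(eligible)
```

Because eligibility depends on a single estimated quantity and a hard cutoff, any error in the estimate near the threshold translates directly into an inclusion or exclusion error, which is the weakness the project aims to address.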

Student Contributions:

We are looking for a student with experience in basic programming in Python and/or R and basic knowledge and skills in machine learning; experience with GIS and geospatial analysis is a plus. Anticipated tasks include assisting the team with: (1) data preprocessing, (2) data modeling, analysis, and predictions, and (3) the creation of maps and other data visualizations. The student will develop and review code and create documentation for the code. They will also assist in developing machine learning algorithms and then training, validating, and testing the algorithms. The student will also create a GitHub repository for the team, where they will prepare and upload scripts and other documentation for the project. The student will meet with the mentors on a regular basis, participate in team meetings, and actively engage with graduate students.

Project 3: Estimation of Crop Productivity from Multi-sensor Fused Satellite Data

Co-mentors

Dr. Kaiyu Guan
Natural Resources & Environmental Sciences

Dr. Shenlong Wang
Computer Science

Project Description

Reliable forecasting systems that predict crop type and crop yield with sufficient lead time are critically valuable to farming communities and government agencies. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management requirements and environmental factors. This project aims to develop scalable ML methods to integrate data from satellite remote sensing and other auxiliary information to make accurate yet cost-effective predictions of crop type. We are now working on generating a 30-meter, daily, cloud-free data stack (2000-present) for three major Corn Belt states (Illinois, Iowa, and Indiana) by integrating three major satellite datasets (Landsat, MODIS, and the newer Sentinel-2).

Student contributions

The REU student will contribute to implementing a computational pipeline that extracts real-time satellite images and deposits the data into a database, and to developing data fusion and ML components for predictive tasks. The student will use storage and run computation-intensive code on the Blue Waters supercomputer. The student will meet with the mentors on a regular basis, participate in group meetings, and interact with graduate students.

Project 4: Evaluating Machine Learning Algorithms for Gravitational Lens Detection in LSST/DC2 Data

Co-mentors

Dr. Gautham Narayan
Astronomy

Mrs. Aadya Agrawal
Astronomy

Project Description:

In the era of astronomical surveys, machine learning has emerged as a valuable tool for processing extensive datasets efficiently. This project will evaluate several deep neural networks for the identification of gravitational lenses in wide-field survey imaging data. Gravitational lensing provides valuable insights into the universe: it probes the curvature of space-time and therefore provides constraints on dark matter and dark energy. The student involved in this project will evaluate various lens-finding algorithms using pre-existing data, both simulated data from the Legacy Survey of Space and Time (LSST) and the Dark Energy Science Collaboration (DESC) and real data from the Dark Energy Survey (DES). Using various metrics, they will perform a comparative analysis to assess the performance of the different methods. The student's work will be instrumental in the development of an ensemble learning technique incorporating the most effective components from the tested approaches. The project is anticipated to be completed over approximately 8 weeks, presenting a unique opportunity to make significant contributions to the field of astrophysical data analysis with the upcoming Vera C. Rubin Observatory.
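As a rough illustration of the comparative analysis described above, the sketch below scores two lens finders with precision and recall and then evaluates a simple ensemble. It rests on several assumptions: the labels and scores are made-up toy values, the finder names are hypothetical, and averaging the scores is only a stand-in for whatever ensemble technique the project ultimately develops.

```python
import numpy as np


def precision_recall(y_true, scores, threshold=0.5):
    """Precision and recall of a binary classifier at a given score threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & y_true)    # lenses correctly flagged
    fp = np.sum(y_pred & ~y_true)   # non-lenses flagged as lenses
    fn = np.sum(~y_pred & y_true)   # lenses that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)


# Toy data: 1 = lens, 0 = non-lens. Scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "finder_a": [0.9, 0.8, 0.3, 0.6, 0.2, 0.1],
    "finder_b": [0.7, 0.4, 0.9, 0.1, 0.3, 0.6],
}

results = {name: precision_recall(labels, s) for name, s in model_scores.items()}

# Simplest possible ensemble: average the per-object scores of all finders.
ensemble_scores = np.mean(list(model_scores.values()), axis=0)
results["ensemble_mean"] = precision_recall(labels, ensemble_scores)

for name, (precision, recall) in results.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy case each finder misses a different lens and raises a different false alarm, so the averaged scores separate the two classes better than either finder alone. Real survey data will rarely be this clean, and the choice of threshold and metric matters because true lenses are rare.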

Student Contributions:

Strong proficiency with Python; familiarity with neural networks is a plus.

Project 5: Enhancing Response Accuracy of an LLM-Based Teaching Assistant Tool

Co-mentors

Dr. Volodymyr Kindratenko
University of Illinois - NCSA

Mr. Kastan Day
University of Illinois - NCSA

Description:

We are looking for a dedicated and enthusiastic student to work on an exciting project aimed at improving the response accuracy of an LLM-based teaching assistant tool. Our team has developed a functional version of the tool, accessible at uiuc.chat, and we are eager to enhance its capabilities further. The primary goal of this project is to develop innovative methods to improve the accuracy of the teaching assistant tool's responses, particularly in the context of factual information. We are inspired by the concepts outlined in the research literature, specifically the Chain-of-Verification methodology. We are interested in creating a novel factual consistency model that will ensure the correctness of fact-based answers provided by the tool.

Student Work:

The selected student will work closely with our team and our collaborators on the following tasks:

Literature Review: Conduct a comprehensive literature review to understand existing methods related to fact verification, Chain-of-Verification, and similar concepts in the field of natural language processing and machine learning.

Model Development: Collaborate with the team to design and develop a novel factual consistency model based on the ideas outlined in our proposal. This model should enhance the tool's ability to provide accurate and reliable answers.

Integration and Testing: Integrate the developed model with the existing LLM-based teaching assistant tool. Conduct rigorous testing to ensure seamless integration and validate the accuracy and effectiveness of the enhanced tool.

Skills:

Strong background in natural language processing, machine learning, or a related field. Proficiency in programming languages commonly used in machine learning research, such as Python. Ability to work effectively in a collaborative team environment. Excellent problem-solving skills and attention to detail. Strong communication skills to present findings and collaborate with team members and external partners.

Project 6: Nutrition Data Collection and Analysis Tool with Generative AI Integration

Co-mentors

Dr. Volodymyr Kindratenko
University of Illinois - NCSA

Dr. Sharon Donovan
Biomedical and Translational Sciences

Description:

This project is an exciting opportunity to collaborate with a team of fellow students in the development and implementation of a cutting-edge tool for collecting nutrition data from images of foods consumed throughout the day and analyzing it using advanced generative AI techniques. By participating in this project, students will gain hands-on experience in the development of innovative tools, explore the intersection of technology and nutrition, and contribute to improving the understanding of nutrition data through advanced analytics and generative AI. Research activities will include developing a mobile application, analyzing the collected data, developing a large language model for data analysis, and integrating various sources of data with the large language model.

Student Work:

The project is divided into two main areas of focus:

1. Data Collection Mobile Application Development and Validation. The student will be responsible for the design, development, and validation of a user-friendly mobile application dedicated to collecting nutrition data.

2. Development of Data Analytics Pipeline for Nutrition Data. The student will focus on developing a robust data analytics pipeline to process and make sense of the collected nutrition data.

Skills:

Proficiency in programming languages such as Python, Java, or Swift. Strong understanding of mobile application development principles and/or data processing and analysis techniques. Familiarity with data analytics libraries and tools (e.g., pandas, NumPy, scikit-learn). Knowledge of generative AI models and frameworks is a plus. Excellent problem-solving skills and ability to work well in a team-oriented environment.

Project 7: Machine Learning Models for Natural Systems

Co-mentors

Dr. Volodymyr Kindratenko
University of Illinois - NCSA

Dr. Alex Tartakovsky
Civil and Environmental Engineering

Description:

This project will contribute to the development of machine learning surrogate models for natural systems such as climate models, subsurface transport models, and turbulence models. Specifically, this project will focus on solving large eigenvalue problems, principal component analysis, and training massive deep neural networks using parallel computing.

Student Work:

Students will have an opportunity to use parallel computing to train large deep neural network (DNN) models, solve eigenvalue problems, and perform principal component analysis (PCA) on massive datasets.
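Eigenvalue problems and PCA are closely linked: the principal components of a dataset are the eigenvectors of its covariance matrix. The following is a minimal single-machine sketch of that connection using NumPy, run on synthetic data. The function name `pca_eig` and the data are illustrative assumptions. At the scale this project targets, forming a dense covariance matrix would likely be impractical, and iterative or distributed eigensolvers would take the place of `numpy.linalg.eigh`.

```python
import numpy as np


def pca_eig(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric solver, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of largest eigenvalues
    components = eigvecs[:, order].T         # rows are principal directions
    explained_variance = eigvals[order]
    scores = Xc @ components.T               # data projected onto the components
    return components, explained_variance, scores


# Synthetic data: 500 samples, 5 features with very different variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

components, variance, scores = pca_eig(X, n_components=2)
print(components.shape, scores.shape)
print(variance)
```

The variance of each column of `scores` equals the corresponding eigenvalue, which gives a quick check on any PCA implementation.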

Skills:

Preferred set of skills: applied mathematics, large-scale data analysis, linear solvers, parallel computing.

Project 8: Physics-informed Machine Learning: a pathway for explainable and efficient AI

Co-mentors:

Bruno Abreu
NCSA

Matthew Krafczyk
NCSA

Project Description:

As we move further into the Big Data era, machine learning and artificial intelligence methods are becoming increasingly critical tools for advancing scientific knowledge that fosters groundbreaking technological applications. Processing, modeling, and understanding large amounts of data requires enormous computational power and, fundamentally, new algorithms and theoretical approaches that can flexibly incorporate domain science knowledge. In this project, we are investigating how to incorporate such domain knowledge more effectively when developing and training novel machine learning models.

Student Contributions:

In this project, the student will work with a proof-of-concept to show that this incorporation process can lead to more efficient, more accurate, and/or more explainable models. The successful applicant will use DRYML, a new framework that encapsulates infrastructure and enables machine learning practitioners to focus on their models. Students will also contribute to improving DRYML's usability and flexibility.
