The NCSA REU Program currently offers the following faculty-sponsored projects for this summer.
Summer REU Projects
PROJECT 1: Spatial Analysis of Tumor Heterogeneity using Machine Learning Techniques
Co-Mentors:
Dr. Zeynep Madak-Erdogan, Food Science & Human Nutrition
Dr. Aiman Soliman, NCSA
Project Description:
Tumor heterogeneity is an inherent feature of all tumors that drives resistance to therapies. With the advent of new approaches integrating single-cell sequencing with spatial tumor data, interdisciplinary teams can better understand local changes in tumor metabolism and biology, as well as the different cell populations that affect immune responses, drug resistance, and metastasis. This project will leverage spatial data analysis tools from geospatial science to quantify the spatial heterogeneity of the tumor microenvironment. Machine learning techniques will be used to prepare image data for the subsequent integration of location and sequencing data. A Python pipeline comprising the tools developed for each step of the data analysis will be constructed and shared with the public. The project will also develop a generative AI tool to help clinicians interact with the pipeline and generate prompts that maximize the benefits of this analysis.
Student Contributions:
The student will implement spatial indices to quantify the tumors’ 2D heterogeneity, and will train and evaluate classical machine learning and deep learning models that connect the spatial indices to higher-dimensional genetic marker data. The student will implement this in Python using one of the AI development frameworks, such as PyTorch or TensorFlow.
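As a rough illustration of what a spatial index can look like, the sketch below computes Moran's I, a standard spatial autocorrelation statistic from geospatial science, on a small 2D grid. The choice of Moran's I, the rook (4-neighbour) adjacency, and the toy grids are assumptions made for this example; the project description does not state which indices will be used.

```python
def morans_i(grid):
    """Moran's I for a 2D grid of numbers, using rook (4-neighbour) adjacency.

    Illustrative only: the index and the neighbourhood definition are
    assumptions, not the project's stated method.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[value - mean for value in row] for row in grid]

    variance_sum = sum(d * d for row in dev for d in row)
    if variance_sum == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    cross_sum = 0.0  # sum of dev_i * dev_j over directed neighbour pairs
    weight_sum = 0   # total weight W; every neighbour pair has weight 1
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    cross_sum += dev[r][c] * dev[nr][nc]
                    weight_sum += 1

    return (n / weight_sum) * (cross_sum / variance_sum)


# Toy grids standing in for a per-cell measurement (hypothetical data).
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
two_halves = [[0, 0, 1, 1] for _ in range(4)]

print(morans_i(checkerboard))  # negative: neighbouring cells always differ
print(morans_i(two_halves))    # positive: similar values cluster together
```

Values near +1 indicate that similar values cluster in space, values near -1 indicate that neighbours tend to differ, and values near 0 suggest no spatial structure.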
PROJECT 2: AI-Powered Intelligent Analysis of Multimodal MRI Images of the Brain
Co-Mentors:
Dr. Zhi-Pei Liang, Electrical and Computer Engineering
Dr. Yudu Li, Bioengineering
Project Description:
Brain mapping is one of the most exciting frontiers of contemporary science, offering unprecedented opportunities to enhance our understanding of brain function and develop treatments for brain diseases. With the rapid development of AI technologies, a new era is dawning for brain mapping technology development and applications that promise to transform our understanding of the brain and further revolutionize healthcare. This project aims to develop physics- and biology-driven generative AI models to capture the complex anatomical, functional, and molecular distributions from open-access multimodal brain MRI datasets. These models will be applied to automated brain lesion detection and segmentation (e.g., tumors and stroke), with integrated uncertainty quantification capability. This project offers a unique opportunity for students to contribute to cutting-edge research at the intersection of AI, medical imaging, and neuroscience.
Student Contributions:
The REU student will gain hands-on experience in multimodal MRI image pre-processing, implementation of state-of-the-art generative AI models using open-source frameworks (e.g., PyTorch and TensorFlow), as well as model training, validation, and testing on GPU-based supercomputers at NCSA. These skills will enable the student to make meaningful contributions to the development and evaluation of advanced AI-driven brain mapping tools for the project.
PROJECT 3: Nutrition Data Collection and Analysis Tool with Generative AI Integration
Co-Mentors:
Dr. Sharon Donovan, Department of Food Science and Human Nutrition
Dr. Volodymyr Kindratenko, NCSA
Project Description:
This research seeks to provide both qualitative and quantitative nutrition counseling through the development of an AI-powered mobile application. Users record their meal history through photos, which are assessed by a computer vision system, and ask for nutrition guidance through an LLM-based chatbot. The approach leverages computer vision models to categorize users’ meals and estimate their nutritional profile. Relevant meal history, user health goals, other relevant user information, and academic literature on nutrition are all ingested by the chatbot to provide personalized recommendations. The goal of leveraging these novel generative AI models is to provide healthy diet recommendations relevant to users’ goals. This form of advice makes diet changes easier to adopt. Throughout development, the dietetics team will continuously test and analyze the application to ensure it is providing accurate and actionable personalized advice.
Student Contributions:
The student will contribute to two main activities: (1) development and validation of a data collection mobile application, where the student will contribute to the design, development, and validation of a user-friendly mobile application dedicated to collecting nutritional data, and (2) development of a data analytics pipeline for nutrition data, where the student will focus on developing a robust pipeline to process and make sense of the collected nutrition data.
PROJECT 4: Development of ML Models for Time-Dependent Projectile Atomic Geometry
Co-Mentors:
Dr. Andre Schleife, Materials Science & Engineering
Dr. Matthew Krafczyk, NCSA
Project Description:
Time-dependent density functional theory simulations are an accurate, quantum-mechanical approach to model the energy deposition of projectile ions moving through a target material, a process known as electronic stopping. However, due to their explicit description of electrons within the time-dependent Kohn-Sham framework, these simulations are computationally costly and cannot straightforwardly be extended to large length or time scales. This project aims to train machine-learning models using descriptors of the time-dependent projectile atomic geometry in the host material. These models can run much faster and generalize directly to larger length and time scales, making them well suited to interfacing with radiation damage simulations. The project aims to implement this machine-learning approach in an automated way and make the results available to users via an online database, which will also be implemented in this project.
Student Contributions:
Students will read existing data from time-dependent density functional theory simulations and extract atomic descriptors, such as the AGNI fingerprints, as well as electronic stopping values. Students will then train neural network surrogate models on that data and interface them with our existing Newton solver, as well as package them for storage in a database and delivery on a web service. If students are interested in web development, they can also lead that effort.
PROJECT 5: Knowledge-Intensive Multimodal LLMs
Co-Mentors:
Dr. Yuxiong Wang, Computer Science
Dr. Liangyan Gui, Computer Science
Project Description:
Advances in foundation models, as exemplified by large language models (LLMs), have enabled more natural and engaging interactions between humans and machines. However, current models often struggle when user queries involve personal context (“my dogs and cats”) or external expert knowledge (“medical diagnosis”). To address this challenge, existing LLM-based systems, e.g., Perplexity.ai and GPT-4, leverage AI agents and retrieval-augmented generation (RAG) to access context in external databases or on the internet. However, none of these methods pays sufficient attention to visual information, which is critical to conversations and to understanding users’ intentions. Therefore, our project will investigate a novel framework, Agentic Vision-RAG, to unlock the potential of knowledge-intensive multimodal conversation for LLMs. We are actively looking for ambitious undergraduate students with knowledge of and skills in deep learning platforms (PyTorch) and computer vision.
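For readers unfamiliar with retrieval-augmented generation, the following text-only sketch shows the basic retrieve-then-prompt step: the stored passage most similar to the query is selected and placed in the prompt as context. It is a toy under stated assumptions, not the project's Agentic Vision-RAG framework: the bag-of-words similarity, the function names, and the example passages are all hypothetical, and a real system would use learned embeddings (and, in this project, visual information).

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Prepend the retrieved context to the question for the language model."""
    context = "\n".join(retrieve(query, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical personal-context store.
notes = [
    "my dog is named Rex and likes long walks",
    "the cat sleeps on the sofa all day",
    "tomatoes need full sun",
]
print(build_prompt("what is my dog named", notes))
```

The resulting prompt would then be passed to a language model, which is omitted here.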
Student Contributions:
The student will assist in designing and implementing deep learning models by leveraging PyTorch and other state-of-the-art tools for computer vision and natural language processing. The student will preprocess and curate multimodal datasets, perform model training and fine-tuning, and evaluate the models using metrics for multimodal and conversational tasks. Additionally, the student will participate in regular meetings with mentors and team members, collaborate with graduate students, and document progress through reports, GitHub repositories, and presentations.
PROJECT 6: A Machine Learning and Geospatial Approach to Addressing the Socioeconomic Impacts of Climate Change Among Forcibly Displaced Populations in Brazil
Co-Mentors:
Dr. Angela Lyons, Agricultural & Consumer Economics
Dr. Aiman Soliman, NCSA
Project Description:
Climate change is significantly exacerbating forced displacement in developing countries, intensifying the vulnerability of populations already facing socioeconomic challenges. Rising temperatures, deforestation, and increased frequency of extreme weather events directly threaten livelihoods, particularly in Brazil’s Amazon and coastal regions. For example, rising sea levels and more intense flooding events threaten Brazil’s coastal communities, while prolonged droughts in the northeastern semi-arid regions, known as the Sertão, severely impact agricultural livelihoods, driving rural populations toward urban centers. This migration places immense strain on urban infrastructure, leads to increased competition for resources, and heightens tensions within host communities. The socioeconomic impacts are profound: displaced populations often lose access to education, healthcare, and stable employment, perpetuating cycles of poverty. Furthermore, the loss of cultural ties and traditional ways of life, especially among Indigenous communities, compounds the social disruption. While the international community is gradually increasing its focus on adaptation and resilience, current efforts fall short of addressing root causes and providing adequate support for the most affected regions. Our project bridges the gap between data science and social science to address this critical global issue. We will employ advanced machine learning and geospatial techniques to deepen our understanding of how geographical features and climate impacts intersect with socioeconomic outcomes among forcibly displaced populations. Our aim is to develop sustainable solutions and actionable recommendations for the international community and key multilateral organizations, including the United Nations, World Bank, Inter-American Development Bank, and other stakeholders focused on Latin America.
Student Contributions:
We are looking for students with experience in basic programming in Python and/or R and basic knowledge and skills in machine learning; experience with GIS and geospatial analysis is a plus. Anticipated tasks include assisting the team with: (1) data preprocessing of geospatial and socioeconomic feature data, (2) data modeling, analysis, and predictions, and (3) the creation of mappings and other data visualizations using geospatial and socioeconomic feature data. Students will develop and review code and create documentation for the code. They will also assist in developing machine learning algorithms and then training, validating, and testing the algorithms. Students will meet with the mentors on a regular basis, participate in team meetings, and actively engage with graduate students.
PROJECT 7: Physics-Inspired AI for Gravitational Wave Astrophysics
Co-Mentors:
Dr. Eliu Huerta, Physics
Dr. Hao Peng, Computer Science
Project Description:
This project will lead the creation of cutting-edge, physics-inspired AI models for gravitational wave astrophysics, targeting gravitational wave sources that span a high dimensional signal manifold, i.e., higher order wave modes of eccentric, spin precessing binary black hole mergers. The project will also explore the use of AI agents to prepare datasets, and explore a variety of AI architectures that are optimal for signal detection and characterization. These models will be published in the Garden Platform, and linked with Delta and DeltaAI to enable accelerated AI inference.
Student Contributions:
The REU student will learn several skills to contribute to this project, including the use of supercomputers to curate the datasets to train, validate and test AI models using GPU-based systems at NCSA, Argonne National Laboratory, and Oak Ridge National Laboratory. The student will become acquainted with open-source platforms for AI research, as well as the use of distributed training and inference frameworks. The student will learn to publish the AI models on model hubs, and to use scientific data infrastructure to enable remote inference of these models by connecting the Garden Platform to HPC platforms with Globus Compute.
PROJECT 8: Detection, Instance Segmentation & Classification for Astronomical Surveys with Deep Learning
Co-Mentors:
Dr. Xin Liu, Astronomy
Dr. Shirui Luo, NCSA
Project Description:
The next generation of wide-field deep astronomical surveys will deliver an unprecedented volume of images through the 2020s and beyond. As both the sensitivity and depth of observations increase, more overlapping sources will be detected. This can lead to measurement biases that contaminate key astronomical inferences. This project will leverage the rapidly developing field of computer vision to build a new deep learning platform for astronomical object detection, instance segmentation, classification, and beyond. It will adapt the latest open-source algorithms in computer vision for object detection and segmentation. The approach is interdisciplinary, combining state-of-the-art astronomical survey data with the latest deep learning tools. The new platform will be trained and validated using a hybrid of real data and realistic simulations that are built by combining traditional image simulations with generative models. It will be fully featured to enable higher-level downstream science applications such as photometric redshift estimation and galaxy morphology inference. All code generated will be open source to enable broad community usage.
Student Contributions:
The REU student will contribute to gathering, cleaning, and preparing the data used to train the models. After data collection and processing, the student will test the deep learning models and implement new data augmentations and training methods. The student will also validate the models using different detection, classification, and deblending metrics.
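One widely used building block for detection metrics is intersection over union (IoU), which scores how well a predicted bounding box overlaps a reference box. The sketch below is a minimal example, assuming axis-aligned boxes in (x_min, y_min, x_max, y_max) form; the function name and box format are illustrative choices, and the project description does not specify which metrics will be used.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption
    made for the example.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
reference = (1, 1, 3, 3)
print(box_iou(predicted, reference))  # overlap area 1 over union area 7
```

A detection is often counted as correct when its IoU with a reference box exceeds a chosen threshold, such as 0.5; the appropriate threshold for blended astronomical sources would be a project decision.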
PROJECT 9: Estimation of Crop Productivity from Multi-Sensor Fused Satellite Data
Co-Mentors:
Dr. Kaiyu Guan, Natural Resources and Environmental Sciences
Dr. Shenlong Wang, Computer Science
Project Description:
Reliable forecasting systems that predict crop type and crop yield with useful lead time are critically valuable to farming communities and government agencies. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management requirements and environmental factors. This project aims to develop or apply scalable deep learning methods (e.g., knowledge-guided machine learning, foundational models, and transfer learning) to integrate data from satellite remote sensing and other auxiliary information to make accurate yet cost-effective predictions of crop types and agricultural management practices. The team is now working on generating a 30-meter (2000-present), daily, cloud-free data stack for the US Corn Belt States and beyond by integrating three major satellite datasets (Landsat, MODIS, and Sentinel-2).
Student Contributions:
The REU student will contribute to implementing a computational pipeline that extracts real-time satellite images and deposits the data into a database, and to developing data fusion and ML components for predictive tasks. The student will use storage and run computation-intensive code on the Delta supercomputer.
PROJECT 10: LLM-Based Multi-Agent System to Advance Real-World Applications
Co-Mentors:
Dr. Haohan Wang, School of Information Science
Dr. Jingrui He, School of Information Science
Project Description:
Large language models (LLMs) offer a transformative opportunity to automate complex tasks in various fields. This project focuses on building a multi-agent system that uses LLMs as intelligent collaborators to address real-world challenges. The system will integrate the reasoning and communication abilities of LLMs with domain-specific tools to complete tasks efficiently. The project will explore applications in three key areas. The first is biomedical research, where the system will automate the repeated execution of statistical algorithms over gene expression data to uncover new biological insights. The second is financial document processing, where LLMs will complete forms and evaluate loan quality based on structured criteria. The third is scientific ideation, where LLMs will propose ideas, write code, and test simple research processes autonomously. These examples highlight the potential of LLM-based systems to streamline workflows and reduce human effort. The development process will involve designing a framework that enables LLM agents to communicate and collaborate effectively. This includes combining natural language processing with statistical models and computational tools to achieve specific goals. The project will also create benchmarks to evaluate how well the system automates tasks, integrates multiple components, and meets predefined objectives.
Student Contributions:
Students will help develop and refine the multi-agent framework using state-of-the-art tools such as OpenAI GPT or Google Gemini. They will work on implementing task-specific pipelines in Python, with a focus on biomedical, financial, or research applications. Students will apply techniques like prompt engineering to improve LLM performance. They will also design evaluation metrics to measure task success and ensure reproducibility by creating detailed documentation and open-source code repositories. Through this project, students will gain hands-on experience in designing AI systems for interdisciplinary applications. They will learn to integrate advanced LLM technologies with real-world tools and contribute to research that demonstrates the transformative potential of AI in automating practical challenges.