Project 1: Deep Learning to Enable Multi-messenger Astrophysics Discovery in the LSST Era
Co-mentors
Dr. Eliu Huerta
Argonne National Laboratory
Dr. Zhizhen Jane Zhao
Electrical and Computer Engineering
Project Description
This project will focus on the design of deep learning (DL) algorithms for the detection and parameter estimation of gravitational wave sources, and on the identification of their electromagnetic and gravitational emission. We will build upon our work in these research areas, and will leverage NCSA’s role as the main data hub for the Dark Energy Survey and the Legacy Survey of Space and Time to create DL tools to explore the processing and identification of astrophysical sources that may be observed concurrently in the gravitational and electromagnetic spectra.
Student contributions
The REU student will learn several skills to contribute to this project, including the use of supercomputers to curate the datasets used to train, validate, and test neural networks on GPU-based systems at NCSA, Argonne National Lab, and Oak Ridge National Lab. The student will become acquainted with open-source platforms for DL research, including TensorFlow and PyTorch, as well as distributed learning solutions such as Horovod. The student will learn to process and correctly interpret the predictions of these deep learning models, and to use scientific visualizations to understand how these models abstract knowledge from data and which parts of a deep learning model are involved in producing a prediction.
Project 2: Machine Learning and Geospatial Approach to Targeting Humanitarian Assistance Among Syrian Refugees in Lebanon
Co-mentors
Dr. Angela Lyons
Agricultural & Consumer Economics
Dr. Aiman Soliman
NCSA
Project Description
An estimated 84 million persons are forcibly displaced worldwide, and at least 70% of these are living in conditions of extreme poverty. More efficient targeting mechanisms are needed to better identify vulnerable families who are most in need of humanitarian assistance. Traditional targeting models rely on a proxy means testing (PMT) approach, where support programs target refugee families whose estimated consumption falls below a certain threshold. Although the method is practical, it provides limited insights and its predictions are not very accurate, which can reduce the effectiveness and fairness of targeting. Alternatively, multidimensional approaches to assessing poverty are now being applied to the refugee context, yet they require extensive information that is often unavailable or costly. This project applies machine learning and geospatial methods to novel data collected from Syrian refugees in Lebanon to develop more effective and operationalizable targeting strategies that reliably complement current PMT and multidimensional methods. The insights from this project will have important implications for humanitarian organizations seeking to improve current targeting mechanisms, especially given increasing poverty and displacement and limited humanitarian funding.
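As a minimal sketch of the PMT logic described above: a household's consumption is estimated from observable proxies, and the household is targeted if the estimate falls below a threshold. The proxy names, weights, intercept, and threshold below are hypothetical values chosen only for illustration; they are not taken from this project or from any real targeting formula.

```python
# Hypothetical proxy weights (illustrative only, not a real PMT formula).
PROXY_WEIGHTS = {
    "household_size": -4.0,
    "working_adults": 12.0,
    "has_private_toilet": 8.0,
}
BASELINE = 60.0   # hypothetical intercept of the consumption estimate
THRESHOLD = 70.0  # hypothetical eligibility cutoff


def estimate_consumption(household):
    """Estimate consumption as a weighted sum of observable proxies."""
    return BASELINE + sum(
        weight * household.get(name, 0) for name, weight in PROXY_WEIGHTS.items()
    )


def is_targeted(household, threshold=THRESHOLD):
    """A household is targeted when its estimated consumption is below the cutoff."""
    return estimate_consumption(household) < threshold


if __name__ == "__main__":
    family = {"household_size": 6, "working_adults": 1, "has_private_toilet": 0}
    print(estimate_consumption(family), is_targeted(family))
```

Because eligibility depends entirely on the estimate, any error in the estimated consumption translates directly into households being wrongly included or excluded, which is the accuracy concern raised above.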
Student Contributions:
We are looking for a student with experience in basic programming in Python and/or R and basic knowledge and skills in machine learning; experience with GIS and geospatial analysis is a plus. Anticipated tasks include assisting the team with: (1) data preprocessing, (2) data modeling, analysis, and predictions, and (3) the creation of maps and other data visualizations. The student will develop and review code and create documentation for the code. They will also assist in developing machine learning algorithms and then training, validating, and testing the models. The student will also create a GitHub repository for the team, where they will prepare and upload scripts and other documentation for the project.
Project 3: Estimation of Crop Productivity from Multi-sensor Fused Satellite Data
Co-mentors
Dr. Kaiyu Guan
Natural Resources & Environmental Sciences
Dr. Jian Peng
Computer Science
Project Description
Reliable forecasting systems with adequate lead time for crop type and crop yield are highly valuable to farming communities and government agencies for a variety of purposes. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management requirements and environmental factors. This project aims to develop scalable ML methods to integrate data from satellite remote sensing and other auxiliary information to make accurate yet cost-effective predictions of crop type. We are now working on generating a 30-meter, daily, cloud-free data stack (2000-present) for the three major Corn Belt states (Illinois, Iowa, and Indiana) by integrating three major satellite datasets (Landsat, MODIS, and the new Sentinel-2).
Student contributions
The REU student will contribute to the implementation of a computational pipeline that extracts real-time satellite images, deposits data into a database, and develops data fusion and ML components for predictive tasks. The student will use storage and run computation-intensive code on the Blue Waters supercomputer. The student will meet with the mentors on a regular basis, participate in group meetings, and interact with graduate students.
Project 4: Gr-ResQ: Data-driven Approaches for Accelerating Synthesis of 2D Materials
Co-mentors
Dr. Elif Ertekin
Mechanical Science & Engineering
Dr. Sameh Tawfick
Mechanical Science & Engineering
Project Description
The major goal of this project is to advance the state of the art in manufacturing and synthesis by combining machine learning and experimentally validated data. While there is a tremendous amount of academic and industrial research on the synthesis of nanomaterials such as graphene, and on 3D printing, advancement in these fields currently relies on expensive or tedious trial-and-error experimentation. Using a combination of crowd-sourced and locally derived process parameter data, we will search for better procedures for more controllable and repeatable manufacturing.
Student contributions
The REU students will apply ML to experimental data from Gr-ResQ, a database of chemical vapor deposition synthesis recipes, and from 3D printing projects to develop ML models that accelerate the discovery of large-scale graphene production. Students will identify and extract relevant features from the dataset of images, Raman spectra, and associated recipes, and then develop a model using traditional ML or DL techniques to predict potentially successful graphene recipes. Students will work closely with experimental collaborators to test their predictions and provide data to feed back into their model.
Project 5: An Integrated Sensing, Machine Learning, and High-performance Computing Framework for Real-time Decision-making in Smart Manufacturing
Co-mentors
Dr. Chenhui Shao
The Grainger College of Engineering
Dr. Seid Koric
Mechanical Science & Engineering
Project Description
The recent development of sensing, communication, and computing technologies and infrastructure is leading to a global data revolution in manufacturing, which provides an unprecedented opportunity for the manufacturing industry to march towards a new generation of digitalization and intelligence. In this project, we will develop an integrated sensing, machine learning, and HPC framework for next-generation manufacturing process control. New deep learning algorithms such as convolutional neural networks (CNNs) and residual neural networks (ResNets) will be developed for decision-making tasks such as machine health condition monitoring and quality prediction. These algorithms will be applied to ultrasonic metal welding, which is an important solid-state joining technology with widespread industrial applications. We will implement the algorithms using graphics processing units (GPUs) and HPC to pursue real-time decision-making.
Student contributions
The REU student will use TensorFlow or PyTorch to develop machine learning algorithms (e.g., CNN, ResNet) and train, validate, and test the algorithms using real-world sensing data collected from ultrasonic metal welding. The student will then conduct benchmark analysis to evaluate the performance of the developed algorithms and of their inference in real-time production.
Project 6: Spatial Analysis of Tumor Heterogeneity using Machine Learning Techniques
Co-mentors
Dr. Zeynep Madak-Erdogan
Food Science & Human Nutrition
Dr. Aiman Soliman
NCSA
Project Description
Tumor heterogeneity is an inherent feature of all tumors that drives resistance to therapies. With the advent of new approaches integrating single-cell sequencing with spatial tumor data, interdisciplinary teams can better understand local changes in tumor metabolism and biology, as well as the different cell populations that affect different aspects of immune responses, drug resistance, and metastasis. In this project, we will leverage spatial data analysis tools from Geospatial Science to quantify tumor microenvironment spatial heterogeneity. Machine learning techniques will be utilized to prepare image data for the ensuing integration of location and sequencing data.
Student contributions
We are looking for a student with experience in basic programming in Python, experience with R or the scikit-learn library, and familiarity with classical machine learning methods (e.g., classification and clustering); experience with TensorFlow/Keras is a plus. The student will work on implementing spatial indices to quantify the tumors’ 2D heterogeneity, and on training and evaluating classical machine learning and deep learning models to connect the spatial indices with higher-dimensional genetic marker data.
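To illustrate what a spatial index of 2D heterogeneity can look like, the sketch below computes Moran's I, a widely used measure of spatial autocorrelation in geospatial analysis. The project text does not say which indices will be used, so the choice of Moran's I, the rook (edge-sharing) neighborhood, and the toy grids are all assumptions made for illustration.

```python
def morans_i(grid):
    """Moran's I for a 2D grid, using rook (edge-sharing) neighbors with unit weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[value - mean for value in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    cross = 0.0        # sum of w_ij * dev_i * dev_j over ordered neighbor pairs
    weight_total = 0   # sum of all weights w_ij
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_total += 1
    return (n / weight_total) * (cross / denom)


if __name__ == "__main__":
    # Toy "marker intensity" maps: one spatially clustered, one alternating.
    clustered = [[0, 0, 1, 1] for _ in range(4)]
    checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    print(morans_i(clustered), morans_i(checkerboard))
```

Values near +1 indicate that similar values cluster together, values near -1 indicate that neighboring cells tend to differ, and values near 0 suggest no spatial structure.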
Project 7: Physics-informed Machine Learning: a pathway for explainable and efficient AI
Co-mentors
Bruno Abreu
NCSA
Matthew Krafczyk
NCSA
Project Description:
As we move further into the Big Data era, machine learning and artificial intelligence methods are becoming increasingly critical tools for advancing scientific knowledge and fostering groundbreaking technological applications. Processing, modeling, and understanding large amounts of data requires enormous computational power and, fundamentally, new algorithms and theoretical approaches that can flexibly incorporate domain science knowledge. In this project, we are investigating how to incorporate such domain knowledge more effectively when developing and training novel machine learning models.
Student Contributions:
In this project, the student will work with a proof-of-concept to show that this incorporation process can lead to more efficient, more accurate, and/or more explainable models. The successful applicant will use DRYML, a new framework that encapsulates infrastructure and enables machine learning practitioners to focus on their models. Students will also contribute to improving DRYML's usability and flexibility.
Project 8: Deep Sea Video Classification for Ecological Conservation
Co-mentors
Matthew Krafczyk
NCSA
Aiman Soliman
NCSA
Project Description:
Help make an impact on ecological conservation by improving the performance of image and video classification models of conservation data. Four-fifths of the vast underwater realm remains unexplored, and only about 5% of the seafloor has been mapped at high resolution. One method that we are currently using to explore these mysterious habitats is deep-sea camera traps. However, the amount of footage these cameras generate far exceeds our ability to review it effectively, and while scientists can spot many animals and their behaviors, computer vision methods that use machine learning can likely detect more.
Student Contributions:
The successful applicant will learn about and improve existing image/video classification models and use the DRYML framework to ease hyperparameter search and distributed training. DRYML is a new open-source Machine Learning Meta-Library that enables practitioners to focus more on their model and less on infrastructure. Students will likely make contributions to DRYML to improve its usability and flexibility.
Project 9: Use of Neural Networks for Induced Earthquake Modeling
Co-mentors
Roman Y. Makhnenko
Civil & Environmental Engineering
Alex Tartakovsky
Civil & Environmental Engineering
Project Description:
Many industrial activities (e.g., wastewater disposal, operation of enhanced geothermal systems, geologic carbon storage, and hydraulic fracturing) involve injection of fluid into the subsurface. Multiple physical processes, such as heat and fluid transport, as well as mechanical deformation of rock, might create conditions favorable for earthquakes. These processes usually affect each other and are therefore called coupled processes. Analytical solutions describing these complex phenomena do not exist, and advanced numerical models are used to assess the risk of triggering an earthquake during the injection of fluid into the subsurface. Most existing models remain quasistatic (all changes in the system are assumed to be slow) and do not consider the dynamic rupturing process (fast changes in the system during fracture propagation); they therefore predict the evolution of the system toward a state favorable for failure rather than the failure process itself. High-resolution models are computationally intensive, which forces modelers to simplify the model to obtain results within a reasonable time. This simplification is usually made with the expected dominant effect in mind, so it might introduce bias toward a certain triggering mechanism. Moreover, manual analysis of numerical modeling results focuses on particular locations and mechanisms and might be incapable of considering the “big picture”. The proposed project will deal with machine learning approaches to efficiently recognize hidden patterns in large seismic data sets collected in the Illinois Basin.
Student Contributions:
We are looking for students with MATLAB and Python programming skills; experience in machine learning and data science is highly desirable, and knowledge of geomechanics and geophysics will be helpful. Using neural networks is preferable to arbitrary data interpretation for determining, without bias, the physical mechanisms responsible for the earthquakes observed during subsurface fluid injections. This will enable the application of high-resolution numerical models that can properly describe the physical processes behind rock failure at different scales. The efficient interpretation of large seismic data sets and the prediction of induced earthquake preparatory processes will promote the safe use of the subsurface for renewable energy extraction and storage.
Project 10: Workflow tradeoffs in the context of cancer phylogeny
Co-mentors
Daniel S. Katz
NCSA
Matthew Barry
NCSA
Description:
The NCSA PhyloFlow project is building workflows to perform phylogenetic tree computations to understand cancerous tumor evolution. This work is intended to help researchers understand and track tumor evolution and to enable doctors to develop more personalized cancer treatment plans. These workflows involve the execution of multiple dependent tasks, stored in Docker containers. There are many ways to build such workflows, including workflow definition languages like WDL and CWL, which require separate runners, and programming systems such as Parsl, which simply use their host language's runtime (Python, in the case of Parsl).
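The idea of a workflow as a set of dependent tasks can be sketched with the Python standard library alone. The example below does not use WDL, CWL, or Parsl, and the step names are hypothetical placeholders rather than actual PhyloFlow steps; it only shows how declared dependencies determine a valid execution order (it requires Python 3.9 or later for `graphlib`).

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each maps to the set of steps it depends on.
DEPENDENCIES = {
    "step_a": set(),
    "step_b": {"step_a"},
    "step_c": {"step_b"},
    "step_d": {"step_c"},
    "step_e": {"step_b", "step_d"},
}


def run_workflow(dependencies, run_task):
    """Run each task only after all of its dependencies; return the order used."""
    order = []
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task)  # in a real workflow this might launch a container
        order.append(task)
    return order


if __name__ == "__main__":
    print(run_workflow(DEPENDENCIES, lambda name: None))
```

Workflow languages and systems such as those named above add what this sketch leaves out, for example container execution, parallel scheduling of independent tasks, and data movement between steps.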
Student Work:
Students will implement one or more of the same workflows in WDL, CWL, and Parsl, and compare them in terms of performance, usability, programmability, reproducibility, etc. They will also explore mixed implementations, such as using Parsl to execute workflows with components that are defined using WDL and CWL. This work will lead to a report and poster, and could lead to contributions to the open-source Parsl code and/or papers or presentations at conferences.