Current Projects

Project 1: Deep Learning to Enable Multi-messenger Astrophysics Discovery in the LSST Era

Co-mentors

Dr. Eliu Huerta
Argonne National Laboratory

Dr. Zhizhen Jane Zhao
Department of Electrical and Computer Engineering

Project Description

This project will focus on the design of deep learning (DL) algorithms for the detection and parameter estimation of gravitational wave sources, and on the identification of their electromagnetic and gravitational emission. We will build upon our previous work in these research areas and leverage NCSA’s role as the main data hub for the Dark Energy Survey and the Legacy Survey of Space and Time (LSST) to create DL tools for processing and identifying astrophysical sources that may be observed concurrently in the gravitational and electromagnetic spectra.

Student contributions

The REU student will learn several skills to contribute to this project, including the use of supercomputers to curate the datasets needed to train, validate, and test neural networks on GPU-based systems at NCSA, Argonne National Lab, and Oak Ridge National Lab. The student will become acquainted with open source platforms for DL research, including TensorFlow and PyTorch, as well as distributed learning solutions such as Horovod. The student will learn to process and correctly interpret the predictions of these deep learning models, and to use scientific visualizations to understand how these models abstract knowledge from data and which parts of a model are involved in producing a given prediction.

Project 2: Comparing Deep Learning and Expert Knowledge for Sequential Pattern Mining

Co-mentors

Dr. Nigel Bosch
School of Information Sciences

Dr. Luc Paquette
Education

Project Description

This project will explore recent methods for learning sequential patterns of behavior. Specifically, we will train convolutional neural networks to learn predictive sequences of students’ behaviors over time as the students interact with educational software. We will then compare the patterns that neural networks learn to patterns defined by experts to identify similarities and differences between what the convolutional filters learn and what experts believe is important. Finally, we will train machine learning models using these patterns as features to predict whether or not students are going to engage in “gaming the system” behaviors, where they attempt to skip through assignments as quickly as the software will allow without effortful learning.
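To make the comparison between convolutional filters and expert-defined patterns more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's actual pipeline: the action vocabulary (`attempt`, `hint`, `skip`), the hand-set filter weights standing in for learned ones, and the helper names `one_hot`, `conv1d`, and `filter_to_pattern` are all assumptions made for this example.

```python
# Hypothetical action vocabulary; real educational-software logs will differ.
ACTIONS = ["attempt", "hint", "skip"]


def one_hot(sequence):
    """Encode each action as a vector with a 1.0 in that action's channel."""
    return [[1.0 if a == action else 0.0 for a in ACTIONS] for action in sequence]


def conv1d(encoded, kernel):
    """Slide a (width x channels) filter along the sequence, without padding."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(ACTIONS)):
                total += kernel[offset][channel] * encoded[start + offset][channel]
        activations.append(total)
    return activations


def filter_to_pattern(kernel):
    """Read a filter as a sequence: the highest-weighted action at each step."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda c: row[c])] for row in kernel]


# Hand-set weights standing in for a learned filter that responds to "hint, then skip".
kernel = [
    [-0.5, 1.0, -0.5],  # step 1 favors "hint"
    [-0.5, -0.5, 1.0],  # step 2 favors "skip"
]

sequence = ["attempt", "hint", "skip", "hint", "skip", "attempt"]
activations = conv1d(one_hot(sequence), kernel)

print(activations)                # peaks where "hint" is followed by "skip"
print(filter_to_pattern(kernel))  # readable pattern to compare with an expert rule
```

Reading each filter as its highest-weighted action per step is only one simple interpretability heuristic; the project description does not specify which interpretability methods are used.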

Student contributions

The REU students will take advantage of recent advances in neural network model interpretability to enable comparisons between expert hypotheses and data-driven discovery of important sequential behavior patterns. These findings will also contribute to understanding of the tradeoffs between maximizing model accuracy and providing semantically meaningful insights. Students will learn how to use deep learning models to implement these analyses, as well as data processing and visualization methods that are needed to evaluate the results.

Project 3: Machine Learning for Genomics

Co-mentors

Dr. Liudmila S. Mainzer
University of Wyoming’s Advanced Research Computing Center (ARCC)

Dr. Christopher Fields
UIUC HPC Biological Computing

Project Description

The NCSA Genomics team participates in the Human Heredity and Health in Africa (H3A) consortium as one of the bioinformatics nodes in H3ABionet2.0, building advanced computational genomics analyses that serve the health interests of people in Africa. ML is one of the projects in the Tools and WebServices Work Package being developed within that program. We are surveying current ML/DL approaches, adapting them to the analysis needs in Africa, and developing new ones appropriate for use cases driven by the predominant local biomedical needs, such as tackling infectious diseases, psychiatric conditions, and AIDS.

Student contributions

The REU student will work closely in a long-distance collaboration with H3A scientists to port analyses to the advanced ML infrastructure at NCSA, help identify computational performance considerations that stem from the nature and size of the data being analyzed, and run those analyses on data from African collaborators. Applications include but are not limited to 1) determining the specificity of protein binding, 2) predicting protein function, 3) analyzing gene expression patterns, and 4) predicting transcription factor binding sites and DNA methylation states.

Project 4: Estimation of Crop Productivity from Multi-sensor Fused Satellite Data

Co-mentors

Dr. Kaiyu Guan
Natural Resources & Environmental Sciences

Dr. Jian Peng
Computer Science

Project Description

Reliable forecasting systems that predict crop type and crop yield with adequate lead time are of critical value to farming communities and government agencies. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management requirements and environmental factors. This project aims to develop scalable ML methods that integrate data from satellite remote sensing and other auxiliary information to make accurate yet cost-effective predictions of crop type. We are now working on generating a daily, cloud-free, 30-meter data stack (2000-present) for three major Corn Belt states (Illinois, Iowa, and Indiana) by integrating three major satellite datasets (Landsat, MODIS, and the newer Sentinel-2).
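As a rough illustration of what "cloud-free" means for a daily time series, the sketch below fills cloudy days for a single pixel by linear interpolation between the nearest clear observations. This is a deliberately simple stand-in, not the multi-sensor fusion method used in the project; the function name `fill_cloud_gaps`, the use of `None` to mark cloudy days, and the sample vegetation-index values are all assumptions made for this example.

```python
def fill_cloud_gaps(series):
    """Fill None entries (cloudy days) by linear interpolation in time.

    Gaps at the start or end of the series are held at the nearest
    cloud-free value, since there is nothing to interpolate toward.
    """
    clear = [i for i, v in enumerate(series) if v is not None]
    if not clear:
        raise ValueError("no cloud-free observations to interpolate from")

    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = [j for j in clear if j < i]
        after = [j for j in clear if j > i]
        if before and after:
            left, right = before[-1], after[0]
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
        elif before:
            filled[i] = series[before[-1]]
        else:
            filled[i] = series[after[0]]
    return filled


# Hypothetical daily vegetation-index values for one pixel; None marks a cloudy day.
daily_index = [None, 0.2, None, None, 0.8, None]
filled = fill_cloud_gaps(daily_index)
print([round(v, 3) for v in filled])
```

A real fusion pipeline would also have to reconcile the different spatial resolutions and revisit times of Landsat, MODIS, and Sentinel-2, which this per-pixel sketch ignores.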

Student contributions

The REU student will contribute to the implementation of a computational pipeline that extracts real-time satellite images, deposits the data into a database, and develops data fusion and ML components for predictive tasks. The student will use storage and run computation-intensive code on the Blue Waters supercomputer. The student will meet with the mentors on a regular basis, participate in group meetings, and interact with graduate students.

Project 5: Development of Data-driven Machine Learning-Based Food Crises Prediction Model

Co-mentors

Dr. Hope Michelson
Agricultural & Consumer Economics

Dr. Aiman Soliman
National Center for Supercomputing Applications

Project Description

Methods currently in use to predict food crises have limitations that delay and impede humanitarian response: they are not model-driven, and they do not engage the full scope of available data. An effective early warning system is urgent, given the expectation that climate shocks disrupting agricultural production and market functioning will increase in frequency and severity in coming decades. This project develops and deploys a new model-driven method for predicting food crises across the world. We are working towards developing automated, real-time, sub-national food security prediction in developing countries.

Student contributions

The REU student will help develop and analyze new data sources for predicting food crises and will apply DL techniques to the prediction problem. We will employ publicly available data at high spatial granularity and high frequency, allowing rapid, real-time assessment of sub-national food security. The student will help develop code to integrate multiple data sources and will work on the DL-based prediction model that uses these data.

Project 6: Gr-ResQ: Data-driven Approaches for Accelerating Synthesis of 2D Materials

Co-mentors

Dr. Elif Ertekin
Mechanical Science & Engineering

Dr. Sameh Tawfick
Mechanical Science & Engineering

Project Description

The major goal of this project is to advance the state of the art in manufacturing and synthesis by combining machine learning with experimentally validated data. While there is a tremendous amount of academic and industrial research on the synthesis of nanomaterials such as graphene, and on 3D printing, advancement in these fields currently relies on expensive or tedious trial-and-error experimentation. Using a combination of crowd-sourced and locally derived process parameter data, we will search for better procedures for more controllable and repeatable manufacturing.

Student contributions

The REU students will apply ML to experimental data from Gr-ResQ, a database of chemical vapor deposition synthesis recipes, and from 3D printing projects, in order to develop ML models that accelerate the discovery of large-scale graphene production. Students will identify and extract relevant features from the dataset of images, Raman spectra, and associated recipes, and then develop a model using traditional ML or DL techniques to predict potentially successful graphene recipes. Students will work closely with experimental collaborators to test their predictions and provide data to feed back into their model.

Project 7: An Integrated Sensing, Machine Learning, and High-performance Computing Framework for Real-time Decision-making in Smart Manufacturing

Co-mentors

Dr. Chenhui Shao
The Grainger College of Engineering

Dr. Seid Koric
Mechanical Science & Engineering

Project Description

The recent development of sensing, communication, and computing technologies and infrastructure is leading to a global data revolution in manufacturing, which provides an unprecedented opportunity for the manufacturing industry to move towards a new generation of digitalization and intelligence. In this project, we will develop an integrated sensing, machine learning, and high-performance computing (HPC) framework for next-generation manufacturing process control. New deep learning algorithms, such as convolutional neural networks (CNNs) and residual neural networks (ResNets), will be developed for decision-making tasks such as machine health condition monitoring and quality prediction. These algorithms will be applied to ultrasonic metal welding, an important solid-state joining technology with widespread industrial applications. We will implement the algorithms using graphics processing units (GPUs) and HPC to pursue real-time decision-making.

Student contributions

The REU student will use TensorFlow or PyTorch to develop machine learning algorithms (e.g., CNN, ResNet) and will train, validate, and test the algorithms using real-world sensing data collected from ultrasonic metal welding. The student will then conduct a benchmark analysis to evaluate the performance of the developed algorithms and of their inference in real-time production.

Project 8: Design of New Materials through Machine Learning

Co-mentors

Dr. Andre Schleife
Materials Science & Engineering

Dr. Michael Ondrejcek
NCSA

Project Description

The accurate description of excited electronic states is a very promising goal in computational materials science, with significant impact on applications including photovoltaics, bioimaging, and optical materials. However, achieving accurate results requires heavy use of computation, which limits materials design. At the same time, computational materials science is benefiting from the data revolution, with both experimental and computational databases becoming more and more prevalent. In this project, we aim to mitigate the high computational cost of studying the electronic excitations involved in optical properties by exploring the use of ML and the incorporation of materials databases.

Student contributions

The REU student will use atomistic simulations and Maxwell modeling techniques to accurately describe nano- and meso-structured materials. Building on preliminary work, we will further combine these simulations with experimental data for semiconductor nanocrystals, which we gather using a previously established web-based framework. Newly and previously produced data will be used to train ML models on the machine-learning-optimized HAL computer at NCSA, to predict either optical spectra for materials or the energy and width of prominent spectral features. The ML model will then be inverted to facilitate the design of materials with desirable optical properties.

Project 9: Underpinnings of Racial Health Disparities

Co-mentors

Dr. Zeynep Madak-Erdogan
Food Science & Human Nutrition

Dr. Liudmila Mainzer
NCSA

Project Description

Health disparities, be they racial, economic, rural-urban, gender-based, or age-based, have come to the forefront across the world. To elucidate the biological, social, economic, and psychological mechanisms of health disparities, and to develop interventions that engage the community in targeting these mechanisms, it is necessary to work with complex multidimensional datasets containing molecular, genetic, and biometric information from individuals, plus their socioeconomic status, local environment/safety, degree of segregation, access to medical care/education, and levels of pollution. We are developing novel statistical and ML approaches to harmonize these heterogeneous data and detect important contributors to health disparities. We aim to develop predictive tools that identify populations at risk for poor health outcomes, in order to help community services reach out and bring in those individuals for treatment earlier.

Student contributions

The REU student will work with NCSA computational scientists and faculty collaborators in areas of women’s health and infectious diseases, as well as the representatives of the public health district, to gather, prepare and analyze health-related data, and develop novel statistical and ML approaches.

Project 10: Spatial Analysis of Tumor Heterogeneity using Machine Learning Techniques

Co-mentors

Dr. Zeynep Madak-Erdogan
Food Science & Human Nutrition

Dr. Aiman Soliman
National Center for Supercomputing Applications

Project Description

Tumor heterogeneity is an inherent feature of all tumors that drives resistance to therapies. With the advent of new approaches integrating single-cell sequencing with spatial tumor data, interdisciplinary teams can better understand local changes in tumor metabolism and biology, as well as the different cell populations that affect immune responses, drug resistance, and metastasis. In this project, we will leverage spatial data analysis tools from geospatial science to quantify the spatial heterogeneity of the tumor microenvironment. Machine learning techniques will be used to prepare image data for the subsequent integration of location and sequencing data.

Student contributions

We are looking for a student with basic programming experience in Python, experience with R or the scikit-learn library, and familiarity with classical machine learning methods (e.g., classification and clustering); experience with TensorFlow/Keras is a plus. The student will work on implementing spatial indices to quantify the tumors’ 2D heterogeneity, and on training and evaluating classical machine learning and deep learning models that connect the spatial indices to higher-dimensional genetic marker data.
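As one example of what a spatial index borrowed from geospatial science can look like, the sketch below computes Moran's I, a standard measure of spatial autocorrelation, on a small 2D grid. The project description does not name the indices that will be used, so the choice of Moran's I, the rook (four-neighbor) adjacency, the function name `morans_i`, and the toy grids are assumptions made for illustration.

```python
def morans_i(grid):
    """Moran's I for a rectangular 2D grid with rook (4-neighbor) adjacency.

    Values near +1 indicate spatially clustered values, values near -1
    indicate a checkerboard-like alternation, and values near 0 indicate
    no spatial structure.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    cross = 0.0        # sum of w_ij * (x_i - mean) * (x_j - mean)
    weight_total = 0   # sum of all w_ij (each neighbor pair counted in both directions)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    cross += (grid[r][c] - mean) * (grid[nr][nc] - mean)
                    weight_total += 1
    return (len(values) / weight_total) * (cross / denom)


# Toy grids standing in for a per-cell marker measured across a tissue section.
clustered = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(round(morans_i(clustered), 3))     # positive: similar values sit together
print(round(morans_i(checkerboard), 3))  # negative: neighbors always differ
```

An index like this yields one number per image or region, which could then serve as a feature alongside genetic marker data in a downstream model.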