Research Projects

Project 1: Reproducible Best Practices for Development of Scientific Simulation Software

Co-mentors

Kathryn Huff
Nuclear, Plasma, & Radiological Engineering

Matthew Turk
Astronomy

Social impact

Developing safe, sustainable nuclear energy as a fundamental part of a clean, carbon-free, worldwide energy future.

Project description

Safe, sustainable nuclear energy will be a fundamental part of a clean, carbon-free, worldwide energy future. The Advanced Reactors and Fuel Cycles group (Huff) and the Data Exploration Laboratory (Turk) are collaborating to develop reproducible, open-source software (OSS) for simulation and analysis of phenomena in advanced nuclear reactor designs. Open source physics kernels and applications will be developed within the Multiphysics Object-Oriented Simulation Environment (MOOSE) Finite Element Modeling ecosystem. These kernels and simulations will extend current modeling capabilities to include physics appropriate for the unique phenomena encountered in advanced nuclear reactors. As much as possible, this work will be conducted in a transparent and open manner, with an emphasis on maximizing reproducibility and reuse potential. Accordingly, this work will leverage literate programming tools (e.g., Jupyter notebooks) as a platform for communicating analysis methods and results.

The undergraduates will work as part of a team ensuring the reproducibility of the kernels under development. Their tasks will familiarize them with scientific software development best practices such as pair programming, unit testing, automated documentation, and reproducible workflows. Specifically, the students will pair program alongside the faculty mentors, postdoctoral scholars, and one another to implement C++ unit tests using the MOOSE testing framework. As their familiarity with the project grows, they will iterate on repeatable validation and verification demonstrations of the simulation capabilities, using Jupyter notebooks to package and communicate simulation workflows. Meanwhile, they will be guided in enriching documentation for the methods as needed, using the Doxygen automated documentation framework.
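
As a flavor of what a repeatable verification demonstration might look like, the sketch below (in Python, the language of the notebooks) compares hypothetical simulation output against an analytic reference solution. The reference solution, tolerance, and synthetic "simulation output" are placeholders for illustration, not part of the MOOSE kernels themselves.

```python
# Minimal sketch (illustrative, not project code): a repeatable verification check of
# the kind that could live in a Jupyter notebook. The analytic reference solution and
# the synthetic "simulation output" below are hypothetical placeholders.
import numpy as np

def analytic_temperature(x):
    """Hypothetical analytic solution used as the verification reference."""
    return 300.0 + 50.0 * np.sin(np.pi * x)

def verify(x, simulated, rtol=1e-2):
    """Compare simulated values against the analytic reference and report the error."""
    expected = analytic_temperature(x)
    rel_error = np.abs(simulated - expected) / np.abs(expected)
    print(f"max relative error: {rel_error.max():.2e}")
    assert rel_error.max() < rtol, "verification failed: error exceeds tolerance"

# Stand-in for values a kernel under test would produce (here: reference plus small noise).
x = np.linspace(0.0, 1.0, 101)
simulated = analytic_temperature(x) + np.random.default_rng(0).normal(0, 0.1, x.size)
verify(x, simulated)
```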

Project 2: Simulating Momentary Time Sampling for Classroom Observations of Students' Affective States

Co-mentors

Luc Paquette
Curriculum & Instruction

Jinming Zhang
Educational Psychology

Social impact

Education that better meets student needs, via experiment-driven data.

Project description

In this project, students will develop a tool to study momentary time sampling (MTS) of affective (emotional) states in a simulated classroom and to test the effect of different parameters on the accuracy of such a sampling approach. MTS is one of the preferred methods for discontinuous sampling in social sciences, but it is known to sometimes produce a high measurement error based on several factors related to the design of the observation session (e.g., the sampling interval, length of the observation session) and the properties of the behavior or construct being studied (e.g., interactions between the duration of individual behavioral events, as well as the number of instances and the overall prevalence of each behavior during the observation session). The goal of this project is to develop a tool allowing educational researchers to better plan their classroom data collection in order to improve the accuracy of the collected data and increase the statistical power of their experiments.
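
To make the idea concrete, the following minimal Python sketch, which is not the existing prototype, simulates a single observation session and shows how the MTS estimate of prevalence drifts as the sampling interval grows; the session length, mean event duration, and prevalence values are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the project's prototype): simulate one observation
# session as a per-second on/off record of a single affective state, then estimate its
# prevalence with momentary time sampling (MTS) and compare to the true prevalence.
import numpy as np

rng = np.random.default_rng(0)

def simulate_session(length_s=1800, mean_event_s=45, prevalence=0.2):
    """Alternate 'absent' and 'present' episodes with exponential durations."""
    state, t, record = False, 0, np.zeros(length_s, dtype=bool)
    while t < length_s:
        mean = mean_event_s if state else mean_event_s * (1 - prevalence) / prevalence
        duration = max(1, int(rng.exponential(mean)))
        record[t:t + duration] = state
        t += duration
        state = not state
    return record

def mts_estimate(record, interval_s=20):
    """MTS: observe only the instants at the end of each sampling interval."""
    samples = record[interval_s - 1::interval_s]
    return samples.mean()

session = simulate_session()
print(f"true prevalence: {session.mean():.3f}")
print(f"MTS estimate (20 s interval):  {mts_estimate(session, 20):.3f}")
print(f"MTS estimate (120 s interval): {mts_estimate(session, 120):.3f}")
```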

Students participating in this project will be responsible for the design and implementation of the MTS tool. Their design will be informed by a previous prototype that was developed with limited functionality. As the goal of this project is to develop a tool that can be used by educational researchers, the resulting product will need to be multi-platform, have a user-friendly interface, and run efficiently on a regular laptop computer. Paquette will oversee the development of the tool and will help the students specify its design based on the needs of the research community. Zhang will contribute to the project by offering insights on how to generate random distributions when simulating students in a classroom and on how to evaluate the accuracy of the simulated observations.

Project 3: Quantification of the Potential for Epistatic Genomic Loci to Improve Maize Yield

Co-mentors

Liudmila Mainzer
Institute for Genomic Biology

Alexander Lipka
Crop Sciences

Social impact

Improved crop yields, including corn, through genetic assessment.

Project description

The biological objective of this project is to determine the extent to which epistasis between pairs of genomic loci contributes to variation in maize yield. The motivation for this research is that it is anticipated that there will not be enough food for the world population by 2050. Thus, there is a critical need to improve the yields of crops including maize. Critically assessing the genetic sources contributing to maize yield is an essential first step toward improving it. Assessing these genetic sources will enable breeders to focus their resources on specific genes (and/or combinations of genes) to select for maximum yield. This project will assess three research questions: 1) How much of yield is explained by epistasis? 2) What are the effect sizes of epistatic loci? and 3) Will considering the identified epistatic loci result in substantially increased yields?

A pair of students, one with a more computational background and the other with a more biological background, will conduct this analysis. These students will conduct stepwise epistatic model selection in the maize nested association mapping panel. The traits to be analyzed will be yield and other related traits. The stepwise epistatic model selection program (which is part of our local version of TASSEL5) was developed by a team led by Mainzer and Lipka. Since high core-count servers are best for this work, we will use the 46-core Dell system available at NCSA's Innovative Systems Lab. The dual-threaded cores provide the ability to parallelize the computation across up to 96 threads, which would significantly speed up this analysis. Mainzer and Lipka will supervise this project and ensure that the analysis is completed in a timely manner. To facilitate communication between the students and supervisors, the students will work and have weekly meetings at NCSA. The students will also attend the biweekly Lipka Lab meetings and the HPCBio group meetings. These meetings will give the students the opportunity to ask "big-picture" questions and to identify any bottlenecks in conducting the analysis, and will allow Mainzer and Lipka to monitor the students' progress. This project should provide insight into the contribution of epistasis to variation in maize yield, and should identify, and suggest solutions for, computational bottlenecks in running stepwise model selection. In addition, the computational student will learn more about biology, and the biology student will learn more about computational aspects. Both students will learn about conducting statistical analyses and participating in interdisciplinary collaborations.
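
The actual analysis will use the stepwise epistatic model selection implemented in the group's local TASSEL5, but the short Python sketch below illustrates the underlying idea on simulated data: forward selection of marker-pair interaction terms scored by BIC. The marker counts, effect sizes, and stopping rule are illustrative assumptions only.

```python
# Illustrative sketch only (not the TASSEL5 implementation): forward selection of
# marker-pair interaction terms on simulated genotypes, scored by BIC.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
n_ind, n_marker = 200, 30
geno = rng.integers(0, 3, size=(n_ind, n_marker)).astype(float)   # 0/1/2 allele counts
# Simulated trait with one epistatic pair (markers 3 x 7) plus noise.
yield_trait = 2.0 * geno[:, 3] * geno[:, 7] + rng.normal(0, 1.0, n_ind)

def bic(X, y):
    """BIC of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + X1.shape[1] * np.log(len(y))

selected, current = [], np.empty((n_ind, 0))
best_bic = bic(current, yield_trait)
for _ in range(3):                                   # at most 3 forward steps
    candidates = [(i, j) for i, j in combinations(range(n_marker), 2)
                  if (i, j) not in selected]
    scores = [(bic(np.column_stack([current, geno[:, i] * geno[:, j]]), yield_trait), (i, j))
              for i, j in candidates]
    step_bic, pair = min(scores)
    if step_bic >= best_bic:                         # stop when BIC no longer improves
        break
    best_bic = step_bic
    current = np.column_stack([current, geno[:, pair[0]] * geno[:, pair[1]]])
    selected.append(pair)

print("selected epistatic pairs:", selected, " BIC:", round(best_bic, 1))
```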

Project 4: Evaluating Application Suitability for a Novel Migratory Near Memory Processing Architecture

Co-mentors

Volodymyr Kindratenko
Electrical and Computer Engineering

William Gropp
Computer Science

Social impact

Solving larger Big Data problems more quickly, through a new computing platform.

Project description

In this project, students will gain first-hand experience with a cutting-edge computer architecture, Migratory Near Memory Processing Architecture, currently being developed by Emu Technology. This is new technology that could revolutionize several application areas involving big data analytics, many of which have potential social benefits to broad communities. Conventional HPC-scale distributed memory computers are designed with the assumption that the large majority of memory access operations will be to local memory. However, for truly large datasets with complex data access patterns, this is not the case and accessing data across many memory systems becomes necessary to carry out the required computations. Emu's solution to this problem, referred to as Migratory Memory-Side Processing, is to move the execution context to the data rather than moving large amounts of data to the computational thread. This approach is promising for a number of applications, including both numerical and data-intensive codes.

Emu Technology will provide remote access to a prototype system programmable in Cilk, a parallel language based on C. Students will receive appropriate training in both the Cilk programming language and the software development tools and methodologies for Emu's system. Working with mentors, they will identify 2 to 4 kernels with distinctly different computational workloads and data access patterns and will re-implement them in Cilk for execution on both traditional shared-memory systems and Emu's novel architecture. Students will carry out the code implementation, performance measurements, and analysis of the results, culminating in a white paper that will compare and contrast applications running on the traditional shared memory architecture and the Migratory Near Memory Processing architecture. The students will work closely with other students at the Innovative Systems Lab (ISL) at NCSA, where they will be exposed to other research projects involving novel computer architectures. They will meet with both PIs on a weekly basis to review the results and receive feedback for their work.

Project 5: Computational Materials Science: Visualization and Machine Learning

Co-mentors

Andre Schleife
Materials Science and Engineering

Andrew Ferguson
Materials Science and Engineering

Social impact

Provide sophisticated yet intuitive and user-friendly visualization for effective materials data analysis and data dissemination to a broad scientific audience and the general public.

Project description

Modern computational materials science produces large amounts of static and time-dependent data that is rich in information. Examples include atomic geometries of complex biomolecules, condensed-matter crystals, and electron-density probability distributions. Extracting the relevant information from these data to determine the important processes and mechanisms constitutes an important scientific challenge. The availability of sophisticated yet intuitive visualization is a crucial component of effective data analysis, and is vital in disseminating results to a broad scientific audience and the general public.

In this project we use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the structure of existing and novel materials e.g., for solar-energy harvesting, optoelectronic applications, and focused-ion beam technology. We will couple these visualization tools with Maxwell solvers and supervised machine learning algorithms to perform targeted discovery and rational design of new materials with tailored optical properties. The team will establish a powerful and intuitive platform for visualization of atomic geometries, optical reflection and transmission spectra, and time-dependent electronic excitations. This platform will allow for guided design of next-generation optical materials for use in novel lenses or energy-saving window coatings.

Working towards this goal, students will analyze and visualize atomic geometries and electron densities from first-principles simulations of excited electronic states using density functional theory (DFT) and time-dependent DFT. They will develop an open-source tool that interfaces an external Maxwell solver with the scikit-learn Python-based machine-learning library to perform supervised machine learning and guided materials design and discovery. Students will also develop codes based on the open-source ray-tracer Blender/LuxRender and the open-source yt framework to produce image files and movies from these data. Stereoscopic images will be produced that can be visualized using, e.g., Google Cardboard or other virtual-reality viewers. Examples of possible outcomes can be seen here.
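
As an illustration of the supervised-learning piece, the sketch below uses scikit-learn to fit a surrogate model that maps simple structural descriptors to an optical property and then ranks candidate designs; the descriptors, the target quantity, and the synthetic training data are hypothetical stand-ins for DFT and Maxwell-solver output.

```python
# Minimal sketch, not the project pipeline: a supervised-learning surrogate mapping
# structural descriptors to an optical property. Descriptor names, the target quantity,
# and the training data are hypothetical placeholders for solver-generated results.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Hypothetical descriptors: lattice constant, layer thickness, composition fraction.
X = rng.uniform([3.5, 1.0, 0.0], [6.5, 50.0, 1.0], size=(500, 3))
# Hypothetical target: peak reflectance, as a stand-in for Maxwell-solver output.
y = 0.4 * np.sin(X[:, 0]) + 0.01 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.02, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("MAE on held-out designs:", mean_absolute_error(y_test, model.predict(X_test)))

# Guided design: rank unseen candidate structures by predicted reflectance.
candidates = rng.uniform([3.5, 1.0, 0.0], [6.5, 50.0, 1.0], size=(1000, 3))
best = candidates[np.argsort(model.predict(candidates))[-5:]]
print("top candidate descriptors:\n", best)
```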

Project 6: Machine Learning to Estimate Crop Productivity from Multi-Sensor Fused Satellite Data

Co-mentors

Kaiyu Guan
Natural Resources & Environmental Sciences

Jian Peng
Computer Science

Social impact

Developing reliable crop-type and crop-yield forecasting systems with useful lead time, for use by farming communities and government agencies.

Project description

Reliable forecasting systems with useful lead time for crop type and crop yield have critical value for farming communities and government agencies. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management practices and environmental factors. With continued climate variability (e.g., the 2012 Midwest drought) and ongoing climate change, farming communities and government agencies require better information to monitor crop growth and its near-term prospects. As the most important staple-food production area, the U.S. Corn Belt produces half of the global corn and soybean combined, and has significant importance for regional, national, and global economies and food security. However, despite the great need for such forecasting information, no publicly available forecasting system exists for the U.S. Corn Belt. This project will develop scalable machine-learning methods that integrate satellite remote sensing data and other auxiliary information to make accurate yet cost-effective predictions of crop type. First, we will generate a 30-meter (2000-present; 10-meter for the post-2014 period), daily, cloud-free data stack for three major Corn Belt states, Illinois, Iowa, and Indiana, by integrating three major satellite datasets (Landsat, MODIS, and the new Sentinel-2). We will build upon this data stack to develop a machine-learning forecasting system for crop type, with a lead time of up to 1-2 months before harvest and a particular focus on rain-fed corn and soybean. We will fully leverage satellite information, including spectral, phenological, and field-level texture information, to achieve field-level predictions with advanced deep learning approaches.

In this project, the REU students will implement a computational pipeline that extracts real-time satellite images and deposits the data into a database, and will develop data fusion and machine learning components for predictive tasks. Because of the volume of the satellite data and the highly demanding computational requirements, the students will need to distribute storage- and computation-intensive modules on the Blue Waters supercomputer. The students will meet with both Guan and Peng on a regular basis, participate in the group meetings and reading groups organized in their research labs, and interact with graduate students. The students will learn advanced concepts in remote sensing and machine learning and improve their scientific research skills.
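
The following Python sketch, a simplification rather than the Blue Waters pipeline, illustrates the machine-learning step: classifying crop type per pixel from a fused vegetation-index time series. The simulated NDVI-like curves, class structure, and classifier choice are illustrative assumptions.

```python
# Minimal sketch under simplifying assumptions (not the full pipeline): classify crop
# type per pixel from a fused multi-sensor time series. The "fused" features below are
# simulated stand-ins for 30 m daily vegetation-index values.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pixels, n_dates = 2000, 60          # e.g., one growing season sampled every few days
t = np.linspace(0, 1, n_dates)

def ndvi_curve(peak_time):
    """Simulated NDVI-like seasonal curve peaking at peak_time."""
    return np.exp(-((t - peak_time) ** 2) / 0.02)

labels = rng.integers(0, 2, n_pixels)                 # 0 = corn, 1 = soybean (toy labels)
peaks = np.where(labels == 0, 0.45, 0.55)             # corn assumed to green up earlier
X = ndvi_curve(peaks[:, None]) + rng.normal(0, 0.05, (n_pixels, n_dates))

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out crop-type accuracy:", clf.score(X_te, y_te))
```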

Project 7: Understanding Types of Research Problems from a Replication Perspective

Co-mentors

Daniel Katz
NCSA

Victoria Stodden
iSchool

Social impact

Identify the software capabilities needed to enhance the reproducibility of computational findings and to restore public confidence in the research community and in scientific discoveries.

Project description

Replication is a growing concern in modern research, as an increasing number of papers and results each year are shown to be irreproducible or are withdrawn. When these cases are publicized within the research community, the community can respond by fixing the specific problems (reasonably easily) or the systemic problems (much harder, but being attempted). However, when the publicity reaches the general press, it leads the public towards skepticism of all science and all research, which is harder to recover from. If we can move towards science being more generally and more automatically reproducible, we can avoid this loss of public confidence. We can also work towards the democratization of science: when more people are able to start from existing findings and ask new questions, the result is more and better research. Finally, we can increase general human well-being. A small amount of current research, in health and every other field, reaches incorrect conclusions due to bugs, errors, and similar problems; by checking all research, we could find these errors more easily, giving us more faith in the research that has been reproduced.

In this project, we will investigate the role of software in the scientific discovery process by exposing gaps in the verifiability of published computational results, based on Stodden's work on reproducibility. We plan to understand the software capabilities needed to enhance the reproducibility of computational findings, including workflow and automated process capture, the role of software citation in addressing incentives, and best practices in documentation and development. Students will attempt to reproduce published computational studies in order to understand these gaps, determine why reproduction is or is not possible, and try to design the solutions needed to facilitate reproducibility in computational science. The faculty and students will use these experiences to classify different types of workflows with respect to their reproducibility, and the students will create a website to demonstrate this to the public. This project will also expose the students to a wide variety of computational tools and develop a broad base of skills in computational science research.
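
One small facet of automated process capture can be sketched as follows: recording the software environment and input-file fingerprints alongside a computational result so that a later reproduction attempt can check exactly what changed. The file names and the choice of recorded fields are illustrative assumptions, not a prescription for the project.

```python
# Minimal sketch of one facet of automated process capture: record the environment and
# input-file fingerprints next to a result. File names here are hypothetical.
import hashlib, json, platform, subprocess, sys
from datetime import datetime, timezone

def sha256(path):
    """Fingerprint an input file so a later run can detect changed data."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def capture_provenance(input_files, output_path="provenance.json"):
    """Write a JSON record of when, where, and with what the analysis was run."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": subprocess.run([sys.executable, "-m", "pip", "freeze"],
                                   capture_output=True, text=True).stdout.splitlines(),
        "inputs": {p: sha256(p) for p in input_files},
    }
    with open(output_path, "w") as f:
        json.dump(record, f, indent=2)
    return record

# Example (hypothetical file names): capture_provenance(["analysis_script.py", "raw_data.csv"])
```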

Project 8: Eye Tracking Race and Cultural Difference in Video Games

Co-mentors

Ben Grosser
School of Art & Design

Jodi Byrd
English

Social impact

Identifying the effect of video game design on perceptions of race and cultural difference, critical thinking, and split-second decision-making.

Project description

How do players of contemporary video games perceive and process race-based information as part of their gaming experience? What role does that perception play in split-second decision-making processes in terms of character combat, map navigation, and attitudes towards cultural information throughout game environments? This project at the NCSA Critical Technology Studies Laboratory will combine eye tracking of visual attention during game play, data analysis of that attention, and visualization of the results towards a critical understanding of race and cultural difference in video games. We will use the Irrational Games' 2013 AAA title, BioShock Infinite, to examine the effects of both active (e.g., visible race of characters and combatants in the game) and passive (e.g., racial background materials embedded in environments, level designs, and other artistic content) processing of race-based visual data. The game Never Alone—a collaboratively designed game with input from Alaskan Native elders that includes activities and artifacts intended to foreground indigenous cultural difference—will be used to explore the effectiveness of such an approach for creating cultural awareness and learning among gamers. The results of this research will lead to new insights about how the designs of video games affect gamer attitudes towards race and difference, and will suggest new approaches for future game designers aiming to positively affect such attitudes in the future.

We will employ commodity eye tracking hardware and software to capture gamer attention during gameplay. The data produced by these systems requires significant post-processing to understand. Furthermore, such systems aren't designed to temporally synchronize that data with active gaming experiences. Therefore, our students will develop software to process the eye tracking data, synchronize it over time with a video capture of the gameplay, and visualize aspects of the data on top of that video capture. Such visualizations, especially when comparing data from multiple subjects, will illustrate the relationships between race-based content in video games and player perceptions and actions. The software developed will be hosted on the Critical Technology Laboratory's GitHub account, made available via the NCSA Open Source License, and offered to the critical gaming academic community for use and comment. The students will work with Grosser and Byrd on all aspects of the project from critical understandings of race and race-based imagery in video games to consideration of how design of gaming interfaces affects user attention. Outcomes will include data visualization and software development, and the application of this research will include the analysis software itself, visualizations of results, publications of findings, and art exhibitions derived from the research.
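
A minimal sketch of the synchronization and overlay step is shown below; it is not the lab's software, and the file names, column names, and the assumption that the eye tracker and the video share a common clock start are hypothetical. Real recordings typically require an explicit synchronization offset.

```python
# Minimal sketch under stated assumptions: overlay eye-tracking gaze points on a
# gameplay recording by matching gaze timestamps to video frame times.
# Assumed inputs (hypothetical): gaze.csv with columns time_s, x_px, y_px; gameplay.mp4.
import cv2
import pandas as pd

gaze = pd.read_csv("gaze.csv")
cap = cv2.VideoCapture("gameplay.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("overlay.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    t = frame_idx / fps
    # Nearest gaze sample in time to this frame (assumes both clocks start together).
    nearest = gaze.loc[(gaze["time_s"] - t).abs().idxmin()]
    cv2.circle(frame, (int(nearest["x_px"]), int(nearest["y_px"])), 12, (0, 0, 255), 2)
    out.write(frame)
    frame_idx += 1

cap.release()
out.release()
```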

Project 9: Practical Process Topology Assignment

Co-mentors

William Gropp
Computer Science

Marc Snir
Computer Science

Social impact

Improve performance of critical scientific codes running on the nation's largest HPC systems.

Project description

Some of the most challenging problems facing society today require the fastest available computers. Such problems include everything from understanding climate change to modeling the actions of diseases. A critical barrier to successful simulations is the scalability of massively parallel simulation codes, and one element of that is the effective placement of the computing processes on a massively parallel machine. As several studies have shown, the mapping of processes to cores and nodes in a massively parallel computer can have a significant impact on the performance of tightly coupled applications. In addition, the dominant programming model for these applications, MPI, provides an interface through which applications can describe how their processes communicate; this interface was intended to allow MPI implementations to provide an appropriate mapping of processes to nodes and cores. However, no implementation of MPI provides a useful implementation of this feature, in part because finding an optimal mapping is NP-complete. The challenge is to develop practical approximate solutions that match common application needs.

In this project, students will develop both benchmarks to measure the impact of different process assignments and tools to provide good (if not optimal) process assignments for common communication patterns. Several approaches will be considered, including bandwidth-reduction permutations of the matrix that represents the communication graph, as proposed by Snir and Hoefler, as well as hierarchical approaches that consider the assignment to nodes, chips, and cores separately. The project will result in better implementations of the MPI process topology routines that can be integrated into existing open-source implementations such as MPICH and Open MPI, as well as research papers detailing the approach and results. The work will take advantage of the Blue Waters system for measurements of the impact of process topology assignment on scalability, and will collaborate with the Center for the Extreme Scale Simulation of Plasma-Coupled Combustion to apply these methods in a large-scale, complex application.
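
As a small illustration of the bandwidth-reduction idea, the Python sketch below reorders a randomly numbered 2-D halo-exchange communication graph with reverse Cuthill-McKee (via SciPy) and reports the bandwidth before and after. The communication pattern is a stand-in for an application's measured communication graph; the production tools will operate on real MPI process topologies.

```python
# Illustrative sketch only: compute a bandwidth-reducing reordering of a process
# communication graph with reverse Cuthill-McKee. The 2-D halo-exchange pattern with a
# shuffled default rank numbering is a synthetic stand-in for measured communication.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

px, py = 8, 8                            # 64 processes in an 8x8 logical grid
n = px * py
rng = np.random.default_rng(0)
relabel = rng.permutation(n)             # simulate an unfavorable default rank numbering

comm = lil_matrix((n, n))
for i in range(px):
    for j in range(py):
        r = relabel[i * py + j]
        for di, dj in ((1, 0), (0, 1)):  # right and down halo-exchange neighbors
            if i + di < px and j + dj < py:
                s = relabel[(i + di) * py + (j + dj)]
                comm[r, s] = comm[s, r] = 1

perm = reverse_cuthill_mckee(comm.tocsr(), symmetric_mode=True)

def bandwidth(order):
    """Largest distance, in the given ordering, between two communicating ranks."""
    pos = np.argsort(order)
    rows, cols = comm.nonzero()
    return int(np.max(np.abs(pos[rows] - pos[cols])))

print("bandwidth, default rank order:", bandwidth(np.arange(n)))
print("bandwidth, RCM reordering:    ", bandwidth(perm))
```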

Project 10: Data Storage and Analysis Framework for Semiconductor Nanocrystals Used in Bioimaging

Mentor

Andre Schleife
Materials Science and Engineering

Social impact

By exploring the systematic exchange of data and workflows, this project provides insights and best practices for the future of collaborative research. Providing the international community with access to sample-specific data and analysis will accelerate the deployment of novel semiconductor nanocrystals for bioimaging.

Project description

Light-emitting molecules are a central technology in biology and medicine that provide the ability to optically tag proteins and nucleic acids that mediate human disease. In particular, fluorescent dyes are a key part of molecular diagnostics and optical imaging reagents. We recently made major breakthroughs in engineering fluorescent semiconductor nanocrystals to increase the number of distinct molecules that can be accurately measured, far beyond what is possible with such organic dye molecules. We aim to develop nanocrystals that are able to distinguish diseased from healthy tissue and determine how the complex genetics underlying cancer respond to therapy, using measurement techniques and microscopes that are already widely accessible.

In order to achieve this goal, we need to understand a complex design space that includes the size, shape, composition, and internal structure of the different nanocrystals. Students in this team will work with computational and experimental researchers in several departments to establish a database to store, share, and catalog optical properties and other relevant data describing semiconductor nanocrystals. This requires developing schemas and analysis workflows that can be efficiently shared among multiple researchers. Eventually, both the data and the workflows will be made available to the general public.

Students will first identify all information that will need to be included in this catalogue. Students will then write JSON and Python code and interface with Globus and the Materials Data Facility. They will create well-documented IPython notebooks that operate directly on the Globus file structure and run in the web browser. Students will also develop code that automatically analyzes data stored in the facility, e.g., to verify and validate experimental and computational results against each other. This project is highly interdisciplinary, and students will work with a team of researchers in bioengineering, materials science, mechanical engineering, and NCSA.
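
As a starting point for discussion, the sketch below shows one possible JSON record for a nanocrystal sample together with an automated consistency check between measured and computed spectra; the field names, values, and agreement metric are hypothetical, since the real schema will be designed by the team and served through Globus and the Materials Data Facility.

```python
# Minimal sketch under assumptions: a possible JSON record for one nanocrystal sample
# and an automated check that measured and computed spectra agree. All field names and
# values are hypothetical placeholders for the team's eventual schema.
import json
import numpy as np

record = {
    "sample_id": "NC-0001",
    "composition": "CdSe/ZnS core-shell",          # placeholder composition
    "diameter_nm": 4.2,
    "synthesis_batch": "2018-03-A",
    "emission_spectrum": {"wavelength_nm": [500, 510, 520], "intensity": [0.10, 0.80, 0.30]},
    "computed_spectrum": {"wavelength_nm": [500, 510, 520], "intensity": [0.12, 0.75, 0.35]},
}

with open("NC-0001.json", "w") as f:
    json.dump(record, f, indent=2)

def spectra_agree(rec, tolerance=0.1):
    """Flag samples where measured and computed intensities diverge beyond a tolerance."""
    measured = np.array(rec["emission_spectrum"]["intensity"])
    computed = np.array(rec["computed_spectrum"]["intensity"])
    return float(np.max(np.abs(measured - computed))) <= tolerance

print("experiment and computation consistent:", spectra_agree(record))
```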