Research Projects

Project 1: Visualizing and Preserving Environmental Data for Improved Governance

Co-mentors

Anita Say Chan
Media and Cinema Studies

Ben Grosser
School of Art & Design

Social impact

Supporting current environmental data justice initiatives that hold governments and companies accountable for the environmental damage their policies and actions cause, and that attend to how such damage disproportionately impacts marginalized communities.

Project description

As more and more revisions are made to the data and scientific analyses available on government websites concerning environmental and climate protection, there is a growing need for researchers and coders to preserve environmental data and keep citizens informed of such changes. This position will assist the Environmental Data and Governance Initiative (EDGI), a network of scholars and researchers that archives federal environmental data to safeguard it against potential reductions in access by the current administration, develops online tools to support monitoring changes to federal environmental websites, and tracks cuts in funding, research, and regulation at environmentally oriented agencies. These agencies and departments include, but are not limited to, the EPA (Environmental Protection Agency), NOAA (National Oceanic and Atmospheric Administration), NASA (National Aeronautics and Space Administration), USGS (United States Geological Survey), OSHA (Occupational Safety and Health Administration), DOE (Department of Energy), and BLM (Bureau of Land Management).

This position will support collaborations under EDGI's public data working group that include projects for indexing millions of government web pages on a weekly basis, tracking changes on them, and producing regular reports. Additional ongoing efforts include distributed protocol development for data storage, machine learning work that can isolate the most important website changes for enhanced tracking efforts, and security advancements for privacy protection of EDGI volunteers and workshop participants engaging in data preservation and website monitoring. Potential project work could also extend developments made under EDGI's Google Summer of Code partnership, where recent collaborations utilized machine learning algorithms to identify and monitor changes on government agency websites using data from multiple sources: Versionista, PageFreezer, and the Internet Archive; another recent collaboration used D3 to develop DataRescue Maps as impactful, publicly meaningful models that allow users to easily visualize changes to government websites archived by EDGI. The data being archived is vital for environmental research and protection, but it can be meaningless or overwhelming in the hands of users without clear graphs or interactive models that help provide context and a general overview of the data.
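As an illustration of the kind of page monitoring described above, the sketch below fetches a page, diffs it against a locally archived snapshot, and reports the changes. It is a minimal example only: the URL and snapshot directory are hypothetical, and EDGI's production tools (built on sources such as Versionista and PageFreezer) are far more sophisticated.

    import difflib
    import hashlib
    import pathlib
    import requests  # third-party: pip install requests

    SNAPSHOT_DIR = pathlib.Path("snapshots")  # hypothetical local archive location

    def check_page(url):
        """Fetch a page and diff it against the last archived snapshot."""
        SNAPSHOT_DIR.mkdir(exist_ok=True)
        name = hashlib.sha256(url.encode()).hexdigest() + ".html"
        snapshot = SNAPSHOT_DIR / name
        new_text = requests.get(url, timeout=30).text
        old_text = snapshot.read_text() if snapshot.exists() else ""
        diff = list(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile="archived", tofile="current", lineterm=""))
        snapshot.write_text(new_text)  # update the archive for the next crawl
        return diff

    # Report whether an agency page changed since the last crawl (URL is illustrative).
    changes = check_page("https://www.example.gov/environment")
    print(len(changes), "changed lines")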

This project welcomes applicants with an interest in environmental data analysis and preservation, as well as other interdisciplinary skills, including: Spanish translation; experience in data visualization and coding (Python, Ruby on Rails, or JavaScript in particular); and interest in or experience working with data and databases, web crawling, APIs, machine learning, open science, and community organizing.

Project 2: Optimization of Open-Source Software for Deep Learning

Co-mentors

Volodymyr Kindratenko
Electrical and Computer Engineering

William Gropp
Computer Science

Social impact

Contributing to the advancement of machine learning, which is at the core of many modern approaches to solving real-world problems in fields ranging from education to healthcare to engineering to the core sciences.

Project description

Deep neural networks are at the core of artificial intelligence, machine learning, computer vision, and other advanced applications across many disciplines. Such networks allow computers to "learn and infer" rather than "compute," which is essential for many problems in which the models that describe the data are multi-dimensional, non-linear, and generally too complex for traditional mathematical techniques. Many deep learning frameworks have been developed over the course of the past decade, providing advanced neural network construction, training, and inference functionality. However, the vast majority of these codes have been developed for execution on a single compute node, which precludes them from training complex network models on large datasets in an acceptable time. The challenge is to redesign existing frameworks, or develop new ones, that can take advantage of heterogeneous computing platforms to speed up network training while providing easy-to-use programming abstractions for domain scientists.

In this project, students will analyze open-source deep learning software frameworks and will work on optimizing them and removing bottlenecks in order to improve the performance of applications that rely on these codes. Students will be expected to contribute their changes back to these codes, as well as to release new code as open source. The project will contribute to the development of an NSF-funded computer system for deep learning and will result in open-source software that will be deployed on this system. Students will learn about parallel programming systems such as MPI, OpenMP, and OpenCL, how to study their performance, and techniques for improving that performance.
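As a concrete illustration of the parallelization pattern involved, here is a minimal sketch of data-parallel training with MPI via mpi4py: each rank computes a gradient on its own data shard, and an Allreduce averages the gradients across ranks. The toy linear model and synthetic data are stand-ins for a real network and dataset.

    # Run with, e.g.: mpiexec -n 4 python train.py
    import numpy as np
    from mpi4py import MPI  # third-party: pip install mpi4py

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    rng = np.random.default_rng(seed=rank)   # each rank holds a different data shard
    w = np.zeros(8)                          # toy "model": weights of a linear map

    for step in range(200):
        X = rng.normal(size=(32, 8))             # local mini-batch
        y = X @ np.arange(8.0)                   # synthetic targets
        grad = 2 * X.T @ (X @ w - y) / len(X)    # local gradient of squared error
        avg = np.empty_like(grad)
        comm.Allreduce(grad, avg, op=MPI.SUM)    # sum local gradients across ranks
        w -= 0.01 * (avg / size)                 # every rank applies the averaged step

    if rank == 0:
        print("final weights:", np.round(w, 2))  # approaches [0, 1, ..., 7]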

Project 3: Computational Materials Science: Visualization and Machine Learning

Co-mentors

Andre Schleife
Materials Science and Engineering

Andrew Ferguson
Materials Science and Engineering

Social impact

Provide sophisticated yet intuitive and user-friendly visualization for effective materials data analysis and data dissemination to a broad scientific audience and the general public.

Project description

Modern computational materials science produces large amounts of static and time-dependent data that is rich in information. Examples include atomic geometries of complex biomolecules, condensed-matter crystals, and electron-density probability distributions. Extracting the relevant information from these data to determine the important processes and mechanisms constitutes an important scientific challenge. The availability of sophisticated yet intuitive visualization is a crucial component of effective data analysis, and is vital in disseminating results to a broad scientific audience and the general public.

In this project we use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the structure of existing and novel materials, e.g., for solar-energy harvesting, optoelectronic applications, and focused-ion-beam technology. We will couple these visualization tools with Maxwell solvers and supervised machine learning algorithms to perform targeted discovery and rational design of new materials with tailored optical properties. The team will establish a powerful and intuitive platform for visualization of atomic geometries, optical reflection and transmission spectra, and time-dependent electronic excitations. This platform will allow for guided design of next-generation optical materials for use in novel lenses or energy-saving window coatings.

Working towards this goal, students will analyze and visualize atomic geometries and electron densities from first-principles simulations of excited electronic states using density functional theory (DFT) and time-dependent DFT. They will develop an open-source tool that interfaces an external Maxwell solver with the scikit-learn Python-based machine-learning library to perform supervised machine learning and guided materials design and discovery. Students will also develop codes based on the open-source ray-tracer Blender/LuxRender and the open-source yt framework to produce image files and movies from these data. Stereoscopic images will be produced that can be visualized using, e.g., Google Cardboard or other virtual-reality viewers.
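To make the machine-learning step concrete, the following sketch shows a supervised-learning loop of the kind envisioned here: scikit-learn fits a surrogate model mapping material descriptors to an optical property, then screens unseen candidates. The descriptors and target property are synthetic stand-ins for quantities produced by the DFT calculations and the Maxwell solver.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 4))   # hypothetical per-material descriptors
    y = 2.0 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.05 * rng.normal(size=200)  # stand-in property

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("held-out R^2:", round(model.score(X_test, y_test), 3))

    # Screen a large pool of untried candidates; the best few go on to full simulation.
    pool = rng.uniform(size=(1000, 4))
    top5 = pool[np.argsort(model.predict(pool))[-5:]]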

Project 4: Data Storage and Analysis Framework for Semiconductor Nanocrystals used in Bioimaging

Co-mentors

Andre Schleife
Materials Science and Engineering

Michal Ondrejcek
NCSA

Social impact

By exploring the systematic exchange of data and workflows, this project provides insights and best practices for the future of collaborative research. Providing the international community with access to sample-specific data and analyses will accelerate the deployment of novel semiconductor nanocrystals for bioimaging.

Project description

Light-emitting molecules are a central technology in biology and medicine, providing the ability to optically tag proteins and nucleic acids that mediate human disease. In particular, fluorescent dyes are a key part of molecular diagnostics and optical imaging reagents. We recently made major breakthroughs in engineering fluorescent semiconductor nanocrystals to increase the number of distinct molecules that can be accurately measured, far beyond what is possible with organic dye molecules. We aim to develop nanocrystals that are able to distinguish diseased from healthy tissue and determine how the complex genetics underlying cancer respond to therapy, using measurement techniques and microscopes that are already widely accessible.

In order to achieve this goal, we need to understand a complex design space that includes the size, shape, composition, and internal structure of the different nanocrystals. Students in this team will work with computational and experimental researchers in several departments in order to establish a database to store, share, and catalog optical properties and other relevant data describing semiconductor nanocrystals. This requires developing schemas and analysis workflows that can be efficiently shared between multiple researchers. Eventually, both the data and the workflows will be made available to the general public.

Students will first identify all the information that needs to be included in this catalog. Students will then write JSON and Python code and interface with Globus and the Materials Data Facility. They will create well-documented IPython notebooks that operate directly on the Globus file structure and run in the web browser. Students will also develop code that automatically analyzes data stored in the facility, e.g., to verify and validate experimental and computational results against each other. This project is highly interdisciplinary, and students will work with a team of researchers in bioengineering, materials science, mechanical engineering, and NCSA.
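A minimal sketch of what one catalog entry and its automated consistency check might look like is shown below; the field names, units, and tolerance are hypothetical placeholders for the schema the students will design.

    import json

    # One hypothetical catalog record for a nanocrystal sample.
    record = {
        "sample_id": "NC-0001",
        "composition": "CdSe",
        "diameter_nm": 4.2,
        "emission_peak_nm": {"experiment": 575.0, "computed": 581.5},
    }

    def consistent(rec, tolerance_nm=10.0):
        """Check that experimental and computed peaks agree within a tolerance."""
        peak = rec["emission_peak_nm"]
        return abs(peak["experiment"] - peak["computed"]) <= tolerance_nm

    print(json.dumps(record, indent=2))
    print("experiment and computation consistent:", consistent(record))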

Project 5: Eye Tracking Race and Cultural Difference in Video Games

Co-mentors

Ben Grosser
School of Art & Design

Jodi Byrd
English

Social impact

Identifying the effects of video game design on perceptions of race and cultural difference, critical thinking, and split-second decision-making.

Project description

How do players of contemporary video games perceive and process race-based information as part of their gaming experience? What role does that perception play in split-second decision-making processes in terms of character combat, map navigation, and attitudes towards cultural information throughout game environments? This project at the NCSA Critical Technology Studies Laboratory will combine eye tracking of visual attention during game play, data analysis of that attention, and visualization of the results towards a critical understanding of race and cultural difference in video games. We will use Irrational Games' 2013 AAA title, BioShock Infinite, to examine the effects of both active (e.g., visible race of characters and combatants in the game) and passive (e.g., racial background materials embedded in environments, level designs, and other artistic content) processing of race-based visual data. The game Never Alone (a collaboratively designed game with input from Alaskan Native elders that includes activities and artifacts intended to foreground indigenous cultural difference) will be used to explore the effectiveness of such an approach for creating cultural awareness and learning among gamers. The results of this research will lead to new insights about how the designs of video games affect gamer attitudes towards race and difference, and will suggest new approaches for future game designers aiming to positively affect such attitudes.

We will employ commodity eye tracking hardware and software to capture gamer attention during gameplay. The data produced by these systems requires significant post-processing to understand. Furthermore, such systems aren't designed to temporally synchronize that data with active gaming experiences. Therefore, our students will develop software to process the eye tracking data, synchronize it over time with a video capture of the gameplay, and visualize aspects of the data on top of that video capture. Such visualizations, especially when comparing data from multiple subjects, will illustrate the relationships between race-based content in video games and player perceptions and actions. The software developed will be hosted on the Critical Technology Laboratory's GitHub account, made available via the NCSA Open Source License, and offered to the critical gaming academic community for use and comment. The students will work with Grosser and Byrd on all aspects of the project from critical understandings of race and race-based imagery in video games to consideration of how design of gaming interfaces affects user attention. Outcomes will include data visualization and software development, and the application of this research will include the analysis software itself, visualizations of results, publications of findings, and art exhibitions derived from the research.
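As a sketch of the synchronization step, the code below maps eye-tracker timestamps onto video frame indices and buckets gaze points per frame so they can be drawn over the captured gameplay. The frame rate and sample values are hypothetical.

    import numpy as np

    FPS = 60.0  # assumed frame rate of the gameplay video capture

    # Gaze samples from the tracker: (timestamp in seconds, x pixel, y pixel).
    gaze = np.array([[0.008, 640, 360],
                     [0.024, 652, 355],
                     [0.041, 900, 500]])

    def frame_index(t):
        """Map an eye-tracker timestamp to the nearest video frame."""
        return int(round(t * FPS))

    # Bucket gaze points by frame so each frame's overlay can be rendered.
    per_frame = {}
    for t, x, y in gaze:
        per_frame.setdefault(frame_index(t), []).append((x, y))

    print(per_frame)  # {0: [(640.0, 360.0)], 1: [(652.0, 355.0)], 2: [(900.0, 500.0)]}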

Project 6: Machine Learning to Estimate Crop Productivity from Multi-Sensor Fused Satellite Data

Co-mentors

Kaiyu Guan
Natural Resources & Environmental Sciences

Jian Peng
Computer Science

Social impact

Developing reliable forecasting systems with useful lead times for crop type and crop yield, for use by farming communities and government agencies.

Project description

Reliable forecasting systems with useful lead times for crop type and crop yield have critical value for farming communities and government agencies. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management requirements and environmental factors. With continued climate variability (e.g., the 2012 Midwest drought) and ongoing climate change, farming communities and government agencies require better information to monitor crop growth and its near-term prospects. As the most important staple-food production area, the U.S. Corn Belt produces half of the world's combined corn and soybean output, and has significant importance for the regional, national, and global economy and for food security. However, despite the great need for such forecasting information, no such system exists for the U.S. Corn Belt for public use. This project will develop scalable machine-learning methods to integrate data from satellite remote sensing and other auxiliary information to make accurate yet cost-effective predictions of crop type. First, we will generate a 30-meter (2000-present; 10-meter for the post-2014 period), daily, cloud-free data stack for the three major Corn Belt states, Illinois, Iowa, and Indiana, by integrating three major satellite datasets (Landsat, MODIS, and the new Sentinel-2). We will build upon this data stack to develop a machine-learning forecasting system for crop type, with a lead time of up to 1-2 months before harvest and a particular focus on rain-fed corn and soybean. We will fully leverage satellite information, including spectral, phenological, and field-level texture information, to achieve field-level predictions with advanced deep learning approaches.

In this project, the REU students will implement a computational pipeline that extracts real-time satellite images and deposits the data into a database, and will develop data fusion and machine learning components for predictive tasks. Due to the enormity of the satellite data and the highly demanding computational requirements, the students will need to distribute storage- and computation-intensive modules on the Blue Waters supercomputer. The students will meet with both Guan and Peng on a regular basis, participate in the group meetings and reading groups organized in their research labs, and interact with graduate students. The students will learn advanced techniques in remote sensing and machine learning and improve their scientific research skills.
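For illustration, the sketch below trains a per-pixel crop-type classifier on a fused band time series, using a random forest as a simple stand-in for the advanced deep learning models the project targets; the data shapes and labels are synthetic.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_pixels, n_dates, n_bands = 5000, 30, 6               # hypothetical fused-stack shape
    X = rng.normal(size=(n_pixels, n_dates * n_bands))     # per-pixel spectral time series
    y = (X[:, :n_bands].mean(axis=1) > 0).astype(int)      # toy rule: 0 = corn, 1 = soybean

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", round(clf.score(X_test, y_test), 3))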

Project 7: Intelligent Synthesis: Statistical Learning to Optimize Graphene Synthesis Parameters for Nanomanufacturing

Co-mentors

Elif Ertekin
Mechanical Science and Engineering

Placid Ferreira
Mechanical Science and Engineering

Social impact

Provide sophisticated data-driven approaches to enable high-quality, reproducible synthesis of nanomaterials for use as manufactured components in nanoelectronics.

Project description

Material synthesis is a primary bottleneck in emerging nanoelectronic devices. The promise of nanoelectronics will not become a reality unless the synthesis process is scalable and leads to high-quality, reproducible materials. Although there exists a tremendous amount of academic and industry research in synthesis, most advancements in synthesis science are achieved through an expensive and tedious trial-and-error approach.

In this project, we will go beyond the traditional trial and error approach by adopting a data-driven methodology to rapidly optimize the chemical vapor deposition synthesis of graphene and other emerging 2D nanomaterials. The key aspects include building and populating a large database of synthesis parameters and results, and implementing a system for automated data capture and extraction from actual growth experiments in real time. The database will be populated by both experiments carried out at Illinois and via crowd-sourcing from research groups around the world.

Students will develop and populate the 2D materials synthesis database, and will implement a tool that interfaces the database with Python-based libraries for supervised machine learning to perform targeted growth parameter optimization. Students will also develop a configurable system for automated data collection from nanofabrication tools during growth experiments in real time. Experimental parameters will be pushed to a cloud server so that they can be curated and served to computational models. Educational video tutorials on the synthesis database and the machine learning approach will be developed. These videos will be placed on the nanoHUB and aimed at audiences of high school and undergraduate students.
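A minimal sketch of the parameter-optimization loop follows, assuming a hypothetical database schema of (temperature, CH4 flow, growth time) recipes with a measured quality score: a surrogate model is fit to past experiments and used to suggest the next recipe to try.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.preprocessing import MinMaxScaler

    rng = np.random.default_rng(0)
    # Columns: temperature (C), CH4 flow (sccm), growth time (min) -- an assumed schema.
    recipes = rng.uniform([900, 5, 10], [1100, 50, 60], size=(40, 3))
    # Toy quality response peaking near 1035 C, standing in for real database entries.
    quality = np.exp(-((recipes[:, 0] - 1035) / 40) ** 2) + 0.05 * rng.normal(size=40)

    scaler = MinMaxScaler().fit(recipes)
    kernel = RBF(length_scale=[0.2, 0.2, 0.2]) + WhiteKernel(noise_level=1e-2)
    surrogate = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    surrogate.fit(scaler.transform(recipes), quality)

    # Predict quality for untried recipes and propose the most promising one.
    candidates = rng.uniform([900, 5, 10], [1100, 50, 60], size=(2000, 3))
    best = candidates[np.argmax(surrogate.predict(scaler.transform(candidates)))]
    print("suggested recipe (T, flow, time):", np.round(best, 1))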

Project 8: Energy Workflows

Co-mentors

Daniel Katz
NCSA

Kathryn Huff
Nuclear, Plasma, & Radiological Engineering

Social impact

Making progress towards safer reactor designs, leading to more plentiful energy for societal needs.

Project description

The students will automate an existing high-performance computing simulation workflow and its corresponding data pipeline using the Parsl Python parallel scripting library. This collaboration will develop reproducible workflows for conducting simulation and analysis of phenomena in advanced nuclear reactor designs. The simulation and analysis workflow to be automated involves demonstration of the UIUC-developed Moltres application within the Multiphysics Object-Oriented Simulation Environment (MOOSE) Finite Element Modeling (FEM) ecosystem.

This work will be conducted in a transparent and open manner, with an emphasis on maximizing reproducibility and reuse potential. Our work will additionally leverage literate programming tools (e.g., Jupyter notebooks) as a platform for communicating analysis methods and results. The undergraduates will work as part of a team ensuring the efficiency, transparency, and reproducibility of the simulations being conducted and the underlying software (which is under constant development). Their tasks will familiarize them with scientific software development best practices such as pair programming, unit testing, automated documentation, and reproducible workflows.

Specifically, the students will pair program alongside the faculty mentors and one another to implement a Parsl-based data pipeline for molten salt nuclear reactor multiphysics simulations, sketched below. As their familiarity with the project grows, they will contribute enhancements to the Parsl codebase and help researchers in nuclear engineering deploy this workflow to conduct repeatable validation and verification demonstrations of the simulation capabilities. Meanwhile, they will be guided in enriching both the Parsl and Moltres documentation as needed.
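To give a flavor of the approach, here is a minimal Parsl sketch of the fan-out/fan-in pattern behind such a pipeline; the simulate and analyze apps are hypothetical stand-ins for launching Moltres runs and post-processing their output, and the local-threads configuration is for testing only.

    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config  # test config; clusters use other executors

    parsl.load(config)

    @python_app
    def simulate(power):
        # Hypothetical stand-in for launching one Moltres/MOOSE simulation.
        return f"results_power_{power}.csv"

    @python_app
    def analyze(result_path):
        # Hypothetical stand-in for post-processing one simulation's output.
        return f"summary of {result_path}"

    # Fan out simulations and chain analyses; Parsl tracks the data dependencies
    # between apps and runs whatever is ready in parallel.
    futures = [analyze(simulate(p)) for p in (1, 2, 4)]
    print([f.result() for f in futures])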

Project 9: Modeling and Detection of Black Hole Collisions with the Blue Waters Supercomputer

Co-mentors

Gabrielle Allen
Astronomy/Education

Roland Haas
NCSA

Eliu Huerta Escudero
NCSA

Social impact

By developing open source software to further scientific community efforts to detect gravitational waves, students will learn skills that they can use to tackle grand computational challenges across science domains, including those with broad social benefits.

Project description

The Laser Interferometer Gravitational-Wave Observatory's (LIGO) detection of gravitational waves from merging black holes in September 2015 inaugurated a new era in astronomy and astrophysics, opening a window to observe the Universe through gravitational radiation. Occurring 100 years after Einstein's announcement of his theory of general relativity, the detection spurred worldwide interest in physics and science in general, making headline news around the world. The recent Nobel Prize awarded for this detection and the announcement of the detection of a binary neutron star merger by LIGO/Virgo underline the importance of these efforts and the wider society's interest in them.

In this project, a pair of REU-INCLUSION students will be involved in research within the larger LIGO community and will work on two key components of this project. One component will be porting an existing C library into LIGO's Algorithm Library. The code describes the gravitational waves emitted by the merger of black holes. The goal is to use this software in upcoming LIGO searches of gravitational wave sources. The second component requires development of a numerical routine to post-process numerical relativity simulations that describe the merger of black holes and neutron stars. The students will write Python/C libraries to extract information from these simulations through an optimization procedure. Through this work the students will become familiar with one of the most exciting research topics in contemporary astronomy, and the work will provide them with new tools to study phenomena across science domains that require high performance computing environments. Having open-source software to work with LIGO data makes it possible for interested members of the public to contribute to the science, and also for LIGO science to be incorporated into, for example, high-school syllabi to train future scientists.
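As a small illustration of the optimization-style post-processing involved, the sketch below estimates the time offset that best aligns a model waveform with a synthetic numerical-relativity signal by maximizing their cross-correlation; real analyses work with actual simulation output and more sophisticated match criteria.

    import numpy as np

    t = np.linspace(0.0, 1.0, 4096)
    # Toy "chirp" standing in for a numerical-relativity waveform.
    nr_signal = np.sin(2 * np.pi * (30 * t + 40 * t**2))
    # Model template, advanced by 10 ms relative to the signal.
    template = np.sin(2 * np.pi * (30 * (t + 0.01) + 40 * (t + 0.01) ** 2))

    # Cross-correlate and locate the lag with maximum overlap.
    corr = np.correlate(nr_signal, template, mode="full")
    lag = corr.argmax() - (len(template) - 1)
    dt = t[1] - t[0]
    print(f"estimated offset: {lag * dt * 1e3:.1f} ms")  # expect about +10 ms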
