Project 7: Understanding Types of Research Problems from a Replication Perspective

Co-Mentors: Daniel Katz (NCSA) and Victoria Stodden (iSchool)
Social Impact: Identify the software capabilities needed to enhance the reproducibility of computational findings, in order to restore public confidence in the research community and in scientific discoveries.

Project description: Replication is a growing concern in modern research, as an increasing number of papers and results each year are shown to be irreproducible or are withdrawn. When these cases are publicized within the research community, the community can respond by fixing the specific problems (reasonably easy) or the systemic problems (much harder, but being attempted). However, when the publicity stretches into the general press, it leads the public towards skepticism of all science and all research, which is harder to recover from. If we can move towards science being more generally and more automatically reproducible, we can avoid this loss of public confidence. We can also work towards the democratization of science: when more people are able to start from existing findings and ask new questions, we get more and better results. And we can increase general human well-being by recognizing that a small amount of current research, in health and in all other fields, reaches incorrect conclusions due to bugs, errors, etc. By checking all research, we could find these errors more easily, giving us more faith in the research that has been reproduced.

In this project, we will investigate the role of software in the scientific discovery process by exposing gaps in the verifiability of published computational results, based on Stodden’s work on reproducibility. We plan to understand the software capabilities needed to enhance the reproducibility of computational findings, including workflow and automated process capture, the role of software citation in addressing incentives, and best practices in documentation and development. Students will attempt to reproduce published computational studies in order to understand these gaps, understand why reproduction is or is not possible, and try to design the solutions needed to facilitate reproducibility in computational science. The faculty and students will use these experiences to classify different types of workflows with respect to their reproducibility, and the students will create a website to demonstrate this classification to the public. This project will also expose the students to a wide variety of computational tools and help them develop a broad base of skills in computational science research.