Computation-Intensive Research And The Quest For Scientific Reproducibility

26 Nov 2021

      The fruits of scientific research are supposed to be open to all. A key
part of this is the need for reproducibility -- the idea that somebody
else can repeat the same experiments and analysis and (hopefully) come
to the same conclusions. It has long been a common expectation that
researchers will make their raw data available to others for this
purpose, but nowadays even that is likely no longer enough. The analysis
of the data usually requires some particular piece of computer software,
even if this was just some in-house scripting done on top of a
commonly-available toolkit or package.

Two different reports on this subject have come out recently, this one
<https://www.theregister.com/2021/11/25/research_software_inquiry/>
from the UK and this
<https://arstechnica.com/science/2021/11/keeping-science-reproducible-in-a-world-of-custom-code-and-data/>
with examples from the US and elsewhere. The latter goes into a lot more
detail, including good news (the rise of publicly-available data sets
which get heavily used for many different analyses), and bad:

    From 2017 through 2019, Tsuyoshi Miyakawa, the editor-in-chief of
    the journal Molecular Brain, replied to 41 article submissions by
    requesting that the authors provide their complete source data for
    review, as per the stated policy of the journal. Only one author
    did so.

    ...

    Based on his efforts to replicate papers from other statisticians,
    Thomas Lumley, a professor of biostatistics at the University of
    Auckland in New Zealand, says of the phrase data available upon
    request: "When people put it in their papers, what they typically
    mean is 'data not available.'"

As for making code available, that has its own challenges: often the
scripts/programs are hastily thrown together, and the creators may be
embarrassed to have others see it in this state. Or it’s not
likely to work properly anyway, outside of the original systems where
it was developed.

The good news is, bodies that fund the research and the journals that
publish the results are becoming more aware of such issues, and
increasingly trying to ensure that procedures for dealing with them are
built into the projects from the beginning.

Lawrence D'Oliveiro

Michael Cree

Matthew Skiffington

tags

participants (3)