Source: https://drive.google.com/file/d/1w5XnqaJP_DZWwVsVnRyzTQ7BsZn-3M7H/view?usp=sharing

Reproducibility and Bias

  • Define what constitutes reproducible and replicable data science
  • Explain the challenges and limitations of reproducibility and replicability
  • Understand confirmation bias and identify cases of confirmation bias

Reproducibility

Re-performing the same analysis (with the same code and data) by a different analyst.

Reasons for a study to fail to be reproducible

  • data not provided, unable to be shared
  • lack of peer review, or review that is not a transparent process
  • missing code, failure to publish code, trade secrets
  • different data provided
  • lack of computational literacy
  • different software versions, deprecated software, or legacy systems
  • lack of statistical literacy or maliciousness (p-hacking)
  • poorly written or incomplete documentation
  • lack of funding
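The p-hacking failure mode above can be made concrete. Below is a minimal sketch (using only the standard library; the thresholds and sample sizes are illustrative assumptions, not from the source) of why running many tests and reporting only the "significant" one is not reproducible: on pure noise, roughly the alpha fraction of tests come out "significant" by chance.

```python
# Sketch: p-hacking via multiple comparisons on noise-only data.
# Both groups are drawn from the SAME population, so every
# "significant" result here is a false positive.
import random
import statistics

def t_statistic(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

random.seed(0)
n, n_tests = 30, 100
hits = 0
for _ in range(n_tests):
    group_a = [random.gauss(0, 1) for _ in range(n)]
    group_b = [random.gauss(0, 1) for _ in range(n)]
    # |t| > 2.0 roughly approximates p < 0.05 at these sample sizes
    if abs(t_statistic(group_a, group_b)) > 2.0:
        hits += 1

print(f"'Significant' results out of {n_tests} null tests: {hits}")
```

Running this, a handful of the 100 null comparisons clear the threshold; cherry-picking those and discarding the rest is exactly the practice the bullet list flags.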

The Replicability Crisis

Replicable

Re-performing the experiment and collecting new data.

Replicability is harder than reproducibility

because, by definition, the underlying data are different, and data are variable.
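This point can be illustrated with a small simulation (the population, sample size, and seeds below are illustrative assumptions): two teams run the identical analysis on fresh samples from the same population, and their estimates still disagree by chance alone.

```python
# Sketch: replication collects NEW data, so even a correct,
# perfectly reproduced analysis yields different numbers.
import random
import statistics

def run_study(seed, n=50, true_mean=0.5):
    """One replication: collect a fresh sample, estimate the mean."""
    rng = random.Random(seed)
    sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    return statistics.mean(sample)

original = run_study(seed=1)
replication = run_study(seed=2)  # same procedure, different data
print(f"original estimate:    {original:.3f}")
print(f"replication estimate: {replication:.3f}")
# The estimates differ even though population, code, and procedure
# are identical -- sampling variability alone moves the answer.
```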

Replicability

“The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials.”

Reasons for a study failing to replicate

  • Finding was not “real” (or small effect size)
  • Measurement error
  • Variable finding (things change over time)
  • Samples come from different populations
  • Different experimental design or conditions
  • Fraud

The Unicorn Test

If you find a “unicorn” result, return to your data several times and from different viewpoints until you are fully convinced.

Cognitive Bias

Confirmation bias is a cognitive bias wherein people tend to search for, interpret, favor, and recall information in a way that confirms their preexisting beliefs or hypotheses.