Source: https://drive.google.com/file/d/1w5XnqaJP_DZWwVsVnRyzTQ7BsZn-3M7H/view?usp=sharing
Reproducibility and Bias
- Define what constitutes reproducible & replicable data science
- Explain the challenges and limitations of reproducibility & replicability
- Understand confirmation bias & identify cases of it
Reproducibility
re-performing the same analysis (with the same code and data) by a different analyst
Reasons for a study to fail to be reproducible
- data not provided, unable to be shared
- lack of peer review, or a review process that was not transparent
- missing code, failure to publish code, trade secrets
- different data provided
- lack of computational literacy
- different software versions, deprecated software, or legacy systems (see the sketch after this list)
- lack of statistical literacy or maliciousness (p-hacking)
- poorly written or incomplete documentation
- lack of funding
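Several of the failure modes above (unpinned seeds, missing version information) can be headed off with basic hygiene. A minimal sketch, assuming a NumPy/pandas workflow; the environment.json filename and the seed value are illustrative choices, not a standard:

```python
# Minimal reproducibility hygiene: pin randomness and record versions.
import json
import platform
import random
import sys

import numpy as np
import pandas as pd

# Fix every source of randomness so a different analyst gets the same numbers.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Record the software the analysis actually ran under, since "different
# software versions" is a common reason a reproduction attempt fails.
environment = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "seed": SEED,
}
with open("environment.json", "w") as f:
    json.dump(environment, f, indent=2)
```

Publishing a record like this alongside the code and data addresses several items on the list at once.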
The Replicability Crisis
Replicable
re-performing the experiment and collecting new data
Replicability is harder than reproducibility
because, by definition, the underlying data are different, and data are variable.
Replicability
“The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials.”
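A toy simulation of that definition, with all numbers invented for illustration: here the “measurement” is the mean of 100 draws from a Normal(10, 2) population, and two “teams” run the same procedure on new data.

```python
# Two teams measure the same quantity with the same procedure but new data.
import numpy as np

def measure(rng, n=100):
    """One trial: sample the population, report the mean and its precision."""
    sample = rng.normal(loc=10.0, scale=2.0, size=n)
    mean = sample.mean()
    # Stated precision: an approximate 95% confidence half-width.
    precision = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    return mean, precision

mean_a, prec_a = measure(np.random.default_rng(1))
mean_b, prec_b = measure(np.random.default_rng(2))  # a different team
print(f"Team A: {mean_a:.2f} ± {prec_a:.2f}")
print(f"Team B: {mean_b:.2f} ± {prec_b:.2f}")
# The estimates agree within stated precision on most trials; a replication
# "succeeds" when they do.
```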
Reasons for a study failing to replicate
- Finding was not “real”, or the effect size was small (see the simulation after this list)
- Measurement error
- Variable finding (things change over time)
- Samples come from different populations
- Different experimental design or conditions
- Fraud
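The first reason lends itself to a quick simulation (assuming SciPy is installed; group sizes are arbitrary). With pure noise and a significance level of 0.05, roughly one comparison in twenty looks “significant”; reporting only that one is p-hacking, and it is exactly the kind of finding that fails to replicate.

```python
# Why a finding that is not "real" fails to replicate: false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
false_positives = 0

for _ in range(20):
    # Both groups come from the SAME population: there is no real effect.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(f"{false_positives} of 20 null comparisons were 'significant' at alpha={alpha}")
```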
The Unicorn Test
If you find a “unicorn” result, return to your data several times, from different viewpoints, until you are fully convinced.
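One concrete way to take those different viewpoints is resampling. A minimal sketch, assuming a generic one-dimensional dataset; the exponential draws below stand in for real data, and the median is the hypothetical “unicorn” statistic:

```python
# Re-examine a surprising statistic from two viewpoints before believing it.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)  # stand-in for the real dataset
observed = np.median(data)  # the surprising "unicorn" result

# Viewpoint 1: bootstrap. Does the result survive resampling?
boot = [np.median(rng.choice(data, size=data.size, replace=True))
        for _ in range(1000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"median = {observed:.2f}, 95% bootstrap interval [{lo:.2f}, {hi:.2f}]")

# Viewpoint 2: split the data. Does each half tell the same story?
first, second = data[:100], data[100:]
print(f"first half: {np.median(first):.2f}, second half: {np.median(second):.2f}")
```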
Confirmation Bias
A cognitive bias in which people tend to search for, interpret, favor, and recall information in a way that confirms their preexisting beliefs or hypotheses.