A new report published in Nature Human Behaviour has shed some light on the so-called “replicability crisis” in psychology and the social sciences. The study cataloged researchers’ attempts to reproduce the findings of 21 selected social-science experiments published between 2010 and 2015 in Science and Nature, two well-known and prestigious academic journals.
In the study, the authors report that the findings of only 13 of the original 21 studies (~62%) were successfully reproduced, and that the average effect size in the replications was about 50% smaller than the effect size reported in the original studies. While certainly not a complete failure of replication, the study did highlight potential problems with the replication process and with the publishing habits of academic journals, notably the presence of false positives and the overestimation of true effect sizes.
The study is also notable in that the replications used much larger sample sizes than the originals, allowing for more robust estimates of the effects. Additionally, the study found that peers’ beliefs about replicability correlated strongly and positively with actual replicability, indicating that researchers have a generally accurate sense of which findings will replicate and which will not. That finding is a silver lining: it suggests that most researchers are aware of the limitations of their studies and know when to take certain results with a grain of salt. The authors express optimism, saying that their findings point to potential strategies for combating systematic bias in the publishing process.
A Crisis In Reproduction
This is not the first time social psychology has been in the hot seat over the reproduction of findings. The social sciences have something of a history of being criticized by other scientists for using questionable research practices or relying on generally untestable theoretical frameworks, an attitude that has coalesced into a distinction between the “soft” sciences like psychology, sociology, and economics, and the “hard” sciences like physics, chemistry, and (sometimes) biology.
A widely cited study from 2015 reported that out of 100 influential papers published in three high-ranking psychology journals during 2008, only about a third (~36%) of the original findings were successfully replicated. The same study also found that among the studies that were successfully replicated, the mean effect size was roughly half that of the original, a substantial decline. Studies in cognitive psychology were the most likely to be reproduced (50%), while studies in social psychology were the least likely (25%). Overall, only about two-thirds (~68%) of the combined original and replication effects were statistically significant at all.
Several reasons have been offered for the difficulty of replication in psychology. First, psychology by its very nature investigates monumentally complex systems (i.e., humans) without a way to achieve the extremely precise controls and experimental manipulations seen in physics and chemistry; achieving that kind of control would usually require severe ethical violations.
This inherent limitation on the discipline certainly limits the strength of the conclusions that can be drawn from any single study. Second (and this is a problem in almost all scientific disciplines), replication studies are implicitly discouraged because the publishing arena favors novel results. Many academics work in a high-pressure “publish or perish” environment that encourages them to push new research instead of validating past findings. This pressure, combined with the tendency of journals not to publish experiments with negative results, leads to a “file drawer” effect, in which studies with positive results are selected for and negative studies never see the light of day.
Third, constructs used in the more “social” sciences carry a level of ambiguity, and require a level of interpretation, that experiments in the “hard” sciences do not. Psychological characteristics are abstract constructs and have to be investigated indirectly, often relying on qualitative methods over quantitative ones. For example, it is much more straightforward to measure something like mass or electric charge than to accurately measure something like a person’s dating aspirations or public perceptions of law enforcement.
Studies In Nature And Science: Replication Or Not?
This time, the researchers picked 21 studies published between 2010 and 2015 in Science and Nature, journals with impact factors of roughly 37 and 40, respectively. The pool of studies draws mainly from the field of social psychology and includes experiments on topics such as the effects of image priming on analytical thinking, the effect of handwashing on risk-taking behavior, and the effect of communication levels on common-resource pooling.
Notably, each replication used a sample roughly five times larger than the original, and all of the replication experiments (except one) were designed under the supervision of, and verified by, the original researchers before data gathering, helping ensure that the studies were replicated faithfully.
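To get an intuition for why replications are run with substantially larger samples, consider a back-of-the-envelope power calculation. The sketch below assumes a simple two-group design analyzed with an independent-samples t-test and uses hypothetical effect sizes; it illustrates the general principle, not the power analysis actually used in the paper.

```python
# Illustration: why replications need larger samples when the true effect
# may be smaller than originally reported. Assumes a two-group,
# independent-samples t-test design; the effect sizes here are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed for 90% power at alpha = 0.05
n_if_reported_effect = analysis.solve_power(effect_size=0.50, alpha=0.05, power=0.90)
n_if_half_effect = analysis.solve_power(effect_size=0.25, alpha=0.05, power=0.90)

print(f"n per group if the true effect is d = 0.50: {n_if_reported_effect:.0f}")  # ~85
print(f"n per group if the true effect is d = 0.25: {n_if_half_effect:.0f}")      # ~337
# Halving the assumed effect size roughly quadruples the required sample,
# which is why replication samples were several times larger than the originals.
```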
Thirteen of the replications found a significant effect in the direction of the original studies. Across those studies, the average standardized effect size was 0.249, compared to 0.460 in the original studies; in other words, even where replication succeeded, the observed effect was roughly half the size originally reported. The researchers then combined all of the replications and original studies into one meta-analysis, which indicated that 76.2% of the total experiments showed significant effects in the direction of the original. The researchers are careful to point out that this number is likely inflated, given the pre-existing publishing bias that favors positive results.
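As a quick sanity check on those figures, the sketch below simply recomputes the ratios quoted above; the variable names are mine, and the calculation is a plain ratio rather than the meta-analytic model used in the paper.

```python
# Back-of-the-envelope check on the figures quoted above.
original_effect = 0.460     # mean standardized effect size, original studies
replication_effect = 0.249  # mean standardized effect size, successful replications

relative_size = replication_effect / original_effect
print(f"Replication effects are {relative_size:.0%} as large as the originals")  # ~54%
print(f"i.e. roughly {1 - relative_size:.0%} smaller")                           # ~46%

replicated, total = 13, 21
print(f"Replication rate: {replicated / total:.1%}")  # 61.9%
```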
Very importantly, the study also gauged researchers’ beliefs about the likelihood that each study would be successfully replicated. The average predicted replication rate was about 60.6%, extremely close to the actual replication rate of 61.9%. The authors take this close agreement as evidence that the surveyed scientists “identified a priori systematic differences between the studies that replicated and those that did not.” In other words, social scientists, in general, have a pretty good intuitive grasp of which studies are likely to replicate and which are not. The authors take this finding as a reason to be optimistic about emerging methods for combating reproducibility problems.
So what do we make of the data as a whole? The new study seems to imply two major things: (1) many studies report false positives, and (2) many studies tend to overstate effect sizes even in the case of true positives, most likely because of small sample sizes. The presence of (1) can be attributed to publishing practices in which novel positive results are encouraged over negative or replication results. To minimize (2), the authors suggest pre-registering experimental designs and analysis plans, in order to combat the over-reporting of effect sizes.