P-hacking in Empirical Accounting Research
The integrity of statistical analysis is central to academic research. Yet recent scrutiny reveals a troubling pattern: a notable proportion of test statistics in accounting studies have p-values just below conventional significance thresholds, such as 0.05 and 0.01. This pattern raises important questions about the prevalence of questionable research practices (QRPs) among researchers striving to report statistically significant findings. Professor Xin Chang (Nanyang Technological University) and his co-authors, Professors Huasheng Gao (Fudan University) and Wei Li (City University of Hong Kong), analysed articles from six leading accounting journals to examine whether such practices may be affecting the credibility of published research.
Discontinuity in Test Statistics
One striking finding from the study is an unusual abundance of just-significant test statistics (those that only narrowly clear the threshold for statistical significance), particularly in research with smaller sample sizes. This pattern is more evident in experimental studies than in archival ones.
Covering data from 1990 to 2020, the research scrutinises both experimental and archival studies. Experimental studies frequently report p-values, while archival research typically reports t-statistics or z-statistics. The researchers converted these varied statistics into a common metric so that they could be analysed together.
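To make the idea of a common metric concrete, here is a minimal sketch in Python. It assumes two-sided tests and a standard normal reference distribution, and maps a reported p-value to the absolute z-statistic that would produce it. The article does not say which metric or conversion the authors used, so the function names and the choice of the z-scale are illustrative assumptions only.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def p_to_abs_z(p_two_sided: float) -> float:
    """Map a two-sided p-value to the equivalent absolute z-statistic.

    Assumption: the test is two-sided and the statistic is approximately
    standard normal under the null hypothesis.
    """
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # Half of the p-value sits in each tail; take the lower-tail quantile
    # and flip its sign to get a positive z-statistic.
    return -_STD_NORMAL.inv_cdf(p_two_sided / 2.0)


def abs_z_to_p(z: float) -> float:
    """Map a z-statistic to its two-sided p-value (inverse of p_to_abs_z)."""
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


if __name__ == "__main__":
    for p in (0.10, 0.05, 0.01):
        print(f"p = {p:.2f}  ->  |z| = {p_to_abs_z(p):.3f}")
```

On this scale, the p-value thresholds of 0.05 and 0.01 correspond to absolute z-statistics of roughly 1.96 and 2.58, so results reported in either form can be placed on one axis.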
Significance Thresholds and Researcher Behaviour
The study finds a noticeable "bunching" of results with p-values just below the conventional significance thresholds. This clustering suggests that researchers may exercise discretion in data analysis and reporting, potentially engaging in practices such as p-hacking to achieve statistically significant results. P-hacking involves manipulating statistical analyses or data to obtain a desired p-value. While this pattern raises concerns about potential QRPs, it does not prove that these practices are being employed. Instead, it highlights the need for greater scrutiny and transparency in research methodologies, so that results are reported with the highest level of integrity.
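One common way to check for this kind of bunching is a caliper test, which compares how many statistics fall in a narrow window on each side of a threshold. The sketch below is an illustration only: the article does not state which test the authors applied, and the function names, the window width of 0.10, and the sample values are all hypothetical. It works on absolute z-statistics, where a p-value just below 0.05 corresponds to a statistic just above 1.96, and it uses the simplifying assumption that, without any discretion, a statistic in the window is equally likely to land on either side.

```python
from math import comb


def caliper_counts(abs_z_values, threshold=1.96, width=0.10):
    """Count statistics just below and just above the threshold."""
    below = sum(1 for z in abs_z_values if threshold - width <= z < threshold)
    above = sum(1 for z in abs_z_values if threshold <= z < threshold + width)
    return below, above


def binomial_upper_tail(successes, trials, prob=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, k) * prob**k * (1.0 - prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def caliper_test(abs_z_values, threshold=1.96, width=0.10):
    """Return (below, above, one-sided p-value for an excess above)."""
    below, above = caliper_counts(abs_z_values, threshold, width)
    total = below + above
    if total == 0:
        return below, above, 1.0  # nothing in the window, no evidence
    return below, above, binomial_upper_tail(above, total)


if __name__ == "__main__":
    # Hypothetical absolute z-statistics, invented for illustration.
    sample = [1.90, 1.97, 1.98, 2.00, 2.03, 2.05, 1.88, 1.99, 2.04, 2.01]
    below, above, p_value = caliper_test(sample)
    print(f"just below: {below}, just above: {above}, p = {p_value:.4f}")
```

A small p-value from such a test indicates more just-significant results than the equal-split assumption would predict. As the article stresses, that is a reason for scrutiny, not proof that any particular study was manipulated.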
Experimental vs Archival Studies
Experimental studies show a higher frequency of just-significant results than archival studies. This pattern suggests that researchers may exercise more discretion in data handling and reporting in experimental work, especially since the smaller sample sizes typical of these studies produce more variable estimates, which makes discretionary choices more consequential.
Impact of Sample Size and Researcher Freedom
The study also highlights the influence of sample size on the frequency of just-significant results. Smaller sample sizes often lead to more noticeable discontinuities in test statistics, implying that researchers might be more inclined to tweak their analyses to achieve significant findings when working with limited data.
Additionally, the research explores how various aspects of experimental design, such as the number of experimental constructs and the choice of statistical tests, affect the likelihood of producing results that are barely significant. Having more researcher degrees of freedom, that is, more discretionary choices available in the research design, is associated with greater discontinuity in test statistics, further suggesting that flexibility in research methods can influence the appearance of statistically significant outcomes.
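A small simulation can show why flexibility matters. The sketch below is not the authors' method. It assumes a hypothetical researcher who studies an effect that is truly zero, measures several independent outcomes, and reports only the one with the smallest p-value. All names and parameter values are invented for illustration, and the data are drawn with a known unit variance so that a simple z-test applies.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_outcomes, n_per_study=20, n_studies=2000,
                        alpha=0.05, seed=0):
    """Share of null studies that report at least one 'significant' outcome.

    Each simulated study measures `n_outcomes` independent outcomes whose
    true mean is zero, then keeps only the smallest p-value.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_studies):
        best_p = 1.0
        for _ in range(n_outcomes):
            sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_study)]
            # z-test of the sample mean against zero, variance known to be 1.
            z = (sum(sample) / n_per_study) * n_per_study ** 0.5
            p = 2.0 * std_normal.cdf(-abs(z))
            best_p = min(best_p, p)
        if best_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 3, 5):
        rate = false_positive_rate(k)
        print(f"{k} outcome(s) tried -> false-positive rate of about {rate:.3f}")
```

With independent outcomes, the chance of at least one false positive is 1 - 0.95^k, which is about 23% for five outcomes against the nominal 5% for one. The simulated rates should land close to those values, showing how each additional discretionary choice raises the odds of a significant-looking result even when no real effect exists.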
Advancing Transparency and Integrity
The findings raise concerns about the credibility of some reported results in accounting research due to the potential for QRPs. While the study does not directly observe QRPs, it suggests that the prevalence of just-significant results warrants scepticism. The pressure to publish significant findings can create incentives to adjust data or analyses until p-values fall below conventional thresholds and meet journal expectations. Together with publication bias, this not only distorts the scientific record but also undermines the integrity of research.
Importantly, the authors emphasise that their findings are not evidence of deliberate, fraudulent behaviour by accounting researchers, nor do their results imply that accounting research is collectively unreliable. Instead, they agree with award-winning science journalist Christie Aschwanden’s assertion that “we all p-hack, to some extent.” Even a simple scientific question may require researchers to make many choices that can influence their results. Without safeguards in place, it is almost inevitable that researchers will succumb to natural human biases, which tilt the balance in their favour and yield false positives. Their research on p-hacking in accounting, like prior studies on p-hacking in other scientific disciplines, therefore reflects what Christie Aschwanden says: “science is hard—and sometimes our human foibles make it even harder,” but “science deserves respect exactly because it is difficult”.
In conclusion, the study calls for increased scrutiny of test statistics in accounting research. It urges the research community to be aware of the potential for QRPs and to strive for greater transparency and integrity in reporting. Researchers are encouraged to exercise caution when interpreting just-significant results, as these findings might not always reflect genuine scientific discoveries. It is crucial for both researchers and journals to prioritise transparency and integrity over statistically significant results, so that the true value of scientific work is recognised and upheld. Given the increasing attention to research integrity across the sciences, the study contributes to open, evidence-based discussion of the issue.
Note: This research paper was published by the Journal of Accounting Research (Wiley) in September 2024.
Xin Chang (Simba) is a Professor of Finance and Associate Dean (Research) at Nanyang Business School, where he oversees PhD programmes and research activities. He specialises in corporate finance, especially capital structure, mergers and acquisitions, and stock valuation. He has taught various courses to undergraduate, honours, master's, and PhD students at HKUST, the University of Melbourne, the University of Cambridge, and NTU.