Open Data: Can It Prevent Research Fraud, Promote Reproducibility, and Enable Big Data Analytics In Clinical Research?






MYERS, Patrick Olivier

MYERS, Patrick Olivier. Open Data: Can It Prevent Research Fraud, Promote Reproducibility, and Enable Big Data Analytics In Clinical Research? Annals of Thoracic Surgery, 2015, vol. 100, no. 5, p. 1539-1540

DOI: 10.1016/j.athoracsur.2015.08.041 PMID: 26522522



To be successful, such a program would need to be concise and pragmatic. The course must have safeguards to prevent sharing of answers and mechanisms to ensure that the researcher has indeed completed the module. Also, the information conveyed should be implicitly usable to the researcher and ideally appropriate for a wide variety of journals, researchers, and data. The successful completion of this module would be made available in a format that could be verified and transferred to all major medical journals and conferences.

Some may argue that implementation of such a training module will impose additional time constraints on busy investigators; however, we stress the importance of keeping the course concise, practical, and applicable to a wide array of journals.

Maintaining scientific integrity in our journals is of critical importance, and ultimately, such efforts will benefit our medical community and the populations we serve through the dissemination of reliable and actionable new information. Although a multifactorial approach to scientific misconduct is necessary, establishing educational awareness is a sound first step in the process.

Alexander Iribarne, MD, MS

Jock N. McCullough, MD

Section of Cardiac Surgery

Department of Surgery

Dartmouth-Hitchcock Medical Center

1 Medical Center Dr

Lebanon, NH 03766




The reliability and reproducibility of biomedical research are under increasing scrutiny as the number and scope of high-profile manuscript retractions for research fraud have increased [1]. Bando and colleagues [2] provide an important contribution with strategies to prevent scientific misconduct. Their suggestion to mandate preservation of raw research data and to make anonymized patient data available on reasonable request merits further discussion.

The Office of Science and Technology Policy asserted in 2013 that federally funded research data should be made publicly available for access, search, and analysis. The Public Library of Science was the first mainstream journal to introduce an open-data mandate for all submissions in 2014. The Gates Foundation announced in 2014 that it would demand open data of the researchers it funds [3]. Finally, the Institute of Medicine issued an extensive report on sharing clinical trial data earlier this year [4]. These initiatives have stimulated discussion on open-data policies, although such policies remain quite marginal to date [3].

The rationale for open-data mandates is that providing the raw data underlying a clinical trial should allow others to reproduce and validate the analysis, detecting errors and deterring fraud in an era plagued by irreproducible results [5]. It should also allow other investigators to answer secondary research questions or to aggregate data into large-scale meta-analyses. Data sharing should maximize the benefits from the vast amount of research data collected and from the contribution of each study subject, while respecting the privacy of the study subjects. It should also give researchers a fair opportunity to publish their results before secondary investigators gain access to the data, and protect the commercial interests of sponsors in gaining regulatory approval [4].

Beyond prevention of research fraud, open-data mandates offer an exciting opportunity to confront the “small sample size” issue in clinical research. Other industries, such as finance and energy, have embraced data analytics [6], and researchers at Google have shown that an order of magnitude growth in the size of data sets leads to significant improvements in performance of analyses and can overshadow improvements in modeling techniques [7]. Big data analyses have been reported in only a select few epidemiologic studies, such as those linking myocardial infarction and rosiglitazone [9] or rofecoxib [8].

Two significant issues need to be resolved to enable open-data mandates: guaranteeing the privacy of study subjects [10] and creating a safe, fair, and open infrastructure for data sharing [4]. The National Institutes of Health lists 65 open data repositories (NIHbmic/nih_data_sharing_repositories.html) that it supports. Unfortunately, the lack of an adequate data-sharing platform is often cited by authors to justify noncompliance with open-data mandates [3].

In conclusion, data sharing should become normal, to allow verification of data and statistical analyses or allow big data mining across multiple study populations. The Society of Thoracic Surgeons should take a leading role in implementing open data in the field of cardiothoracic surgery. We owe it to our specialty and to our study subjects to maximize the effect that their contribution to our research provides.

Patrick O. Myers, MD

Division of Cardiovascular Surgery

Geneva University Hospitals and Faculty of Medicine

4 rue Gabrielle-Perret-Gentil

1211 Geneva 14, Switzerland

© 2015 by The Society of Thoracic Surgeons. Published by Elsevier. 0003-4975/$36.00

References

1. Wise J. Extent of Dutch psychologist’s research fraud was unprecedented. BMJ 2011;343:d7201.

2. Bando K, Schaff HV, Sato T, Hashimoto K, Cameron DE. A multidisciplinary approach to ensure scientific integrity in clinical research. Ann Thorac Surg 2015;100:1534–40.

3. Van Noorden R. Confusion over open-data rules. Nature 2014;515:478.

4. Institute of Medicine. Sharing clinical trial data: maximizing benefits, minimizing risk. Washington, DC: The National Academies Press; 2015.

5. Begley CG, Ellis LM. Drug development: raise standards for preclinical cancer research. Nature 2012;483:531–3.

6. Badawi O, Brennan T, Celi LA, et al. Making big data useful for health care: a summary of the inaugural MIT critical data conference. JMIR Med Inform 2014;2:e22.

7. Halevy A, Norvig P, Pereira F. The unreasonable effectiveness of data. IEEE Intell Syst 2009;24:8–12.

8. Brownstein JS, Sordo M, Kohane IS, Mandl KD. The tell-tale heart: population-based surveillance reveals an association of rofecoxib and celecoxib with myocardial infarction. PLoS One 2007;2:e840.

9. Brownstein JS, Murphy SN, Goldfine AB, et al. Rapid identification of myocardial infarction risk associated with diabetes medications using electronic medical records. Diabetes Care 2010;33:526–31.

10. Sarwate AD, Plis SM, Turner JA, Arbabshirani MR, Calhoun VD. Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation. Front Neuroinform 2014;8:35.





