Letters

Code of conduct is needed for publishing raw data

BMJ 2001; 323 doi: https://doi.org/10.1136/bmj.323.7305.166 (Published 21 July 2001) Cite this as: BMJ 2001;323:166
  1. Gunther Eysenbach (ey{at}yi.com), editor, Journal of Medical Internet Research,
  2. Eun-Ryoung Sa, fellow
  1. Research Unit for Cybermedicine and eHealth, Department of Clinical Social Medicine, University of Heidelberg, D-69115 Heidelberg, Germany
  2. Global Health Net-Supercourse Group, University of Pittsburgh, Department of Epidemiology, School of Public Health, Pittsburgh, PA 15213, USA

    EDITOR—Hutchon in his article showed the benefits of publishing raw data on line.1 The method of opening up raw data for research has strong parallels to the “open source” movement of the software industry, where developers freely distribute the source code and allow usage and modification.2 The open source community has learnt that this rapid evolutionary process produces better software than the traditional closed model, in which only a very few programmers can see source, and everybody else must blindly use an opaque block of bits (http://www.opensource.org/).

    Publishing raw data may similarly enhance the speed and quality of research, as other researchers can reanalyse the data to verify results or to draw new conclusions. Preprint servers, as well as innovative e-journals, offer possibilities to share data and encourage other scholars to participate in the research process.2 The Journal of Medical Internet Research (http://www.jmir.org/) has, from the beginning of its existence, explicitly invited authors to attach original data that could be downloaded and dynamically analysed, for example, with Java applets.3 To date, however, no author has submitted a paper with raw data. Are authors perhaps afraid that other researchers will analyse their data too thoroughly, “cream off” and publish the interesting results, and thus preclude the publication of further papers? In open source genomics research, debates over priority, authorship, and credit for analysing data in depth have already arisen.4 5 If researcher A lays open the complete dataset, and researcher B discovers a new relation or other “publishable” results in the dataset, what rights of first publication does researcher A have? Researcher B could probably publish new discoveries with a simple reference to the open source, which may be unsatisfactory for researcher A, especially if he or she planned to do further analyses with the dataset.

    We may need a clearer code of practice on this issue. In the open source software industry, everybody who amends open source code to produce more advanced software agrees that the new software must in turn be open source, a practice that could be applied analogously in biomedical publishing. A further practice could also be encouraged: authors who made the original raw data available (and subsequent authors who generated more results with these data) would be invited to act as co-authors of any subsequent publications. This prospect may enhance the willingness of researchers to open their raw data in the first place.

    References

    1. 1.
    2. 2.
    3. 3.
    4. 4.
    5. 5.