Wasted Research

September 7, 2010

Last week, a dear friend, whom I hope to work with next year, sent me this excerpt:

Sharing health data: good intentions are not enough

Elizabeth Pisani (a) & Carla AbouZahr (b)

a. London School of Hygiene and Tropical Medicine, Keppel Street, WC1E 7HT, London, England.
b. Department of Health Statistics and Informatics, World Health Organization, Geneva, Switzerland.

Research data are desperately underused too, in part because of a critical shortage of competent data managers.[5] In other fields – genetics, banking and retailing – data management is a valuable skill. People are trained and develop careers in the field. In public health research, data management is the poor cousin of analysis. Undervalued and underfunded, inadequate data management undermines the rest of the scientific enterprise. One review in the United Kingdom of Great Britain and Northern Ireland found that many of the variables collected in epidemiological studies were never cleaned and coded, so they could not be used even by the primary researchers, let alone shared.[6] In complex population-based surveys in developing countries, data management and analysis skills are in even shorter supply, so a higher proportion of data probably goes to waste.[7]

The emphasis is mine, but I think this passage is really telling, especially when you think about HIV/AIDS research in “developing countries”.

Wasted data = wasted time = wasted effort = wasted opportunities for more effective policy.

Edit: Here is a link to the full article quoted above.

More awesome excerpts:

“When we’re dealing with public health research, wasted data can translate into shorter, less healthy lives. Improving data management so that data can be shared is a first step to reducing that waste. But it will not be enough. We need to change the incentives that pit the interests of individual researchers against the interests of public health, that pit institutional interests against the more rapid advancement of knowledge and understanding. Governments may hold micro-data back from international organizations, but there’s no excuse for international organizations to limit access to the aggregate data that governments do provide.

“It’s easier to understand why individual researchers are reluctant to share data they have collected. That reluctance will certainly remain entrenched as long as their employers – research councils, foundations and universities – regard publication of research papers in peer-reviewed biomedical journals as the main yardstick of success.[8] If, however, ‘publish [papers] or perish’ were to be replaced by ‘publish [data] or perish’, the picture might change rapidly, as it did in genomics.”

“Researchers sometimes argue that interpretation of their data is so dependent on understanding local conditions that the data would be worthless to other scientists. This is often a reflection of inadequate documentation, but also a necessary failure of imagination. Sailors keeping log books on whaling boats in the 1600s could not have predicted that, centuries later, the data would be an important source of information for climate change scientists.[25] Most funders have stringent peer-review procedures; few invest in research that they believe is of only very localized importance, and few wish to support research that produces data of such poor quality that it has no further value. Publicly-funded data can also be invaluable to students learning data management and analysis skills. It thus seems fair to expect that almost all public health research funded by taxpayers or charities might be useful to secondary analysts. If a piece of research is considered worthy of publication in a peer-reviewed journal, the underlying data should also be worth publishing.”

“Goals for funders and researchers

Here we propose several goals to which funders and researchers can jointly aspire and towards which progress can be measured: (i) all data of potential public health importance funded by taxpayers or foundations will be appropriately documented and archived in formats accessible to the wider scientific community; (ii) all data provided by governments to databases developed by publicly-funded organizations will be freely available to any user, at the level of detail at which it was provided; (iii) the publication of a research article in a biomedical journal will be accompanied by the publication of the data set upon which the analysis is based; (iv) funders and employers of researchers will consider publication of well managed data sets as an important indicator of success in research, and will reward researchers professionally for sharing data; and (v) all planned research will budget and be funded to manage data professionally to a quality adequate for archiving and sharing.”

Obviously, I don’t trust you to follow the link and actually read the paper. 😉

