Commenter Scott raises the issue of the incentive to collect data, particularly large data sets, if making that data public is mandatory. This is a very important issue that I addressed a bit in my first post on the CRU fiasco, but it’s worth a couple of additional thoughts.
Here are some ways to provide incentives:
- Subsidies, in the form of grants or prizes. This is already done to some degree. Receipt of such funding to support data collection (i.e., the subsidization of a public good) should be contingent on making the data freely available.
- The provision of complementary services for a fee, such as consulting on the use of the data, or creation of customized data sets. This is the source of revenues for many providers of open source software.
- Licensing or sale of the data for a reasonable fee. This has been quite effective for many databases widely used by finance and economics academics (and industry), notably the CRSP stock and bond data (but others as well).
- Conditioning access to data on formal recognition in all working papers and published papers, perhaps including the creation of a “data authorship” category or something of the sort.
In brief, a variety of mechanisms can provide incentives to create information and data; markets for other information goods already employ many of them. Those just listed are among the best known.
Intellectual property rights (e.g., granting rights of exclusive use, trade secrets) are other means by which creators can capture a stream of benefits, thereby giving them an incentive to produce information goods. These may be the efficient arrangement in some settings, but I am dubious that that is the case in the sciences and social sciences. Science depends on replication, which is incompatible with “I’d show you but I’d have to kill you” secrecy.
I also had an additional thought regarding how to evaluate scholarship for hiring, promotion, tenure, and salary decisions in a research university in an open source, non-journal dominated system. I mentioned the idea of using citations as a main metric, and evaluating the “quality” of citations based on characteristics of the works in which a citation occurs–such as the number of times the citing author’s work is itself cited. This is essentially a links-based metric. Major search engines, notably Google, utilize such metrics in ranking web sites to determine display order. The concepts underlying search engine ranking algorithms could perhaps be adapted to rank scholarly impact as well.
The creation of a data set that is utilized by other scholars could be an input to the ranking algorithm, providing another incentive to invest in creating such data sets.
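To make the idea concrete, here is a minimal sketch of how a PageRank-style link metric could rank both scholars’ papers and a data set in a single citation graph. The graph, node names, and parameters are invented for illustration; this is one simple instance of the family of algorithms the major search engines use, not a proposal for the actual scoring system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank nodes in a citation graph by a PageRank-style metric.

    `links` maps each node to the list of nodes it cites. A citation
    from a highly ranked work counts for more, mirroring the idea of
    weighting citations by the quality of the citing work.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * ranks[node] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A node that cites nothing spreads its rank evenly.
                for other in nodes:
                    new_ranks[other] += damping * ranks[node] / n
        ranks = new_ranks
    return ranks

# Hypothetical graph: each paper cites the data set it relies on,
# so the data set accumulates rank from every paper that uses it.
citations = {
    "paper_A": ["dataset", "paper_B"],
    "paper_B": ["dataset"],
    "paper_C": ["dataset", "paper_A"],
    "dataset": [],
}
scores = pagerank(citations)
print(max(scores, key=scores.get))  # the data set ranks highest here
```

Treating the data set as a citable node is the whole trick: its creators earn rank from every downstream paper, whether or not they agree with that paper’s conclusions.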
Presumably the CRU employees associated with the creation of a climate data set would have achieved a stratospheric impact factor/ranking under such a system, because myriad other scholars would have used the data. But of course, this would have come at a (private) cost: they couldn’t have controlled the conclusions of the work done with their data (which the emails suggest was an important consideration to them). But that private cost is swamped by the benefit of permitting open access to other researchers, so the proper response is: tough luck to you.