While practice varies from discipline to discipline, there is an increasing trend towards the planned release of research data. The need for data licensing arises directly from such releases, so the first question to ask is why research data should be released at all.
A significant number of research funders now require that data produced in the course of the research they fund be made available for other researchers to discover, examine and build upon. The rationale given is that opening up the data allows new knowledge to be discovered through comparative studies, data mining and so on; it also allows greater scrutiny of how research conclusions have been reached, potentially driving up research quality. Many journals likewise require that authors deposit their supporting data either with the journal itself or with a recognised data repository.
There are many additional reasons why releasing data can be in a researcher’s interests. The discipline of working up data for eventual release helps in ensuring that a full and clear record is preserved of how the conclusions were reached from the data, protecting the researcher from potential challenges. A culture of openness deters fraud, encourages learning from mistakes as well as from successes, and breaks down barriers to interdisciplinary and ‘citizen science’ research. The availability of the data, alongside associated tools and protocols, increases the efficiency of research by reducing both data collection costs and the possibility of duplication. It also has the potential to increase the impact of the research, not only academically, but also economically and socially.
Within the EU, the act of compiling a database attracts copyright insofar as the compiler has exercised intellectual judgement in selecting or arranging the data. In Finland, this database copyright vests in the author or authors, that is, the individual researchers, if they are doing independent research. If they are not doing independent research, the copyright in the database vests by law in the university. Aalto University has an appendix to its employment agreement. Under this appendix, the copyright in databases, and the copyright and other intellectual property rights that result from research projects receiving outside funding, are transferred to the university.
The EU also has a separate sui generis database right that applies to the contents of a database where a substantial investment was made to obtain, verify or present them. By law, this sui generis database right always belongs to the employer. In the Nordic countries, databases can also be protected as catalogues; this catalogue right is likewise owned by the employer.
A source of confusion is the variation between jurisdictions in what can be done with copyright material. While the Berne Convention provides a level of consistency among its signatories, there are still variations in the exemptions that each jurisdiction provides, and subtle differences concerning, for example, which acts count as copying, and what constitutes an insubstantial use or extract of a work. The latter is an important point because the exemptions to copyright and database rights permit a dataset to be compiled from insubstantial extracts from a number of other datasets, but whether the extracts are indeed insubstantial might be contested.
With all these complexities and ambiguities surrounding the rights of database compilers in national laws, re-users need a license and Rules of the Road in order to obtain clear guidance from compilers on what they are allowed to do with the research data.
The ways of communicating permissions to potential re-users of data are licenses and waivers. A license is a legal instrument for a rights holder to permit a second party to do things that would otherwise infringe on the rights held. Only the rights holder can grant a license; it is therefore imperative that the ownership of intellectual property rights (IPR) in the data is established before any licensing takes place. A waiver is a legal instrument for giving up one’s rights to a resource.
Licenses grant permissions on condition that certain terms are met. Three conditions commonly found in licenses are attribution, copyleft, and non-commerciality.
While these all have their uses, they can cause problems in the context of datasets:
Datasets are particularly prone to attribution stacking, where a derivative work must acknowledge all contributors to each work from which it is derived, no matter how distantly. If a dataset is at the end of a long chain of derivations, or if large teams of contributors were involved, the list of credits might well be considered too unwieldy. The problem is magnified if different sets of contributors have to be credited in different ways, especially if automated methods are used to assemble the dataset – some of the benefits of automation are lost if attribution conditions have to be inspected manually. Licensors can tackle this problem by using the CC0 waiver (explained below), which does not require attribution, and by giving Rules of the Road recommendations on how to state the source of the research data.
The problem with copyleft, or share-alike, licenses is that they prevent the licensed data from being combined with data released under a different license: the derived dataset would not be able to satisfy both sets of license terms simultaneously. Some copyleft licenses, however, show a small amount of flexibility by allowing derivative works to be released under a compatible license, that is, one that applies approximately the same conditions.
Non-commercial licenses reduce the ways in which datasets can be used, because it is ambiguous what constitutes a commercial use. The EU, the G8 countries and the Finnish government want the opening up of data to create new businesses, growth and employment; restricting the use of datasets by demanding that they not be used commercially prevents these goals from being met. However, if dual licensing is the goal, it can be achieved by allowing use under a non-commercial license and licensing commercial uses separately.
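The attribution stacking problem described above can be made concrete with a small sketch. The dataset names, contributor names and the `stacked_credits` helper are hypothetical and serve only as illustration; the sketch assumes every upstream license carries an attribution condition, so that credits accumulate along the whole chain of derivations.

```python
def stacked_credits(name, datasets, seen=None):
    """Collect every contributor that dataset `name` must credit.

    `datasets` maps a dataset name to (contributors, source dataset names).
    Sources are followed transitively; each dataset is visited only once.
    """
    if seen is None:
        seen = set()
    if name in seen:
        return []
    seen.add(name)
    contributors, sources = datasets[name]
    credits = list(contributors)
    for source in sources:
        for person in stacked_credits(source, datasets, seen):
            if person not in credits:
                credits.append(person)
    return credits


# Hypothetical derivation chain: model_input <- merged <- survey_a, survey_b
datasets = {
    "survey_a": (["Alice"], []),
    "survey_b": (["Bob", "Carol"], []),
    "merged": (["Dave"], ["survey_a", "survey_b"]),
    "model_input": (["Erin"], ["merged", "survey_a"]),
}

print(stacked_credits("model_input", datasets))
# ['Erin', 'Dave', 'Alice', 'Bob', 'Carol']
```

Even in this toy chain, a dataset with one direct contributor ends up owing five credits. With long chains or large teams the list grows accordingly, which is the motivation for CC0 combined with Rules of the Road.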
Below is a selection of standard licenses available, along with reasons for and against using each one. Please note that these licenses can be terminated only by expiry of the licensor’s IPR or, for a particular licensee, through breach of terms.
Creative Commons is a non-profit corporation set up in 2001 for the purpose of producing simple licenses for creative works. These licenses give the creators of such works finer-grained control over how they may be used than simply declaring them public domain or reserving all rights. As well as the legal text, the licenses all have quick, clear summaries and a canonical URL for use in HTML, RDF and other code. A rights expression language is also provided for use with RDF. While originally aimed at works such as music, images and video, Creative Commons licenses are widely used for most forms of original content, including research data. Over one billion documents have been licensed using Creative Commons licenses.
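As a minimal sketch of how a canonical license URL can be used in HTML, the following builds a link marked with `rel="license"`. The function names and the example dataset title are hypothetical; the URL pattern shown is the one used for the version 4.0 licenses on creativecommons.org, and the condition code is assumed to be passed in lower-case form such as `by` or `by-nc`.

```python
from html import escape


def license_url(code="by", version="4.0"):
    """Build the canonical URL of a Creative Commons license, e.g. code='by-nc'."""
    return f"https://creativecommons.org/licenses/{code}/{version}/"


def license_link(title, code="by", version="4.0"):
    """Return an HTML anchor that marks `title` as licensed under the given license."""
    url = license_url(code, version)
    label = f"CC {code.upper()} {version}"
    return (
        f'<a rel="license" href="{escape(url, quote=True)}">'
        f"{escape(title)} is licensed under {label}</a>"
    )


print(license_link("Example dataset"))
```

The `rel="license"` attribute lets software recognise the link as a license statement rather than an ordinary hyperlink.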
There are six main Creative Commons licenses. Each license includes the Attribution condition. There are three other conditions that licensors can add, and the various possible combinations produce the six licenses. Using just the Attribution condition is known as the CC BY license.
There is a Non-Commercial condition, where commercial is defined as ‘primarily intended for or directed toward commercial advantage or monetary compensation’.
The Share Alike condition inserts a strong copyleft clause into the license. Finally, the No Derivatives condition (in version 4.0) allows the licensee to make adaptations for private use, but prevents the licensee from sharing them. The six permutations are therefore CC BY, CC BY-SA, CC BY-ND, CC BY-NC, CC BY-NC-SA and CC BY-NC-ND.
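As a minimal sketch, the way the conditions combine into six licenses can be enumerated in a few lines. The function name is hypothetical. The sketch relies on the fact that Share Alike and No Derivatives are mutually exclusive (a license that forbids sharing derivatives cannot also set terms for sharing them), which the text above implies but does not state.

```python
from itertools import product


def cc_license_codes():
    """Enumerate the codes of the six main Creative Commons licenses."""
    codes = []
    # Attribution (BY) is always present. Non-Commercial (NC) is optional.
    # Share Alike (SA) and No Derivatives (ND) exclude each other,
    # so a license carries at most one of the two.
    for nc, derivative_term in product(("", "NC"), ("", "SA", "ND")):
        parts = ["BY"] + [p for p in (nc, derivative_term) if p]
        codes.append("CC " + "-".join(parts))
    return codes


print(cc_license_codes())
# ['CC BY', 'CC BY-SA', 'CC BY-ND', 'CC BY-NC', 'CC BY-NC-SA', 'CC BY-NC-ND']
```

Two choices for NC times three choices for the derivative term gives the six licenses.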
Versions of the licenses prior to version 4.0 present problems. The most significant is that the older versions do not cover the sui generis database rights in force in the European Union. The version 4.0 licenses, however, explicitly include sui generis database rights. The 4.0 versions were translated into Finnish by Aalto University project Services (with allocated Ministry of Education funding), as Aalto University is the affiliate organization of Creative Commons in Finland. The older Creative Commons licenses were originally created at Stanford and Harvard, and universities are usually the affiliate organizations in each jurisdiction.
The 4.0 versions were created through global co-operation to ensure that the licenses are well adapted to different jurisdictions. The legal teams involved in this co-creation included the Aalto University legal team.
The CC BY 4.0 license was adopted by the Ministry of Finance as the legal tool for opening publicly funded data (JHS 189 and okf.fi/jhs189) and by the Ministry of Education as the license recommended for open access publishing of research data (http://avointiede.fi/kasikirja).
The licenses do not distinguish between using data as part of a new collection or database and using them to generate content (graphs, models, maps, etc.). This means that the Share Alike and No Derivatives conditions greatly reduce the possibilities for using data. The No Derivatives condition disallows most substantive types of reuse and should therefore be avoided.
In addition to the licenses, Creative Commons provides the waiver CC0, a tool for waiving all rights without any conditions; for example, attribution is not required. CC0 should be used with Rules of the Road, in which the authors of a dataset clarify how they want to be cited, although citing the authors is not a condition for reuse.
The following terms for research data can be problematic with FAIR data goals of interoperability and reusability
This text by senior legal counsel Maria Rehbinder is a derivative work based on: Ball, A. (2014). ‘How to License Research Data’. DCC How-to Guides. Edinburgh: Digital Curation Centre, made available under the Creative Commons Attribution 4.0 International license: http://www.dcc.ac.uk/resources/how-guides . Changes were made reflecting Aalto University, Finland and commercial licensing.