Tuesday, September 04, 2012

ANU Data Management Interest Group

Greetings from the inaugural meeting of the ANU Data Management Interest Group, at the Australian National University House in Canberra.

ANU Data Commons

ANU is building an ANU Data Commons for research data. Such a repository is useful for maintaining data beyond one research project, while allowing researchers to retain control of their data. This work comes out of the Australian National Data Service (ANDS).

A data repository is normally thought of as something the researcher manually deposits data into. However, scientific instruments now commonly have network connections and can log data automatically, directly into the repository.

The project is based on Fedora Commons Repository Software, which is widely used by academia and public institutions (not to be confused with the Fedora implementation of Linux). ANU is adding a web interface for Fedora Commons. The system uses Dublin Core type metadata descriptions.
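To give a feel for what a Dublin Core style description looks like, here is a minimal sketch in Python. It is not the ANU Data Commons schema: the wrapper element name "record" and all the field values are assumptions invented for illustration. Only the namespace of the Dublin Core element set is real.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (title, creator, date, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a <record> element holding one dc:* child per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical dataset description, made up for this example.
example_fields = [
    ("title", "Example tide gauge readings"),
    ("creator", "Example, A."),
    ("date", "2012"),
    ("type", "Dataset"),
    ("identifier", "example-dataset-001"),
    ("rights", "CC BY"),
]

record = dublin_core_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A real repository would wrap these elements in whatever container its own schema requires, but the handful of descriptive fields is the heart of it.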

It occurs to me that research projects are typically not confined to one institution and cooperation is useful. My suggestion would be to rename the project the International Data Commons, providing services to the research community, in the same way the ANU hosts supercomputer facilities and archives for the research community.

ANU Archives

The ANU Archives holds the University Archives and the records of businesses, trade unions, professional associations and industry bodies, in Australia and the region, which are of interest to researchers. Notable collections are the Noel Butlin Archives Centre, Pacific Research Archives, Map Collection, Photographs and Publications. The archive is computerizing its metadata and providing this to the National Library of Australia Trove Service. One of the quirky but useful collections the ANU Archives holds is the records of hotel bars in NSW.

Making Ocean Data More Usable

Various local, national and international research bodies have oceanographic data repositories. The International Council for Science (ICSU) formed the Scientific Committee on Oceanic Research (SCOR) to help link up this data. Dublin Core standards were expanded to accommodate oceanographic needs. Details are in "Workshop on Data Publication" (SCOR/IODE/MBLWHOI Library, Workshop Report No. 230), which includes an MBLWHOI repository workflow and a Micro Life Cycle of Data.

Australian National Data Service

The Australian National Data Service (ANDS) is an Australian Government funded project to help make research data available for long term use. As well as running the Research Data Australia Discovery Service for finding data, ANDS works on policy development and education, in particular on ethics, consent and data sharing.

Australian Governments Open Access and Licensing Framework

The Australian Governments Open Access and Licensing Framework (AusGoal) provides guidance on the use of open access licenses by Australian Governments. AusGoal is managed by the Cross Jurisdictional Chief Information Officers' Committee (CJCIOC), supported by the federal Department of Finance. AusGoal's approach provides a model for researchers to license data.

ANU Data Management Courses

The ANU offers research students a Data Management Workshop (ILDM01). There is also a fifty-page ANU Data Management Manual (Managing Digital Research Data at the Australian National University, 2012).

Australian Data Archive

The Australian Data Archive (ADA) grew out of the Social Sciences Data Archive, using the national data facilities hosted at the ANU. The new OAIS architecture is shown in "The Australian Data Archive: preserving the past for access in the future" by Dr. Steve McEachern (Slide 9, COMPASS seminar series, March 2012). The ADA home page gets a W3C mobileOK Checker score of 80% (which is very good), but the metadata does not appear to be exported to Trove.

ANU Research Data Management Policy

ANU's research data management policy is set out in sections 2.1 to 2.4 of the "Responsible Practice of Research Policy" (ANU, 22 November 2010).

Digital Humanities Hub Data Management

The ANU Digital Humanities Hub has an Online Cultural Collections Analysis and Management System (OCCAMS), as described in "The Data Management Journey". The AUSTLANG system combines data on Aboriginal and Torres Strait Islander languages.

Data Citation

Digital Object Identifiers (DOIs) are used to identify scholarly papers and to cite those papers in others. DOIs and similar identifiers can also be used to identify research datasets and to cite data. Datasets need not be open to be cited: the metadata acts as a proxy for data which cannot be made publicly accessible for privacy, commercial or national security reasons. ANDS provides an easy visual overview with "Building a Culture of Data Citation".
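As a rough illustration of how little is needed to make a dataset cite-able, the sketch below builds a citation string from a few metadata fields. The layout (creators, year, title, publisher, DOI link) is one common pattern, assumed here rather than taken from ANDS guidance. The class and function names, the people, the archive and the DOI are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Minimal descriptive fields needed to cite a dataset (assumed set)."""
    creators: list
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(meta):
    """Return a one-line citation built only from the metadata."""
    authors = "; ".join(meta.creators)
    return (
        f"{authors} ({meta.year}): {meta.title}. "
        f"{meta.publisher}. https://doi.org/{meta.doi}"
    )


# Hypothetical dataset; the DOI below is made up and will not resolve.
example = DatasetMetadata(
    creators=["Example, A.", "Sample, B."],
    year=2012,
    title="Example survey dataset",
    publisher="Example Data Archive",
    doi="10.1234/example-5678",
)

print(format_citation(example))
```

Note that the citation uses only the metadata, so it can be produced whether or not the underlying data is open.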

Adopting the Internet Process for Data Repositories

At present the process of integrating data is ad hoc and lacks a strategic approach. It occurs to me that researchers could look to the Internet for some techniques for accelerating standardization and use. While the technical details underlying the Internet are well known, what is not appreciated is the international political process used to create and promote the standards. This process cut through much of the red tape which held up the development of data network standards and was as great a contribution to the Internet's success as the technology itself.

The creators of the Internet and the web founded a number of organizations to foster its development, promotion and standards. On 28 August 2012, several of these organizations announced an agreement with the IEEE on standards development, in "Leading Global Standards Organizations Endorse ‘OpenStand’ Principles that Drive Innovation and Borderless Commerce" (IEEE, IAB, IETF, Internet Society and W3C, 28 August 2012).

I suggest that those interested in data repositories for research should adopt the political processes used successfully for the Internet, to develop standards and promote access to data. The pre-Internet processes currently being used will not be sufficient. It should be noted that Australia was closely involved in the development and implementation of the Internet and much of that activity took place in Australian universities, particularly the ANU. That expertise is available for advising on data repository promotion.

One way to promote data reuse is to consider the motivations of those involved. Researchers are motivated by funding and publication considerations. If data reuse is a criterion used by funding bodies to decide whether research gets funded, and if data repositories can be counted as publications and use of the data counted as a publication citation, then this will be a powerful motivation to do the extra work to make data available.

Academics tend to take a roundabout way to approach a topic. I suggest that if data repositories are to take hold, a much more brief, direct and results orientated approach needs to be taken. Those promoting data repositories need to succinctly explain how to make a dataset cite-able and how to cite it, give successful examples and explain the benefits in terms of better research metrics and grants. It is not necessary, nor desirable, to preface this with a long scholarly history of data management.

1 comment:

Tom Worthington said...

Presentations from the inaugural meeting of the ANU Data Management Interest Group, September 2012, are now available on line.