Monday, November 14, 2011

Evidence Based Information Privacy

Greetings from the Australian National University in Canberra, where Professor Brad Malin, Director of the Health Information Privacy Lab, is speaking on "Risk-based Privacy: What are we Afraid Of?". This is topical, with the Australian Government working on a Cyber Policy white paper.

Brad is in Australia to work on an Australian Research Council funded project with Jiuyong Li at University of South Australia: "Studying Privacy Protection Methods for Multiple Independent Data Releases" (Grant DP110103142, 9/2011 - 8/2013).

He started by discussing the case where Latanya Sweeney was able to find the medical records of the then Governor of Massachusetts in a set of supposedly anonymous research records. The problem is that there are legitimate reasons to provide limited information about individuals in data sets for specific purposes, but this information can then be combined with other data to identify the individuals and be used for purposes it was not intended for. Typically the concern is about companies misusing customer data, but the same can occur with the citizen's own government, or another government.
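As a rough illustration of how this kind of linkage works, here is a minimal Python sketch. It is not Sweeney's actual method or data: all records are made up, and the choice of postcode, date of birth and sex as the shared fields is an assumption for the example.

```python
# Toy linkage sketch. All records, names and field choices are hypothetical.
QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")  # assumed shared fields

# "Anonymous" research records: no names, but quasi-identifiers remain.
research_records = [
    {"postcode": "1000", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"postcode": "2000", "birth_date": "1960-02-02", "sex": "F", "diagnosis": "condition B"},
    {"postcode": "3000", "birth_date": "1970-03-03", "sex": "F", "diagnosis": "condition C"},
]

# A separate public list (for example a register) that does include names.
public_register = [
    {"name": "Person One", "postcode": "1000", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person Two", "postcode": "2000", "birth_date": "1960-02-02", "sex": "F"},
    {"name": "Person Three", "postcode": "2000", "birth_date": "1960-02-02", "sex": "F"},
]


def link_records(records, register, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where a record matches exactly one person."""
    index = {}
    for person in register:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for record in records:
        candidates = index.get(tuple(record[k] for k in keys), [])
        # Only an unambiguous match re-identifies someone with confidence.
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches


reidentified = link_records(research_records, public_register)
```

In this toy data only the first record is re-identified: the second matches two people in the register, and the third matches nobody.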

Brad then discussed the 1996 Health Insurance Portability and Accountability Act (HIPAA). This allows health data to be released for scientific purposes if the identity of the patients has been removed from the records. A set of guidelines was developed to make the data anonymous. However, this requires a very complex statistical analysis, which does not give a guaranteed result. An analysis of data across US states found that up to 60% of the population was at risk of having their data identified. The difference between two ways to de-identify the data can result in more than a thousand times the risk of identification.
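To show why the choice of de-identification policy matters, here is a small Python sketch that measures what share of a population is unique on the fields left in a release. This is a simplified stand-in, not the analysis Brad presented: the population, the field names and the "fewer than k people share these values" rule are all assumptions for illustration.

```python
from collections import Counter


def fraction_at_risk(records, keys, k=2):
    """Share of records whose combination of `keys` is shared by fewer than k records."""
    if not records:
        return 0.0
    groups = Counter(tuple(record[key] for key in keys) for record in records)
    at_risk = sum(size for size in groups.values() if size < k)
    return at_risk / len(records)


# Hypothetical population of four people.
population = [
    {"postcode": "1000", "birth_year": 1950, "birth_date": "1950-01-01"},
    {"postcode": "1000", "birth_year": 1950, "birth_date": "1950-06-15"},
    {"postcode": "1000", "birth_year": 1960, "birth_date": "1960-03-03"},
    {"postcode": "2000", "birth_year": 1960, "birth_date": "1960-03-03"},
]

# Policy 1 releases the full date of birth; policy 2 releases only the year.
detailed_risk = fraction_at_risk(population, ("postcode", "birth_date"))
coarse_risk = fraction_at_risk(population, ("postcode", "birth_year"))
```

With the full date of birth everyone in this toy population is unique, while releasing only the year of birth halves the share at risk. Real populations are far larger, so the gap between two policies can be much wider.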

Brad discussed the implications of this research. One response would be to say that it is not possible to provide privacy in the modern Internet world and we should give up on the idea. Brad does not support this, arguing that privacy depends on context. He used the example of the eMERGE Network DNA research database. On its own the DNA information does not identify individuals. However, it occurs to me that research is identifying which DNA causes particular diseases and physical characteristics. This may allow identification in the future.

Brad pointed out that the demographic data available about individuals varies from US state to state (it appears this data is provided by the state governments). He emphasized that cost is a factor in re-identifiability, as some states charge for the data. However, it occurs to me that the "open access" movement will reduce these costs. There is a movement in the USA and Australia lobbying for free access to information collected at government expense.

Brad's latest analysis indicates that the risk of re-identification is small in practice. However, the "Safe Harbor" guidelines alone are not sufficient to ensure this.

Brad's analysis was about privacy for the general population. However, there is also a security risk for individuals of particular interest. So while an analysis might find that only a few individuals are at risk, if those individuals are key to a company or government (senior military officers, for example), then the risk may be unacceptable. I also found the Model Consent Language for eMERGE.

Brad pointed out that there is no certification process for the experts who are needed to certify a data release under the US HIPAA law. It occurs to me that this is something for which a course and test could be developed.

In Australia, similar expertise is needed to meet privacy laws, but those laws are much less prescriptive. I wrote Electronic Document and Records Management for the ANU course COMP7420. There is a companion course, Data Mining and Matching (COMP7410). Perhaps these could be included in a certification.

