Net Traveller: designing metadata course


Wednesday, May 25, 2011

Metadata, privacy and information policy

Greetings from the opening of the Meta 2011 Conference at ANU University House in Canberra. Professor John McMillan, the Australian Information Commissioner, is discussing the role of metadata in information policy. He pointed out that metadata may contain information about individuals, and so releasing it could breach their privacy under the national privacy principles, which apply to government agencies and non-government organisations. Organisations need to check the metadata hidden in documents before they release them (it can be entertaining to see what is hidden away in documents released by government).
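Checking hidden metadata need not be elaborate. As a minimal sketch, assuming the documents are Word files in the .docx (Office Open XML) format, the author, last editor and similar fields sit in a small XML part, docProps/core.xml, inside the zip package, and Python's standard library can list them. The function name core_properties is my own. It only reads the core properties part, not comments, tracked changes or custom properties, which may also reveal personal information.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def core_properties(docx_bytes: bytes) -> dict:
    """Return the core metadata fields of a .docx file as {field: value}.

    Only docProps/core.xml is read; comments, tracked changes and custom
    properties are stored elsewhere in the package and are not checked here.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    properties = {}
    for element in root:
        # ElementTree writes tags as "{namespace}name"; keep just the name
        name = element.tag.split("}", 1)[-1]
        properties[name] = (element.text or "").strip()
    return properties
```

For a hypothetical file report.docx, reading its bytes and passing them to core_properties would typically show entries such as creator and lastModifiedBy, which are exactly the names an agency may not intend to publish.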

Professor McMillan also pointed out that metadata is important to the mechanics of implementing the government's information policy. He held up a copy of the new "Principles on open public sector information", which were released today (along with a "Report on review and development of principles"). It is good to see that the Commissioner's office released the documents as simple, easy-to-read HTML files, as well as PDF and RTF. They also put the HTML version first, which will be most useful.

One audience member asked about intellectual property. The Commissioner replied that this was the responsibility of the Attorney-General's Department, but pointed out that it was touched on in the information principles, and that the Attorney-General's Department recommends use of a Creative Commons licence for material to be released to the public.

Here are the eight principles on open public sector information:

Principle 1: Open access to information - a default position

Information held by Australian Government agencies is a valuable national resource. If there is no legal need to protect the information it should be open to public access. Information publication enhances public access. Agencies should use information technology to disseminate public sector information, applying a presumption of openness and adopting a proactive publication stance.

Principle 2: Engaging the community

Australian Government policy requires agencies to engage the community online in policy design and service delivery. This should apply to agency information publication practices. Agencies should:

consult the community in deciding what information to publish and about agency publication practices
welcome community feedback about the quality, completeness, usefulness and accuracy of published information
respond promptly to comments received from the community and to requests for information
employ Web 2.0 tools to support community consultation.

Principle 3: Effective information governance

Australian Government agencies should manage information as a core strategic asset. A senior executive ‘information champion’ or knowledge officer in the agency should be responsible for information management and governance, including:

providing leadership on agency compliance with the Information Publication Scheme and Disclosure Log
ensuring agency compliance with legislative and policy requirements on information management and publication
managing agency information to ensure its integrity, security and accessibility
instigating strategic planning on information resource management
ensuring community consultation on agency information policy and publication practices.

The senior officer should be supported by an information governance body that may include people from outside the agency.

Principle 4: Robust information asset management

Effective information management requires agencies to:

maintain an asset inventory or register of the agency's information
identify the custodian of each information holding and the responsibilities of that officer
train staff in information management
establish clear procedures and lines of authority for decisions on information publication and release
decide if information should be prepared for publication at the time it is created and the form of publication
document known limitations on data quality
identify data that must be managed in accordance with legislative and legal requirements, including requirements relating to data security and protection of personal information, intellectual property, business confidentiality and legal professional privilege
protect information against inappropriate or unauthorised use, access or disclosure
preserve information for an appropriate period of time based on sound archival practices.

Principle 5: Discoverable and useable information

The economic and social value of public sector information can be enhanced by publication and information sharing. This requires that information can easily be discovered and used by the community and other stakeholders. To support this objective agencies should:

publish an up to date information asset register
ensure that information published online is in an open and standards-based format and is machine-readable
attach high quality metadata to information so that it can be easily located and linked to similar information using standard web search applications
publish information in accordance with the Web Content Accessibility Guidelines version 2 (WCAG 2.0) endorsed by the Australian Government in November 2009.

Principle 6: Clear reuse rights

The economic and social value of public sector information is enhanced when it is made available for reuse on open licensing terms. The Guidelines on Licensing Public Sector Information for Australian Government Agencies require agencies to decide licensing conditions when publishing information online. The default condition should be the Creative Commons BY standard, as recommended in the Intellectual Property Principles for Australian Government Agencies, that apply to agencies subject to the Financial and Management Accountability Act 1997. Additional guidance on selecting an appropriate licence is given in the Australian Government Open Access and Licensing Framework (AUSGOAL).

Principle 7: Appropriate charging for access

The FOI Act requires agencies to facilitate public access to information at the lowest reasonable cost. This principle applies when information is provided upon request or is published by an agency. Other Acts also authorise charges for specific documents or information access.

Agencies can reduce the cost of public access by publishing information online, especially information that is routinely sought by the public. Charges that may be imposed by an agency for providing access should be clearly explained in an agency policy that is published and regularly reviewed.

Principle 8: Transparent enquiry and complaints processes

Agency decision making about information publication should be transparent. This can be supported, within the agency's information governance framework, by an enquiry and complaints procedure for the public to raise issues about agency publication and access decisions. The procedure should be published, explain how enquiries and complaints will be handled, set timeframes for responding, identify possible remedies and complaint outcomes, and require that written reasons be provided in complaint resolution. ...
From: "Principles on open public sector information", OAIC, 25 May 2011
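Principle 5 asks agencies to attach high quality metadata so that information can be found with standard web search applications. One established way to do this for a web page is to embed Dublin Core elements in the HTML head, using the "DC." naming convention with a "schema.DC" link (the AGLS metadata standard used on Australian government sites builds on Dublin Core). The sketch below generates such tags in Python; the function name and sample values are illustrative, and the principles themselves do not prescribe this method.

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

def dublin_core_tags(metadata: dict) -> str:
    """Render Dublin Core elements as HTML <link>/<meta> tags for a page head."""
    tags = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, value in metadata.items():
        # escape() protects against quotes and ampersands in the values
        tags.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(tags)

# Illustrative values only
print(dublin_core_tags({
    "title": "Principles on open public sector information",
    "creator": "Office of the Australian Information Commissioner",
    "date": "2011-05-25",
    "format": "text/html",
    "language": "en",
}))
```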

Publishing BBC Metadata on the Web

Greetings from the opening of the Meta 2011 Conference at ANU University House in Canberra. Tom Scott from the BBC is the first speaker, on Publishing BBC Metadata. Tom mentioned the Semantic Web in his first few words. He asked "What is the web?", showing Tim Berners-Lee's original paper "Information Management: A Proposal" (CERN, March 1989).

Tom demonstrated the BBC Nature website, which in addition to ordinary web pages, provides structured data, using RSS and RDF and semantic mark-up using microformats. This data is available for others to use and is also used by the BBC to create new stories.
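To make "structured data using RDF" concrete: RDF expresses data as subject, predicate, object statements, and N-Triples is one of the simplest standard ways of writing them down, one statement per line. The sketch below serialises a few statements; the example.org URIs describing a species page are hypothetical and are not the BBC's actual identifiers, and the escaping handles only backslashes, quotes and newlines. In practice an RDF library would do this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a URI naming a thing."""
    value: str

def term(t) -> str:
    """Write one RDF term in N-Triples syntax: <URI> or "literal"."""
    if isinstance(t, Literal):
        text = t.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{text}"'
    return f"<{t}>"

def to_ntriples(triples) -> str:
    """One 'subject predicate object .' line per statement."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

PUMA = "http://example.org/nature/species/puma"  # hypothetical URI
print(to_ntriples([
    (PUMA, "http://purl.org/dc/terms/title", Literal("Puma")),
    (PUMA, "http://www.w3.org/2000/01/rdf-schema#seeAlso",
     "http://example.org/other-dataset/Cougar"),  # hypothetical link
]))
```

Because the object of the second statement is another URI rather than a string, separate datasets can be linked to each other in this way.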

Tom also mentioned DBpedia, an attempt to structure Wikipedia data. At this point he argued that there is no metadata, and what is commonly thought of as data is actually metadata. In a reference to Stephen Hawking, Tom said "Turtles all the way down". This is a metaphor for infinite recursion; however, I would argue it is "metadata all the way down". James Gleick argues in his book "The Information: A History, a Theory, a Flood" that the ability to reason abstractly came after writing ("if all horses are white ..."). That seems unlikely, as I am sure horse breeders reasoned about the nature of a good horse before written language existed. Data and metadata are intertwined by their nature, not due to a human invention.

Tom argued that we need to move from the document web to the data web, the web of things, which is what the semantic web is for. However, after spending many years trying to understand the semantic web and teach it to university students (including supervising several masters students doing projects on using it for cataloguing indigenous cultural material), I think this is a concept which needs to be further refined and simplified before it can be widely used. Tim Berners-Lee's key contribution with the World Wide Web was to take an existing complex electronic document standard (SGML) and simplify it to make something easy enough to use (HTML). Ever since, information professionals have argued that HTML is flawed: some tinkered with SGML and produced XML, others tinkered with HTML to make XHTML, but the simplicity of HTML was lost. In my view the semantic web similarly needs simplification, even if the purists then say it is incomplete.

Tom then explained that the BBC uses metadata for program guides. What matters is not the metadata but the information it describes. This is the key point which information professionals find so obvious that they forget to explain it. They may say metadata is data about data, but do not say why this is useful. That is a topic I will explore in my talk to the conference tomorrow with Senator Lundy, "Designing for Democratic Dialogue: More than Mating iPads" (11.00 am on Thursday 26th May, 2011).

Next on the program today we have Greg Stone, Chief Technology Officer, Microsoft Australia and Professor John McMillan, Australian Information Commissioner, who is launching the new government information policy.

Monday, May 16, 2011

Australian Principles on Open Public Sector Information

Australian Information Commissioner, Professor John McMillan AO, will formally launch the Australian "Principles on Open Public Sector Information" in Canberra on 25 May 2011. This will be during the Meta 2011 Conference at University House, the Australian National University. I have been teaching the draft principles to students in the ANU course COMP7420 and am speaking along with Senator Lundy at the conference the next day on "Designing for Democratic Dialogue".

Institute of Metadata Management
Press Release
Contact: Michele Berkhout
Phone: +61 415 875 132
E-mail: press(a)metalounge.org
13 May 2011
Information Commissioner to launch
“The Principles on Open Public Sector Information” at Meta 2011
The Institute of Metadata Management is delighted to announce that the Australian Information Commissioner, Professor John McMillan, will be formally launching the Principles on Open Public Sector Information at Meta 2011 to be held at University House, ANU, Canberra, from Wednesday 25th May 2011 to Friday 27th May, 2011.
“The Open Public Sector Information Principles present a core vision for government information management in Australia. They mark a shift in thinking about public sector information. There is now greater recognition that government information is a national resource that should be published for community access and use,” Professor McMillan said.
“The Principles set out the central values of open public sector information: that it be freely available, based on open standards, easily discoverable, understandable, machine-readable, and freely reusable and transformable.”

As well as the formal launch there will be a practical demonstration of the benefits of the Principles in a workshop jointly facilitated by the Commissioner’s Office and linked data experts, Dr Armin Haller from the CSIRO and Dr Tudor Groza from the University of Queensland. This workshop will clearly show how, by adopting best practice in the publication of data according to the Principles, the power of linked data and semantic technologies can be leveraged to provide new insights and business intelligence for government, business and the community.
IMM President Mel Taylor is delighted with the collaboration between the IMM and the OAIC as it demonstrates the joint objectives of promoting both the capability and expertise around better information management within the Australian government and, in particular the Metadata community of practice more broadly.
Meta2011 is the fourth conference to be held in Australia and is themed Business Realities and Implications. The three-day conference will cover topics such as Business intelligence and analytics; Technology solutions; Data integration; Management, governance and stewardship and more.
By attending this conference, delegates are expected to go away with knowledge which can be shared and implemented in their businesses. Registration is still open and there are a range of sponsorship opportunities available. Contact us here to find out how to be part of the IMM community.
For more information related to IMM and Meta2011 please visit http://www.metalounge.org

Sunday, July 12, 2009

Designing a course module in Metadata and Electronic Data Management - Part 3

With the general direction set for the course module on Metadata and Electronic Data Management, what should the students be able to do at the end of the IT in e-Commerce course? The many seminars on course design I have attended over the last year have emphasised the importance of learning objectives, and of assessment as part of the learning process. This is not just about setting a test at the end to see whether the students can remember things.

In order to prepare some Learning Outcomes, I did a web search for other courses on metadata and document management to see what they had. The first found was the University of Manchester's "COMP30352: Information Retrieval, Hypermedia and the Web", however this seems more of a web course. The second found was "IT in E Commerce COMP6341" at the ANU. It took me some moments to realise this was the course I was teaching. Someone had already written the learning outcomes:

Learning Outcomes:
The focus of this course is on document representation, knowledge discovery, storage and retrieval, and electronic trading. The areas covered include XML, XSL, DTD, metadata, data management and different forms of trading such as deliberative, spontaneous and auctions. Other topics will be included to match recent developments and maturation of the area, such as web application frameworks, web services and the semantic web.

Rationale

Electronic Commerce is an area that is growing in leaps and bounds. The use of information technology is at the heart of electronic commerce. It is important that students doing a degree in Information Systems have a sound understanding of the role that information technology plays in electronic commerce. This course, along with the course on Internet, Intranet and Document Systems, is meant to do just that. It looks at some of the current and potential uses of information technology in electronic commerce. The topics covered include document representation in the form of XML, XSL, DTD's; knowledge discovery using metadata and data mining; data management as in the case of Digital Libraries and Electronic Document Management; trading, including deliberative, spontaneous and auctions; and security (public keys, PKI, digital signatures, etc). Other topics would be included as the area matures. It is anticipated that this course will be of interest to people in the industry as well.

This course is responsible for:

current trends in representation of data and documents on the web
knowledge discovery in the form of metadata and data mining
database management in electronic commerce
electronic trading
security in electronic commerce.

The following topics will be addressed:

knowledge representation - XML, XSL, DTD, CSS
knowledge discovery - metadata and data mining.
data management - digital libraries and electronic document management
trading - deliberative, spontaneous and auctions
security - public keys, symmetric keys, PKI, authentication, digital signatures, etc.

Upon completion of this course, the student will be able to do the following:

Describe the XML language, write simple DTD's, write CSS style sheets for documents, and explain where XML can be applied to advantage and why.
Describe the use of metadata, and describe the current trends in data mining.
Describe how digital libraries and electronic document management work.
Describe the different kinds of trading that an individual, or an organisation, can do electronically. Explain the advantages and limitations of electronic trading, and the risks involved.
Explain why security is such a big issue in electronic commerce and how it is being addressed. Describe key concepts like public keys, symmetric keys, PKI, authentication and digital signatures. Given a system specification, come up with a design that allows secure transmission of information.
From: "IT in E Commerce COMP6341", Course Details, ANU, 2009

The last part, saying what the student should be able to do on completion of the course, is of most interest.

The wording of this is curiously loose, for example "...why security is such a big issue ...". Also, the term "describe" seems too passive for an IT course, which should be about being able to do things, not just describe them.

Extracting the items relating to metadata and electronic document management:

Describe the use of metadata ...
Describe how digital libraries and electronic document management work.

A better way to put this may be:

Use the XML language to define document structures
Use XSLT to transform documents and CSS to present them
Use metadata to describe documents for use in digital libraries and electronic document management
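As a sketch of the first and third outcomes together, the following uses Python's standard xml.etree module to read a small XML document whose structure is defined by its own element names, with Dublin Core elements carrying the metadata. The element names report, metadata, body and section are invented for illustration; only the Dublin Core namespace is a real standard. Python's standard library has no XSLT processor, so the second outcome is not shown here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical document: element names other than dc:* are invented here
REPORT = f"""<report xmlns:dc="{DC}">
  <metadata>
    <dc:title>Annual Report 2009</dc:title>
    <dc:creator>Records Section</dc:creator>
    <dc:subject>governance</dc:subject>
    <dc:subject>records management</dc:subject>
  </metadata>
  <body>
    <section>Text of the report.</section>
  </body>
</report>"""

def describe(xml_text: str) -> dict:
    """Collect every Dublin Core element in the document into {name: [values]}."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC + "}"
    record = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            record.setdefault(element.tag[len(prefix):], []).append(element.text)
    return record

print(describe(REPORT))
```

A digital library or document management system could index such records without reading the body of the document at all.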

In the course I previously spent a lot of time describing how e-publishing systems worked in general, and the history of publishing, to provide a context for XML based publishing. This is of little interest to current day students of IT, to whom paper publishing and library card catalogues are not part of their experience, having been born after e-publishing and computer catalogues had become the norm.

Also I spent a lot of time saying what was wrong with PDF. While there is still much wrong with PDF, there seems little point in spending time on that, when instead alternatives could be presented. Otherwise this is much like presenting what is wrong with private cars and roads to transport engineers.

Some other parts of the course can be emphasised. As an example the IFIP Digital Library which was speculated about last year has now become a reality, with the ANU providing the system for users around the globe. It is unlikely that students will have much interest or understanding of the idea that the material in the digital library was once available primarily on paper. They may also have difficulty making the connection between the digital library and the buildings on campus which are still called a library. The lower floors of these buildings have been cleared of most paper, to provide space for computer access, with perhaps a few serials and new books on display as historical curiosities.

Designing a course module in Metadata and Electronic Data Management - Part 2

Having worked out how much material is needed for a course module on Metadata and Electronic Data Management, what exactly is it for? The description of the IT in e-Commerce course refers to: "... document representation (XML, XSL, DTD, CSS), knowledge discovery (meta-data, information retrieval), data management (digital library, electronic document management), trading (spontaneous, deliberative, auctions) and security (encryption, public key, symmetric key, PKI, authentication, etc). ..." So the course is about how to design e-documents, and protect and manage them, so that they can be found and used for transactions in business.

The ANU is in Canberra, the seat of Australian Government, and many students work for the government, so many of the examples in the course are drawn from government business. Also, because some of the students go on to be academics and researchers, academic publishing has been used as an example.

There are some common problems for people in business, government and academia: how do I create an e-document which will be flexible for use by different people at different times? How can it be kept? How can it be found? How can it be authenticated?
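For the last question, one building block is a cryptographic hash: a short fingerprint that changes if any byte of the document changes. The sketch below uses Python's hashlib and hmac modules. The HMAC part assumes sender and receiver share a secret key; it is a simpler stand-in for the public-key digital signatures and PKI covered in the course, which also identify who signed. The function names and the sample key are my own, for illustration only.

```python
import hashlib
import hmac

def fingerprint(document: bytes) -> str:
    """SHA-256 digest of the document, as hex: changes if any byte changes."""
    return hashlib.sha256(document).hexdigest()

def authentication_code(document: bytes, key: bytes) -> str:
    """HMAC-SHA256 of the document under a secret key shared by both parties."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()

def is_authentic(document: bytes, key: bytes, received_code: str) -> bool:
    """Recompute the code and compare in constant time."""
    return hmac.compare_digest(authentication_code(document, key), received_code)

if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key, for illustration only
    memo = b"Approved: purchase of 3 network cables"
    code = authentication_code(memo, key)
    print(is_authentic(memo, key, code))                     # True
    print(is_authentic(memo + b" and a laptop", key, code))  # False
```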

The problem with e-documents is coping with the volume of material. Workers are being overwhelmed by the volume of email and attachments. Just as they get used to email, along come blogs, wikis, twits and other technologies to cope with.

The course teaches the use of XML based technology. The idea is that you create the documents in a format which reflects the information content, separate from how the document will look to the reader. This goes beyond the separation of structure from presentation for web pages. With an HTML document, if you strip off the presentation layer, the document still looks like a text document. However, if you remove the data definitions from XML data, you have just a jumble of letters and numbers.
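A small illustration of this separation, in Python: the hypothetical order record below says what each value is, not how it should look, and two different functions present the same content in different ways. Stripping the tags, as a rough stand-in for removing the data definitions, shows the other half of the point: the text that is left no longer says what anything means, and the quantity and item code, held as attributes, disappear altogether. The record and function names are invented for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record: the tags say what each value is, not how to show it
ORDER = """<order>
  <buyer>Acme Pty Ltd</buyer>
  <item code="X100" quantity="3">Network cable</item>
</order>"""

def as_html(xml_text: str) -> str:
    """Present the order as a line of HTML for a web page."""
    order = ET.fromstring(xml_text)
    item = order.find("item")
    return (f"<p>{escape(order.findtext('buyer'))} ordered {item.get('quantity')} x "
            f"{escape(item.text)} (code {item.get('code')})</p>")

def as_text(xml_text: str) -> str:
    """Present the same content as plain text, e.g. for an email."""
    order = ET.fromstring(xml_text)
    item = order.find("item")
    return f"{order.findtext('buyer')}: {item.get('quantity')} x {item.text}"

def without_markup(xml_text: str) -> str:
    """Strip the tags: the text left no longer says what each value means."""
    return " ".join(t.strip() for t in ET.fromstring(xml_text).itertext() if t.strip())

print(as_html(ORDER))         # <p>Acme Pty Ltd ordered 3 x Network cable (code X100)</p>
print(as_text(ORDER))         # Acme Pty Ltd: 3 x Network cable
print(without_markup(ORDER))  # Acme Pty Ltd Network cable
```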

The key point in terms of knowledge discovery is metadata. The metadata can be used to find the data and also substitute for it in many processes. In the case of XML documents metadata is also used to define the data structure.

Students have considerable difficulty understanding what metadata is. The popularisation of metadata through tags on web resources, such as images, blog postings and instant messages, provides a useful example.

Previously I introduced metadata from the technical point of view and then illustrated it with popular examples such as tags. Perhaps it would be better to reverse this and introduce tags first.
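Tags also make it easy to show metadata being used to find data, and standing in for it. The sketch below, with hypothetical resource names and tags, builds an index from each tag to the resources carrying it, and answers a search without ever looking at the resources themselves. Matching tags case-insensitively is an assumption of this sketch; real tagging systems differ.

```python
from collections import defaultdict

def build_tag_index(resources: dict) -> dict:
    """Invert {resource: [tags]} into {tag: {resources}} (tags compared case-insensitively)."""
    index = defaultdict(set)
    for resource, tags in resources.items():
        for tag in tags:
            index[tag.lower()].add(resource)
    return index

def find(index: dict, *tags: str) -> set:
    """Resources carrying every one of the given tags."""
    matches = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*matches) if matches else set()

# Hypothetical resources and tags
index = build_tag_index({
    "photo-17.jpg": ["Canberra", "conference"],
    "post-2011-05-25": ["metadata", "conference", "Canberra"],
    "post-2009-07-11": ["metadata", "teaching"],
})
print(find(index, "canberra", "metadata"))  # {'post-2011-05-25'}
```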

In introducing electronic document management I went into considerable detail about the procedures used by the Australian Government. While this was popular with professional records managers and archivists, it was of little interest to IT students. It also seems a losing battle in the government, with such records management systems falling into disuse. While I can't solve the problems of the government by myself, perhaps I can suggest some different techniques to the students.

Deleting most of the material about records management procedures will make room for some new material on new electronic formats for use by business.

Saturday, July 11, 2009

Designing a course module in Metadata and Electronic Data Management

How do I create a course module on "Metadata and Electronic Data Management"? This year I have again been asked to help teach students in the course Information Technology in Electronic Commerce (COMP3410) at ANU.

The content will be much the same as last year, but I would like to package the material up more neatly. This is partly prompted by my resolution last year that I had given my last lecture. Also, the material currently lacks a coherent theme and is much longer than it should be. In addition, I would like to revise some of the material which is based on old EDI standards and old Australian Government records management guidelines.

How much?

But where to start? The first step is to get some idea of how much material is required. Previously I gave about five or six lectures and a lab covering the material. This equates to about two weeks of a course.

Last year's notes for the course are the equivalent of 36 A4 pages, or about 18 pages per week. At one end of the spectrum, my notes for Green ICT Strategies (COMP7310) are about 3 A4 pages per week, whereas the web technology lectures for COMP2410/6340 - Networked Information Systems are 24 pages per week. This range can be accounted for by the Green ICT course being at masters level, which assumes the student does more independent reading. Also, the Green ICT notes are mostly English text, whereas the web technology notes contain examples of code, which take up more space. So at 18 pages per week, the metadata and data management notes seem about right, but could perhaps be trimmed a little.

Where does it fit in the skill set?

The Metadata and Electronic Data Management material was just whatever I thought might be relevant when it was first presented in 2000. It was designed to fit with what else was included in the course and related courses, but no thought was given to how it fitted into the careers of the people being trained.

To position the Green ICT Strategies course, the Skills Framework for the Information Age (SFIA) was used. A search of SFIA found only one skill definition which mentioned metadata, which was Information management (IRMG):

The overall management of information, as a fundamental business resource, to ensure that the information needs of the business are met. Encompasses development and promotion of the strategy and policies covering the design of information structures and taxonomies, the setting of policies for the sourcing and maintenance of the data content, the management and storage of electronic content and the analysis of information structure (including logical analysis of data and metadata). Includes overall responsibility for compliance with regulations, standards and codes of good practice relating to information and documentation records management, information assurance and data protection. ...

From: Information management (IRMG), Strategy & planning, Information strategy, SFIA, Version 3, 2005

For the undergraduate version of the course this would be at SFIA level 4, and level 5 for the postgraduate version. The higher SFIA level has more management and less technical responsibility.

A search of SFIA for "data management" turned up references in Business analysis (ANAL), System software (SYSP) and Enterprise architecture (STPL). None of these seem to fit the intended content; the closest is business analysis, but that has too much business and not enough technology.

A search of SFIA for "records management" turned up the Information management (IRMG) skill again.

A search for "publishing" found Information content publishing (ICPM), but this seems to relate more to web design.

So of all these Information management (IRMG) seems most relevant.

Metadata and data management for governance

Looking at the higher level, IM is in the SFIA Subcategory of Information strategy. This also includes the Corporate governance of IT (GOVN). At first glance governance does not seem relevant to metadata and data management, being more for a course on IT project management.

However, many of the examples I use to explain the uses of metadata and data management come from government and involve keeping records to demonstrate that an organisation is being properly run. It occurred to me that it might be useful to turn around the emphasis on record keeping in case you are taken to court, and instead start by looking at what is needed in terms of electronic communications and documents for running an organisation well at the highest level, that is, governance. With this I could start off with the principles of governance and then show how to make effective use of tools like instant messaging and blogs in a corporate environment.
