Monday, December 11, 2006

Office Open XML Ecma Standard

On 7 December 2006, Office Open XML (OpenXML) was adopted as Ecma standard ECMA-376. Ecma has also submitted it to ISO/IEC JTC 1 for fast-track adoption. OpenXML is Microsoft's adaptation of Microsoft Office's word-processing, presentation, and spreadsheet formats to XML. It is similar to Sun Microsystems' OpenOffice format, which has already been adopted as an ISO standard (as OpenDocument, ISO/IEC 26300). Wikipedia also has a useful overview of Office Open XML and a comparison with OpenOffice.

Unfortunately, while Ecma's announcement says its documents can be downloaded from its web site, I was unable to find the approved standard 376 in the list. Presumably the standard is close to the final draft of 9 October 2006. In addition, there is an overview by the Ecma committee.

The standard is divided into five parts:
  1. Fundamentals
  2. Open Packaging Conventions
  3. Primer
  4. Markup Language Reference
  5. Markup Compatibility and Extensibility
The standard is provided in PDF, Tagged PDF and WordprocessingML formats (WordprocessingML is the OpenXML word processing format). The document is not provided as HTML (ordinary web pages), which will severely limit access to it.

Like OpenOffice, OpenXML uses the zip format to bundle the text of a document, stored as XML, together with any images and other binary files into a single compressed file. For example, the "Fundamentals" part of the standard in OpenXML format is a single 240 KB zipped file. Unzipped, it contains 29 files totalling 2.4 MB: three PNG images, with the rest XML. The main text of the document is in one 1.6 MB file ("document.xml"), with formatting and references held in other, smaller files.
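Because an OpenXML file is an ordinary zip archive, its contents can be inspected with any zip tool. The following is a minimal sketch using Python's standard zipfile module: it lists each part of a package with its unzipped and compressed sizes, and reads the main document part. It assumes the main text lives at "word/document.xml", the usual location in Word-generated files (strictly, the package's relationship files point to it). The function names are illustrative only, and build_demo_package is a hypothetical stand-in that writes a tiny, non-valid package in memory so the sketch runs without a real file.

```python
import io
import zipfile

# WordprocessingML main namespace, used only to make the demo XML look realistic.
WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def summarize_package(source):
    """Return (part name, unzipped size, compressed size) for every part.

    `source` may be a file path or a file-like object holding the zip package.
    """
    with zipfile.ZipFile(source) as package:
        return [(info.filename, info.file_size, info.compress_size)
                for info in package.infolist()]


def read_main_part(source, part="word/document.xml"):
    """Return the XML text of one part (by default the conventional main body)."""
    with zipfile.ZipFile(source) as package:
        return package.read(part).decode("utf-8")


def build_demo_package():
    """Hypothetical stand-in: a tiny in-memory package, NOT a valid Word file."""
    body = "<w:p/>" * 1000  # highly repetitive XML, which compresses well
    document = (f'<w:document xmlns:w="{WORD_NS}">'
                f"<w:body>{body}</w:body></w:document>")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", document)
        package.writestr("word/media/image1.png", b"\x89PNG placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_package()
    parts = summarize_package(demo)
    for name, size, packed in parts:
        print(f"{name}: {size} bytes ({packed} compressed)")
    total = sum(size for _, size, _ in parts)
    packed_total = sum(packed for _, _, packed in parts)
    print(f"{len(parts)} parts, {total} bytes unzipped, {packed_total} compressed")
```

To examine a real document, pass its path instead, e.g. summarize_package("fundamentals.docx") (a hypothetical filename). The unzipped sizes indicate how much XML a reader must parse, while the compressed sizes are what is actually downloaded.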

Assuming the IT community accepts Microsoft's assurances that it will keep the format freely available for use, OpenXML should prove popular. However, neither OpenXML nor OpenOffice documents can be displayed directly by a web browser, and both formats face their biggest challenge from web standards. After an author prepares a document using OpenXML or OpenOffice, they will most likely have to render it into other formats, such as PDF and HTML, for distribution.

Newer XHTML standards are providing more of the formatting expected for word processing documents, while remaining backward compatible with web browsers. A word processor that used XHTML as its native format would allow a document simply to be saved to the web for distribution. There would be no need to convert to PDF or HTML. There would also be scope for better integration with web tools, such as blogs, wikis and feeds.

The creation, promotion and distribution of a new word processing package was previously a major undertaking. However, AJAX-based (Web 2.0) office packages could quickly render irrelevant the debate over whether OpenXML or OpenOffice is better, by superseding them both.

Ecma's overview of OpenXML illustrates the strengths and weaknesses of both its approach and that of OpenOffice:
"OpenXML was designed from the start to be capable of faithfully representing the pre-existing corpus of word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Corporation. The standardization process consisted of mirroring in XML the capabilities required to represent the existing corpus, extending them, providing detailed documentation, and enabling interoperability. At the time of writing, more than 400 million users generate documents in the binary formats, with estimates exceeding 40 billion documents and billions more being created each year."
This is a wrong-headed approach to creating an electronic document standard. The priority for word processing documents has been to reliably produce printed documents that look the same wherever they are printed. However, printing is now a very small part of what a word processor is used for and should not be the priority: most documents are viewed on screen. Exact reproduction of a printed layout is precisely what is NOT needed. Because the formats are designed for print, word processing documents have to be converted into other formats before they can be used. For example, the OpenXML standard is provided in three formats: PDF for printing, Tagged PDF for on-screen viewing, and WordprocessingML. None of these formats is particularly suitable for on-screen viewing.

A new approach is needed where the document format is designed for on-screen viewing with a web browser, and then the additional features needed for printing are added. This can be done with XHTML.
