Greetings from the Australian Computer Society Canberra Branch Forum, where Kent Fitch is talking about "Scaling up: the technology behind the NLA's newspaper digitisation and the Trove search service". He started by showing a newspaper article by Caroline Chisholm, "An Australian Bush Scene". Trove searches the NLA's catalog, digitized newspapers and other on-line sources. Kent pointed out that any member of the public can correct the OCR text from the newspapers and add tags and comments. This has proved popular with tens of thousands of Australians. Here is a search of Trove for "Tom Worthington". I noticed the NLA has issued a Persistent Identifier for me: http://nla.gov.au/nla.party-1017487
Kent mentioned that the NLA has archived copies of important Australian web sites, including that for the Sydney 2000 Olympics. This has proved useful to me as I can use it to teach my students about web accessibility.
The NLA is using the open source Lucene search engine and the Solr service wrapper for Trove. It runs on commodity servers with internal solid state drives, which are essential to the operation of the system because of the many searches each query generates.
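For readers who have not used Solr, here is a minimal sketch of what a query to a Solr service looks like over HTTP. It only builds the request URL and does not contact any server. The host, port and the core name "newspapers" are my own placeholders, not details of the NLA's installation; the `/select` handler and the `q`, `rows` and `wt` parameters are standard Solr.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, rows=10):
    """Build a URL for Solr's standard /select handler.

    base_url and core are hypothetical placeholders for illustration;
    they are not the NLA's actual configuration.
    """
    params = {
        "q": text,     # the query string, in Lucene query syntax
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# A phrase query for the article title Kent showed.
url = build_solr_query_url(
    "http://localhost:8983", "newspapers", '"An Australian Bush Scene"'
)
print(url)
```

A single user search in a system like Trove may fan out into many such requests, which would explain why fast random reads from solid state drives matter.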
One flaw in the NLA's setup may be that it only has offices in Canberra. Loss of access to Canberra could therefore cut off access to Trove and perhaps result in the permanent loss of all the data it holds. This is not ideal for national cultural heritage. At least one live hot site and one remote backup, each at least 300 km from the NLA building, should be used.
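As a rough illustration of the 300 km rule of thumb, the sketch below uses the haversine formula to estimate great-circle distances from Canberra. The two candidate cities are my own examples, not sites the NLA has proposed, and the coordinates are approximate city centres rather than the location of any actual building.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MIN_SEPARATION_KM = 300   # the separation suggested in this post


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates (latitude, longitude).
CANBERRA = (-35.28, 149.13)
CANDIDATES = {  # hypothetical backup locations, for illustration only
    "Sydney": (-33.87, 151.21),
    "Melbourne": (-37.81, 144.96),
}

for name, (lat, lon) in CANDIDATES.items():
    km = haversine_km(*CANBERRA, lat, lon)
    verdict = "far enough" if km >= MIN_SEPARATION_KM else "too close"
    print(f"{name}: about {km:.0f} km from Canberra ({verdict})")
```

By this estimate Sydney falls short of 300 km, while Melbourne clears it comfortably.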