Tuesday, July 10, 2007

Clustering TDB: right answer to wrong question

Andrew Tridgell from IBM OzLabs gave an impressive live demonstration of Clustering TDB on Monday. He claims that the clustered version of Samba will allow very large servers to be built to serve Microsoft Windows PCs as well as Linux computers. One such server, with 100 processor nodes, could serve 30,000 PC users. It could have 10,000 disk drives and hold millions of gigabytes (petabytes) of data.

The bit I didn't understand was why anyone would want such a server. If an organization has a very large number of users working from the one server, they are most likely running the same small set of applications. In that case it would be far more efficient to run those applications on shared processors than on desktop PCs. This would likely result in a saving of about 90% in hardware.

In the extreme case, if 30,000 desktop PCs are running the same application, 30,000 copies of the same software have to be loaded from the server into 30,000 sets of RAM. Ideally, if a shared processor were used, only one copy of the application would need to be loaded into one set of RAM. In reality several copies would be needed, and each user needs some RAM for their own data, but this could still save 90% of the data traffic and RAM.
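To make that estimate concrete, here is a rough back-of-the-envelope sketch in Python. The figures in it are my own assumptions chosen purely for illustration, not measurements: a 90 MB application, 10 MB of private data per user, and one shared copy of the application on each of the 100 nodes. The function names are likewise made up for this example.

```python
# Rough RAM comparison: every desktop loads its own copy of the
# application versus a few shared copies on a server.
# All sizes below are illustrative assumptions, not measured values.

def desktop_ram_mb(users, app_mb, user_data_mb):
    """Total RAM when every PC holds its own copy of the application."""
    return users * (app_mb + user_data_mb)

def shared_ram_mb(users, app_copies, app_mb, user_data_mb):
    """Total RAM when a few shared copies serve all users.

    Each user still needs private RAM for their own data.
    """
    return app_copies * app_mb + users * user_data_mb

def saving_fraction(users, app_copies, app_mb, user_data_mb):
    """Fraction of RAM saved by sharing, relative to the desktop case."""
    desktop = desktop_ram_mb(users, app_mb, user_data_mb)
    shared = shared_ram_mb(users, app_copies, app_mb, user_data_mb)
    return 1 - shared / desktop

if __name__ == "__main__":
    # Assumed: 30,000 users, 100 shared copies (one per node),
    # 90 MB application, 10 MB of private data per user.
    print(f"{saving_fraction(30_000, 100, 90, 10):.1%}")
```

Under these assumed numbers the saving comes out just under 90%. The result is dominated by the ratio of application size to per-user data, so an application with a lot of private data per user would save much less.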

One reason to use PCs is to be able to use Microsoft Office and other Windows applications. It is possible to run these on a remote server and use software such as Citrix to deliver them to a desktop. An example of this is the new State Library of Queensland, which uses Citrix for some reader machines.

A better approach would be to use applications which were designed to run efficiently in a shared environment. One way to make this palatable to users would be to offer them web-based applications which they could use remotely. While these would have a less interactive interface than PC applications, they would have the advantage for the user of being available anywhere there was a web browser. They would also work well with the Web 2.0 collaborative idea, allowing staff to share information more easily.

The advantage for the technical staff would be that such applications would be extremely efficient in processor and memory use. In addition, web-based applications using efficiently encoded data would need less storage space. The 100-node, 10,000-disk system built to serve PCs might need only 10 processors and 1,000 disks if implemented with a web-based interface.
