Thursday, January 06, 2011

RESTful Statistical Service Powered by R

Greetings from the SyPy: Sydney Python Users group meeting at Google Sydney. Richard Volpato is talking on "Making R restfully enterprising with Django". Richard works for the Copyright Agency Limited, which has a lot of statistical data to process and is important to Australia's authors. He created a RESTful 'statistical service' powered by the statistical language R. Richard pointed out that the open source R language could be used in a similar way to the statistical packages SPSS and SAS. This could be more significant to the Australian publishing industry than the government's Book Industry Strategy Group (BISG) inquiry.

To prevent the large statistical analyses overloading the server, they are queued (reminding me of old-fashioned batch processing). The queuing is done using RabbitMQ. A RESTful interface then serves the clients (simpler than SOAP). The Django web framework was used with Python to tie all the components together.
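The queuing arrangement described above can be sketched in a few lines of Python. This is only my own illustration of the general pattern, not Richard's code: a standard-library queue stands in for RabbitMQ, plain functions stand in for the REST handlers that Django would provide, and a simple mean stands in for the call to R. All of the names (submit_job, get_job, run_analysis) are hypothetical.

```python
import queue
import statistics
import threading
import uuid

# Stand-in for the RabbitMQ queue: jobs wait here until the worker is free.
job_queue = queue.Queue()
# Job store keyed by id; a real service would likely keep this in a database.
jobs = {}
jobs_lock = threading.Lock()


def submit_job(values):
    """Roughly what a POST handler might do: enqueue the job and return at once."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"status": "queued", "result": None}
    job_queue.put((job_id, list(values)))
    return job_id


def get_job(job_id):
    """Roughly what a GET handler might return: the job's current state."""
    with jobs_lock:
        return dict(jobs[job_id])


def run_analysis(values):
    # Placeholder for the call into R; here just two summary statistics.
    return {"n": len(values), "mean": statistics.mean(values)}


def worker():
    # A single worker takes one job at a time, so heavy analyses
    # wait in the queue instead of all hitting the server together.
    while True:
        job_id, values = job_queue.get()
        try:
            update = {"status": "done", "result": run_analysis(values)}
        except Exception as exc:
            update = {"status": "failed", "result": str(exc)}
        with jobs_lock:
            jobs[job_id] = update
        job_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

job_id = submit_job([2, 4, 6, 8])
job_queue.join()  # a real client would poll the status instead of blocking
print(get_job(job_id))
```

The point of the design is that submitting a job returns immediately with an identifier, and the client asks for the result later, which is what keeps a large analysis from tying up the web server.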

I had difficulty understanding the technical details of Richard's explanation of how the system works, as I am not a Python user (I felt he was giving the equivalent of five conference talks at once). The approach reminded me a little of LaTeX for typesetting, but applied to statistics.

However, the issue of having to carry out complex and very precise calculations on large amounts of data, where the results affect large payments, is familiar to me from my work at the Commonwealth Schools Commission to calculate billions of dollars in payments to schools.

What was exciting was Richard's broad vision of how such systems could be used for corporate and public benefit. He mentioned Google's Content Identification and Management System which allows rights owners to "Choose, in advance, what they want to happen when those videos are found. Make money from them. Get stats on them. Or block them from YouTube altogether...".

One interesting comment was about how important the payments from the Copyright Agency were to Australian authors.

Richard then talked about how a Kanban Board is used for the software development.

Richard will be at John Maindonald's Statistical Learning and Data Mining using R course at ANU in Canberra, from 17 to 21 January 2011.
