Showing posts with label speech to text. Show all posts
Sunday, April 12, 2020
Auto visual content for video conference presentations
Many dull video conference presentations are now taking place around the world. Transmitting video of the speaker consumes bandwidth for no good purpose. Occasionally, when chairing conferences, I search for online content based on what the presenter is saying, as they are saying it, and put that on screen behind them for the audience to see. This could be added to video conference systems to enliven dull presentations. A speech to text system would provide the text for the searches. The Vidnami video editing system already searches out content based on a script, but not in real time.
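As a rough illustration of the search step, here is a minimal Python sketch that turns a chunk of transcribed speech into a short search query by picking the most frequent words. It assumes the speech to text system has already produced the transcript as plain text. The function name, the tiny stop-word list and the frequency-based approach are my own assumptions for illustration, not how any existing video conference system or Vidnami works.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real system
# would need a much fuller list, and one per language spoken).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "be", "this", "that", "it", "as",
    "with", "at", "by", "we", "our", "you", "they", "will", "can",
    "has", "have", "from", "about", "how",
}


def search_query_from_transcript(transcript: str, max_terms: int = 3) -> str:
    """Build a search query from the most frequent content words.

    `transcript` is a chunk of text already produced by a speech to
    text system (hypothetical input; no audio is handled here).
    """
    # Lower-case the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", transcript.lower())
    # Count words that are not stop words and are longer than two letters.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Join the most frequent terms into one query string.
    return " ".join(term for term, _ in counts.most_common(max_terms))


if __name__ == "__main__":
    chunk = (
        "Solar panels are cheap. Solar panels power the pump. "
        "Solar energy helps the farm."
    )
    print(search_query_from_transcript(chunk, max_terms=2))
```

In a live setting this would be run on a rolling window of the last minute or so of speech, with the resulting query passed to whatever image or web search the conference system uses.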
Friday, December 11, 2015
How do people ask for information?
Greetings from the famous room N101 at the Australian National University in Canberra, where Professor Douglas Oard, from the University of Maryland, is speaking on 'Building Search Engines for the “Bottom Billion”'. The example he gave was of an Indian farmer who had some spots on their crop and wanted help finding out what they were. The intended interface is a mobile "feature phone".
Professor Oard first described work by IBM India using audio information indexed by numbers, which was accessible via a mobile phone. He commented that this would not scale and that non-literate users do not understand the concept of a hierarchical menu. He then described a method for matching audio called "Query by babbling" (Oard, 2012).
Finding a way to provide information to those with limited literacy is a worthwhile research area. However, there is extensive social science research on verbal communication which the researchers might want to review for insights into how people formulate queries. I have come across some of this literature as a student of education, such as Venkataraman and Prabhakar (2014).
Also, it happens I spent three weeks in an Indian village and had to communicate with the plumber, the miller and other trades. One insight from this is that they do not use just one language; they use a mix. Hand gestures also play a role.
Smart phones are becoming increasingly affordable and this might provide the opportunity for a visual and audio interface. This interface might also have an educational function, displaying words and teaching written communication, while providing a verbal interface.
Audio interfaces are also useful for those who are literate, but not able to use their hands for typing and do not have sufficient attention to compose a formal audio text query. This applies in formal teams, such as those controlling a metro, military command centers, and flight deck crews. There have been extensive studies of how these people communicate. Part of this is about the formal use of language for directed commands and queries. But part of it is the team members overhearing conversations and acting without being directly asked to do so. Automated systems could be good in this role, eavesdropping on the human conversation, then responding and taking action where appropriate.
Venkataraman, B. & Prabhakar, T. (2014). Changing the Tunes from Bollywood’s to Rural Livelihoods — Mobile Telephone Advisory Services to Small and Marginal Farmers in India: A Case Study, in Ally, M., & Tsinakos, A. (2014).
References
Oard, D. W. (2012, November). Query by babbling: a research agenda. In Proceedings of the first workshop on information and knowledge management for developing region (pp. 17-22). ACM.