Editor’s Note: Adrien Schmidt is an internationally recognized entrepreneur, engineer, and innovator. In 2015 he co-founded Aristotle by Bouquet.ai, an enterprise software company that delivers a personal voice analytics assistant to convert data analytics into meaningful conversation. The thoughts and opinions expressed in this commentary are his own.
The science behind graphical user interfaces is finely tuned – the result of nearly three decades of design, development, and testing. Voice user interfaces, on the other hand, are incredibly new, reaching the mainstream only seven years ago, in the iOS 5 release for the iPhone 4S. In less than a decade, voice has become an ecosystem unto itself, driven by explosive growth, not just through its ubiquitous presence in mobile devices, but through the tens of millions of home devices sold each year by Amazon, Google, and several others.
For VUI developers who are tackling this new form of search and software engagement, the technology represents a confluence of science fiction and modern AI limitations – the culmination of more than fifty years of dreams and experiments. So, it’s not surprising that users and developers alike are still learning what a VUI looks like and how it should operate.
What questions should a user ask to get the answers they need? Which commands should they give to trigger the actions they want? Those are the biggest and most pressing questions currently facing the industry. As VUIs improve and become more streamlined, what will users be asking and how will they be communicating in the years to come?
What Should the AI Understand?
The challenge of a voice user interface is that it relies on dialogue and on the triggers, cues, and colloquialisms that are unique to almost every individual who will use your software. A VUI has rules and must be trained to respond to certain patterns of speech, spoken in certain ways.
The more flexible you make your interface, the more training it will require and the more error-prone it will become. The less flexible it is, the more frustrating it will be for the user. So there needs to be a fine balance between what the AI will understand and what the user is responsible for.
We chose to place dialogue at the core of our design, such that the system updates a context as questions and responses flow back and forth. This lets us focus on simpler questions. It is a big departure from search as we have known it since Google. Instead of placing all of the keywords in a single utterance, which is complicated to do by voice and creates confusion for the NLP, we chose dialogue and follow-ups as a way to break down a complex question into smaller sub-questions that are easier to phrase and to understand.
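To make the idea of a running context concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how Aristotle is built: the slot names (metric, region, period), the keyword lists, and the toy parse_utterance function are all hypothetical stand-ins for a real NLP component.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueContext:
    """Slots remembered across turns (slot names are hypothetical)."""
    slots: dict = field(default_factory=dict)

    def update(self, parsed: dict) -> dict:
        # A follow-up overrides only the slots it mentions; the rest carry over.
        self.slots.update({k: v for k, v in parsed.items() if v is not None})
        return dict(self.slots)


def parse_utterance(text: str) -> dict:
    """Toy keyword matcher standing in for a real NLP component."""
    text = text.lower()

    def first_match(options):
        return next((o for o in options if o in text), None)

    return {
        "metric": first_match(["revenue", "orders"]),
        "region": first_match(["europe", "asia"]),
        "period": first_match(["last quarter", "last year"]),
    }


context = DialogueContext()
q1 = context.update(parse_utterance("What was revenue last quarter?"))
q2 = context.update(parse_utterance("And in Europe?"))
q3 = context.update(parse_utterance("What about orders?"))
```

Each follow-up is short, yet by the third turn the context holds a full query (orders, Europe, last quarter) that would have been awkward to say in a single utterance.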
Communicating Limitations and Expectations to the End User
In the traditional sense, design is a heavily involved process that relies extensively on the expertise and experience of the designers, combined with targeted testing and experimentation with users. With a VUI, this is slightly different. There are fewer established best practices for how a VUI should work and, most importantly, the system is self-learning. That turns design into a process where algorithms play a larger role, for example in determining how to take into account user feedback, new utterances, new synonyms, etc.
You will need to design your product to capture the right data so that it learns as much as possible from your users. That means nearly limitless iteration across a range of different technological barriers – from the core AI’s understanding of the user to the ways in which you prompt user input and the responses you receive.
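As one illustration of an algorithm folding user feedback into the system, the sketch below promotes an unfamiliar phrase to a synonym once users have confirmed its meaning often enough. The class name, the confirmation threshold, and the example terms are all assumptions made for this sketch; a production system would likely weigh feedback in more sophisticated ways.

```python
from collections import Counter, defaultdict


class SynonymLearner:
    """Promotes a phrase to a synonym after repeated user confirmation."""

    def __init__(self, min_confirmations: int = 3):  # threshold is an assumption
        self.min_confirmations = min_confirmations
        self.synonyms = {}                  # phrase -> canonical term
        self.votes = defaultdict(Counter)   # phrase -> counts of confirmed terms

    def resolve(self, phrase: str):
        # Returns the learned canonical term, or None if not yet trusted.
        return self.synonyms.get(phrase.lower())

    def record_feedback(self, phrase: str, confirmed_term: str) -> None:
        phrase = phrase.lower()
        self.votes[phrase][confirmed_term] += 1
        term, count = self.votes[phrase].most_common(1)[0]
        if count >= self.min_confirmations:
            self.synonyms[phrase] = term


learner = SynonymLearner(min_confirmations=2)
learner.record_feedback("turnover", "revenue")
before = learner.resolve("turnover")   # one confirmation: not yet trusted
learner.record_feedback("Turnover", "revenue")
after = learner.resolve("turnover")    # two confirmations: now a synonym
```

Requiring several confirmations before trusting a new synonym is one simple way to trade flexibility against the risk of learning from a single misheard or mistaken utterance.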
At the same time, the system needs to communicate to the user what is needed. If you’ve ever used an Echo, you know Alexa gives detailed prompts, often redundantly for existing users. This is intentional and important, as it tells the user exactly what is needed while reducing the frequency with which people “get stuck” repeating the same question in different forms.
It is tempting to use dialogue trees because of their similarities to UX flow charts – attempting to match the natural flow of human speech when a question is asked. While designers can anticipate some of what users will say, much of a predefined tree turns out to be useless in a natural language conversation. Predicting what a user will ask is a challenging process that requires equal measures of art and science. When done properly, you’ll build a carefully balanced system that can handle increasingly subtle user intents and move users in the right direction towards the information they seek.
The Next Step for VUI Algorithms
The next natural stage for voice interfaces is the kind of maturity we already see in several other technologies – the ability for devices to recognize and interact with users and take into account their “context”: location, upcoming meetings, recent messages, habits, etc. The challenge is not only technical; it is also a matter of earning users’ trust that we are not invading their privacy by looking at their data. This is possible with on-device processing, where algorithms run locally on the device and share no information with the service provider or device manufacturer.
This will not only make the systems easier to use wherever a user might be, but will also allow the system to get smarter, leveraging machine learning technologies to start inferring greater amounts of information from users based on their mood, tone of voice, context, and word selection. We are still some time away from this becoming a reality, but the investment and attention to detail in user interaction within these systems will help to get us that much closer.
There is a careful zen balance between AI learning from what a user asks and the user learning what to ask to get something out of a voice interface. The balance will continue to shift toward the AI as the systems get smarter and more ubiquitous, but for now, designers need to be cognizant of this issue and build applications to match.
Adrien Schmidt is an internationally recognized engineer, speaker, and entrepreneur. He is the CEO and Co-Founder of Aristotle by Bouquet.ai, an enterprise software company in San Francisco, CA, that delivers a personal voice analytics assistant to convert data analytics into meaningful conversation. As a thought leader in the AI/Voice space, his work can be found in major publications such as Forbes, Inc, HuffPo, and B2C. He is listed in Inc. as an AI Entrepreneur to Watch and has spoken at events such as Web Summit, Collision, Conversational Interaction, VOICE Summit, and P&G Data Analytics Summit. Connect with him on his company or personal website, Twitter, or LinkedIn.