
Programming voice interfaces

When adding voice capability to inanimate objects, it is important to consider how many devices in close proximity will have voice enablement, how many of them share the same wake words, and so on. Think of it this way: imagine being in a room with a dozen voice-enabled devices, then you ask for the time and you get a response from ten different voices. How annoying would that be on a daily basis? In that scenario, ideally there would be one main voice interface that interacts with many different devices.
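To make that idea concrete, here is a minimal Python sketch of the "one main voice interface" approach: a hub registers every device and routes a recognized request to exactly one of them, so only a single voice answers. The DeviceHub and Device names are hypothetical, invented for illustration rather than taken from any vendor's API.

# Hypothetical hub that fronts many voice-less devices with one voice interface.
class Device:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)  # e.g., {"time", "music"}

    def handle(self, request):
        return f"{self.name} handling '{request}'"


class DeviceHub:
    def __init__(self):
        self.devices = []

    def register(self, device):
        self.devices.append(device)

    def ask(self, capability, request):
        # Route to the first device that can serve the request,
        # so only one voice replies instead of ten.
        for device in self.devices:
            if capability in device.capabilities:
                return device.handle(request)
        return "Sorry, nothing here can handle that."


hub = DeviceHub()
hub.register(Device("kitchen speaker", {"time", "music"}))
hub.register(Device("living room TV", {"time", "video"}))
print(hub.ask("time", "what time is it?"))  # Only the kitchen speaker answers.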
In thinking about today’s consumer voice landscape, the everyday voice experiences
include Siri, OK Google, Cortana, and Alexa. Luckily, these all have their respective
wake words; therefore, having all these in the same room at the same time shouldn’t
be an issue unless you say something like “Hey Siri, search OK Google for Alexa.” As edge cases go, this one is pretty damn close to the edge, but it is becoming increasingly annoying as more voice interfaces are introduced into the market and, with them, more potential for devices to clash instead of communicating with one another. Amazon and Microsoft are trying to thwart this calamity with a newly minted
deal in which they will incorporate each of their AI personas on the other’s platform.
Additionally, users will sometimes mix up the wake words for their various devices—
for example, imagine someone pressing the button on their iPhone and saying “Hey
Alexa.”
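A simple way to picture how wake words keep these assistants from talking over each other is to match an utterance against every registered wake word and hand the request to only one of them. The routing below is a hypothetical sketch, not any vendor's actual wake-word detector; real systems do this acoustically, on the device, before any speech-to-text happens.

# Hypothetical wake-word routing: whichever wake word is spoken first wins,
# and the other assistants stay quiet.
WAKE_WORDS = {
    "hey siri": "Siri",
    "ok google": "Google Assistant",
    "cortana": "Cortana",
    "alexa": "Alexa",
}

def route(utterance):
    text = utterance.lower()
    # Collect every wake word present and keep the one spoken earliest.
    hits = [(text.find(word), assistant)
            for word, assistant in WAKE_WORDS.items() if word in text]
    if not hits:
        return None  # Nobody was addressed; nobody answers.
    _, assistant = min(hits)
    return assistant

print(route("Hey Siri, search OK Google for Alexa"))  # -> Siri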

One of the biggest issues with voice is discoverability. Alexa has over 15,000 skills, and it is very difficult to find out what is available. Even when you do find a relevant skill, it can be difficult to remember the commands. This is why most people tend to stick with a single voice interface rather than deal with remembering multiple wake words and commands.
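Part of the discoverability problem is mechanical: a free-form request has to be matched against thousands of invocation phrases the user never sees. The toy registry below is a hypothetical sketch of that matching step (the skill names and phrases are made up for illustration), not how Alexa actually indexes its catalog.

# Hypothetical skill registry matched by crude word overlap.
SKILLS = {
    "Daily Horoscope": ["what's my horoscope", "read my horoscope"],
    "Seven-Minute Workout": ["start my workout", "begin seven minute workout"],
    "Train Times": ["when is the next train", "train schedule"],
}

def find_skill(utterance):
    words = set(utterance.lower().split())
    best_skill, best_overlap = None, 0
    for skill, phrases in SKILLS.items():
        for phrase in phrases:
            overlap = len(words & set(phrase.split()))
            if overlap > best_overlap:
                best_skill, best_overlap = skill, overlap
    return best_skill

print(find_skill("when does the next train leave"))  # -> Train Times

Even this toy version shows the problem: if the user's phrasing doesn't overlap with something the skill author anticipated, the skill is effectively invisible.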
