In our kitchen is a short white cylinder. Inside is an array of microphones, always listening passively. Whenever someone says the magic word, it starts actively listening for instructions. This might be to start a timer, to add something to our shopping list, or perhaps to turn a smart light or socket on or off. Within moments a feminine voice informs us that the instructions have been carried out. This is really, really convenient.
You might be thinking: "So what? My phone does the exact same thing," but you'd only be about half right. Your phone has fewer, smaller microphones. It needs to run in a smaller power envelope. It's probably in your pocket and needs to be taken out, or at the very least doesn't always sit in a consistent, convenient position. Overall (based on my experience with Siri and Google Assistant[1] on phones) it's nowhere near as reliable.
I am, of course, talking about Amazon Alexa, and her physical incarnation in the Amazon Echo. Specifically the Amazon Echo Dot, but all of this applies to the full-size Echo and probably the Google Home as well.
There are deeper use cases here. My flat is a small duplex with the dining room on a different floor to the kitchen. Being able to yell out "Alexa, turn on the upstairs lights," can be an absolute godsend when leaving the kitchen fully laden with plates and utensils. Likewise, being able to carry everything downstairs, and then turn the lights off without having to go back up is awesome. I can also, with my head still in the fridge, say "Alexa, add onions to the shopping list," and BOOM they appear in Todoist[2]. Likewise I can say "Alexa, order more of the Fudge hair wax," and after clarifying my intentions it'll do just that.
It's not all sunshine and roses, though. I paid £50 for the Echo Dot I have in the flat. I find myself wondering: how much of the CPU time needed to run Alexa does that buy? How, exactly, is Amazon planning on profiting from this product? The obvious answer is the last use case I listed above. Amazon is trying to make it easier for you to buy things from Amazon. That seems like a hell of a gamble, though. I'd actually be more comfortable if Amazon charged a small subscription fee here.
At the end of the day I've placed a microphone in my flat; one which is connected via the internet to one of the largest technology and retail companies in the world. Am I to believe they won't try to squeeze every last iota of possible revenue out of it? Am I going to talk about shoes in my kitchen one day, and find the internet littered with Amazon adverts for Skechers the next?
The Google Home has already done a couple of things which some commentators have likened to placing advertising on the device. I don't want a device in my home which might trawl everything I say in order to advertise to me. I don't want to create a new vector for data to be gathered about me for advertising purposes. I definitely don't want to place a new advertising vector in my home.
Then there's the small matter of the Amazon Echo which might have witnessed a murder. Amazon is very publicly fighting the warrants and protecting user privacy here, but the fact that these warrants exist sets a precedent. Amazon could be compelled by a law enforcement agency to use an Echo as a surveillance device. It would be easy (and very lazy) to start making 1984 comparisons here. The crux of that novel is not really the pervasive surveillance, though. It's the fact that it's used to enforce "Thought Crime" laws[3]. That's not something we have, at least in the west[4], thankfully.
In some ways my ideal solution to this would be to host the assistant on a server I control, inside my flat. I want the smart part of my smart home to live in my home, not out on the internet. I don't want my lights or heating to stop functioning if the internet goes down. Likewise, if my digital assistant lived on a server based in my flat I wouldn't have to worry about degradation of service there, either. I also wouldn't have to worry about any data it does collect being used to try and sell me shoes later.
This idea isn't perfect, of course. In fact it introduces new security issues, given that it would almost certainly need to use the internet for data gathering, communication and updates. That opens the possibility of it being hacked, and a server which lives in my attic is always going to be more vulnerable than one which lives in one of Amazon's or Google's data centres. Another issue would be those updates I mentioned just then. Where exactly do they come from? Who is responsible for that?
Mark Zuckerberg rolled his own when he built his Jarvis smart home digital assistant. The post definitely suggests that Jarvis runs inside his house, but stops just short of confirming that. That said: I doubt Zuck's internet ever goes down, and his connection is probably so fast that running in a data centre and running in his house are very similar in practice.
Jarvis does a few things I really like. Chiefly, it knows the difference between Zuck and his wife, so different music plays depending on who's asking for it. That's something Alexa needs. It would be nice if a timer set by me pinged my phone, and only my phone, when it completed. Jarvis also has enough context to know which room the person speaking is in, so saying "turn the lights on" will do so without needing you to be specific.
On the other hand, because he rolled his own, Zuck is theoretically responsible for 100% of the maintenance. In his case that's a problem he can throw money at (this is a man with employees). Alternatively he could open source it and be reasonably confident that other developers would pay attention. Either way, he still has to get the updates onto his installation.
Another approach would be to use a device and service from a smaller company not affiliated with one of the large online advertising and commerce companies. The main one I'm aware of is Emotech and their award-winning (but not yet shipped) product Olly. One of Olly's key selling points is that it adjusts its personality to each of its users, which is actually quite interesting[5]. Siri kind of has a personality, in that it sometimes gives you a bit of snark[6], but in my experience Google Assistant and Alexa are almost entirely lacking in this department.
The downside of going with a smaller company for a service like this is the possibility that they might go out of business and turn the physical unit into a high-tech and very expensive paperweight.
Lastly I want to talk about the utility of voice as a communication medium. Obviously it does have utility, given that it's essentially the default and primary human communication method. But, as Zuck notes in the post linked above, text is definitely catching up:
"…the volume of text messaging around the world is growing much faster than the volume of voice communication."
Voice communication with a computer is really awkward at times. Don't believe me? Try asking Siri about the weather while standing in your office. If you don't work in a call centre you're going to start getting annoyed looks very quickly.
Voice recognition also adds another moving part. Figuring out what a statement actually means is hard enough when you know what the actual words are. Not knowing whether the user said "there", "their" or "they're" adds a further level of ambiguity. Add in background noise and maybe they actually said "tear". Or "tare".
Zuck's solution is to also make a chat bot (via Facebook Messenger, of course) as an additional communication vector with Jarvis. Google Assistant is also available via the Allo messenger, but as far as I can tell Siri and Alexa[7] don't really have these options.
In closing I'd like to suggest a third way. In addition to speech and text, the other way humans can express natural language is to sign it with their hands. It's quieter (and therefore less disruptive) than voice. It's more expressive than text. It's less ambiguous and faster to communicate than either. It's not a perfect solution. Off the top of my head I can think of three serious issues:
Problem the first: Obviously, in order for this to work the user must be able to speak sign language. That said: they might only need a small vocabulary to cover most use cases, and can fall back to speech or text when needed (or when their hands are full). I'm also of the opinion that sign language is something which should be taught in school.
Problem the second: There's no such thing as "sign language" per se. There is British Sign Language. There is American Sign Language. I was once told that every region of China has its own. Your automated system will need to be able to understand all of them eventually, or introduce one of its own.
Problem the third: Automated sign language understanding might be an even harder problem than automated speech understanding. You need to know the position of the user's hands to a fairly high degree of accuracy. That means one of the following:
- Cameras in the world, which doesn't scale and has privacy issues;
- Cameras on the user, which still has privacy issues;
- Special gloves, or something else which allows direct sensing of the user's hands.
Even so, I really think this is worth pursuing, if only as an additional option. A room full of people silently signing to their AI assistants feels like it scales better than talking to them.
This is actually the play I thought Amazon was making when they introduced the Echo Look. I was wrong about that, but still I had some hope for a second there.
[1] …or Google Voice Search, or S Voice, or any of the other manufacturer-specific voice assistants.
[2] Todoist and Any.do have direct integrations with Amazon Alexa. Other task managers can be connected via IFTTT, but the integration isn't as clean. Really, I would like integration with an actual shopping list app such as Paprika or AnyList. That would be perfect.
[3] When I worry about this, I worry that if we ever do have laws about "Thought Crimes" they'll be demanded by the left, not the right. By which I mean (to be clear): those who identify as progressive, not those who identify as conservative.
[4] That said, some of the actions of the Trump and May governments could be construed as leaning this way.
[5] It kinda puts me in mind of the dæmons from Philip Pullman's His Dark Materials series of books.
[6] "Three minutes and counting. Don't overcook that egg!"
[7] Alexa sort of does, but for some bizarre reason it's in the Amazon shopping app rather than the Alexa app, and appears to have fairly limited functionality.