This Amazon Alexa skill allows a user to ask Alexa to play back the "Letter of the Day” from Savage Love, a popular dating and relationship advice column. These letters are typically short, so my thought was that users could listen to the letter during interim moments between other activities, or when they have a few moments to spare before or after larger events in their day, perhaps as they’re getting their breakfast ready before work, or when they’re waiting for dinner to finish.
In all honesty, I wasn’t sure that the skill would be something anyone would actually want to use. So I developed a “minimum viable product,” crucial component of the Lean Startup Methodology, to test the app with actual users and learn as much as possible before spending too much time actually coding it.
I knew that Alexa sounded quite natural in short bursts—perhaps a few sentences at a time—but that the speech it produces isn’t perfect. Since this application would require Alexa to read back, on average, two to three minutes of synthesized speech per letter, would users become irritated? Would the colloquial language and acronyms often found in the letters and responses be read back incorrectly by the Alexa engine, confusing users to the point where they lose track of what’s being read to them? Rather than diving right in with development of the skill, I made a prototype that I could test on others and try to find answers to these questions.
When creating Voice User Interfaces (VUI), it is arguably even more important than in graphical user interfaces (GUI) to provide feedback to the user, telling them what actions they can take at what time, and assuring them that they’re in the right place. In contrast to graphical interfaces, in which standards and conventions that most users are familiar with have arisen over time, VUIs are still novel and unfamiliar to many people, so extra guidance is needed.
For example, a user might say “Open Savage Love,” but they might not specify their intent, because they aren’t sure what the possibilities are. In this case, they have to be informed of their options, but without being overwhelmed by them. No more than three possibilities should be presented to the user at a time, so if the user opens the skill without saying what it is they want to do, Alexa tells them they can “listen to today’s letter,” “listen to yesterday’s letter,” or “listen to a random letter.”
If the user is already familiar with the skill and does know what they want to do, they can open it and say their intent all at once. They might say, for example, “Open Savage Love and play the letter of the day for today.” The chart below shows the basic interaction a user can have with the skill:
Alexa skills require the following components:
getletteroftheday, which, upon request, gets a letter and plays it back to the user.
So, the Alexa engine will listen to the user's request, and break it down as follows:
Because there are so many different ways people can ask for the same thing, utterances need to be provided to the Alexa service by the developer to ensure that Alexa correctly interprets what the user is asking for. I tried to think of as many different ways that a user might ask for the letter of the day as I could. Of course, I wouldn't know for sure how people would make their requests until I tested, but I used these 70 as a starting point.
Creating prototypes for voice user interfaces (VUI) poses more of a challenge than creating them for graphical user interface (GUI). Whereas Sketch, Photoshop, InVision, Adobe XD, and others are all available to quickly create visual prototypes, similar VUI prototyping options are not yet available.
Therefore, some programming was necessary in order to conduct user tests, though I did try to simplify as much as possible. For instance, rather than writing code to actually find letters for today, yesterday, and a random day in the past, I simply hard-coded the skill to play back particular letters (so, for instance, if a user asked for a "random" letter to be read to them, the letter for January 14 would always play back, rather than a truly random letter). This dramatically cut down on development time, while still allowing me to sufficiently test the skill’s functionality and observe how users interact with it. The code is admittedly very rough, and it was a little difficult not to spend time refactoring and optimizing it, but my intent was to write something that was good enough to test as quickly as possible as an exercise in rapid prototyping. View the prototype code here.
The skill functions as follows:
I presented users with the following simple scenario:
“You want to listen to a Savage Love Letter of the Day. Ask Alexa for it.”
I had to provide some basic guidance on how to interact with Alexa for users, some of whom were totally unfamiliar with it. I informed them that in order to ask Alexa to do something, they have to use the wake word “Alexa,” and that the invocation name “Savage Love” had to be used to open and interact with the skill.
But I was especially careful not to give them specific words to use to access letters, since I wanted to try to collect as many natural utterances as possible. There are hundreds of slight variations in the way people can ask for the same thing, and unlike in GUIs where the visuals (ideally) provide affordances to the user, immediately showing them what the possibilities for interaction are, things are not so apparent in voice interfaces. So I provided minimal guidance to users during testing, so I could determine if the affordances programmed into the skill provided enough information to the user.
Results were as follows:
The consensus seemed to be that, with some improvements, regular readers of the column might find this skill useful. Only one of the test participants (out of four) had read Savage Love before, and she was a sporadic reader. After making improvements, I hoped to test it with regular readers.
Unfortunately, I hadn’t considered that I wouldn’t be allowed to publish this skill on Amazon for others to download and use, since I don’t have legal permission to distribute the column. Amazon is quite strict about ensuring that skills don’t make use of any unauthorized intellectual property. Luckily, since this was developed as a minimum viable product, I hadn’t spent too much developing it. As it turns out, this product is probably not viable, but due to unforeseen legal reasons, rather than it not being especially useful.
I plan to contact The Stranger, the publisher of Savage Love, to see about getting permission to publish the skill, but I’m not particularly optimistic that I’ll get it. However, developing and testing the skill was a very valuable learning experience, and the possibilities for voice-controlled applications are very exciting. In this specific case, requesting the letter of the day just by asking for it is far easier and more natural than looking it up on the website or newsreader on one’s computer or mobile device. It also allows the user to multitask, quickly accessing and consuming the letter while remaining free to do other things simultaneously.
Many industries, such as travel, see voice-based interaction as the "next frontier" in computing. The challenge, they say, will be dealing with the "unstructured queries" of voice-based computing. Indeed, even in this Alexa application, which was quite simple, users often phrased their requests in unexpected ways. Therefore, careful design of VUIs, based on extensive research and testing, will be crucial to allow users to accomplish their desired tasks while still maintaining the natural, conversational interactions that make voice-based computing so appealing.