


Voice Applications with Alexa and Google Home


Purpose

With the right tools, integrating voice into your applications is relatively straightforward.

Modules

  • Alexa/Google In to receive Alexa or Google Home input
  • Alexa/Google Out to send spoken output back to Alexa or Google Home
  • Alexa/Google Intent Switch to match spoken commands to predetermined intents
  • Alexa/Google Action Cards to perform various actions
  • NLP/RASA for custom language processing needs
Interplay Alexa Module

Description

Voice is taking an increasing role in how people interact with data. We use voice to update our appointments, start exercise routines, change the song playing, buy groceries, and (if you’re talented) control the lights in the house. Basically, almost any interaction done on a screen can have a voice element included somewhere.

But how to add voice to your app?

Smart speakers such as Alexa or Google Home work by listening for trigger words (“Alexa...” or “Hey Google…”), taking the words that follow, processing them into text, and then passing that text as keywords into their own respective search engines. Applications or websites can take advantage of this in the same way that typing a string of keywords can take you directly to a product detail page or a specific object in a list. In a normal browser, keywords take us to a page full of information; we can then find the specific data point and click to perform the subsequent action (add an item to our cart, look up a flight, etc.). The challenge with voice-initiated interaction is that Alexa or Google Home isn’t going to read us the entire contents of the landing page; it will only respond with a short spoken sentence about the first thing it finds.

So, the challenge is to configure your application or data set to find the _specific bit of information_ (e.g. when my flight leaves) or _distinct action_ (e.g. adding milk and orange juice to my curbside pickup) that can be done with a simple common-language voice command.
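
To make that concrete, here is a simplified, hypothetical sketch of what a voice backend actually receives once the smart speaker has transcribed and parsed a command: not a page of results, but a small structured payload naming an intent and its slot values. The intent and slot names below are illustrative placeholders, not from a real skill.

    # Hypothetical, simplified payload a voice backend might receive after
    # "Alexa, add milk and orange juice to my pickup" is transcribed and parsed.
    # The intent and slot names are illustrative placeholders.
    incoming_request = {
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "AddToPickupIntent",                      # the distinct action
                "slots": {
                    "items": {"value": "milk and orange juice"},  # the specific bit of information
                },
            },
        },
    }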

At a high level, there are several steps here (a code sketch follows the list):

  1. Prepare your data set to have hooks or action points at a granular level
  2. Connect to the Alexa or Google Home interface to get the text result of the spoken command
  3. Translate that text result into a specific action, which requires AI to interpret phrases like “when is my ___”, “add ___ to my pickup”, etc.
  4. Connect the appropriate AI-determined command into an action point (prepared in the first step)
  5. Follow through with the entire chain of actions, including connections to payment gateways, reservation systems, etc.
  6. Prepare the appropriate completion message and send back to the smart speaker for audio confirmation
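
As a rough illustration of steps 2 through 6, here is a minimal sketch of an intent handler on the Alexa side, assuming the Alexa Skills Kit SDK for Python (ask-sdk-core). The intent name, the "item" slot, and the add_to_pickup_order helper are hypothetical stand-ins for your own interaction model and backend APIs.

    # Minimal sketch using the Alexa Skills Kit SDK for Python (ask-sdk-core).
    # AddToPickupIntent, the "item" slot, and add_to_pickup_order() are
    # hypothetical placeholders for your own action points and APIs.
    from ask_sdk_core.skill_builder import SkillBuilder
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.utils import is_intent_name


    def add_to_pickup_order(item):
        """Placeholder for step 5: call your order, inventory, or payment APIs here."""
        ...


    class AddToPickupHandler(AbstractRequestHandler):
        def can_handle(self, handler_input):
            # Steps 3-4: route the recognized intent to the matching action point.
            return is_intent_name("AddToPickupIntent")(handler_input)

        def handle(self, handler_input):
            # Step 2: the spoken command arrives already transcribed and parsed.
            item = handler_input.request_envelope.request.intent.slots["item"].value
            add_to_pickup_order(item)  # step 5
            speech = f"Okay, I added {item} to your curbside pickup."
            # Step 6: send the completion message back for audio confirmation.
            return handler_input.response_builder.speak(speech).response


    sb = SkillBuilder()
    sb.add_request_handler(AddToPickupHandler())
    lambda_handler = sb.lambda_handler()  # entry point if hosted on AWS Lambda

The Interplay modules listed above are intended to cover this same flow without hand-written code.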

To build this out, several pieces of technology come into play: the voice SDK from Alexa or Google Home to get the spoken words as text, preparation of your app or website to accept actions at a granular level, AI text processing to correctly parse and “understand” the text, connections to the appropriate APIs at the inventory, order, or content level, performing the action, and finally sending messaging back to the smart speaker.
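
On the Google Home side, the same chain can be sketched as a fulfillment webhook. The example below assumes a Dialogflow ES agent and a Flask endpoint; the intent name, the items parameter, and the add_to_pickup_order helper are again hypothetical.

    # Hedged sketch of a Dialogflow ES fulfillment webhook (one common Google
    # Home path), assuming Flask. Intent/parameter names and
    # add_to_pickup_order() are illustrative placeholders.
    from flask import Flask, request, jsonify

    app = Flask(__name__)


    def add_to_pickup_order(items):
        """Placeholder for the inventory/order API calls."""
        ...


    @app.route("/webhook", methods=["POST"])
    def webhook():
        body = request.get_json()
        intent = body["queryResult"]["intent"]["displayName"]
        params = body["queryResult"]["parameters"]

        if intent == "AddToPickupIntent":
            add_to_pickup_order(params.get("items"))
            reply = f"Okay, I added {params.get('items')} to your pickup."
        else:
            reply = "Sorry, I didn't catch that."

        # fulfillmentText is what the Assistant speaks back to the user.
        return jsonify({"fulfillmentText": reply})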

Interplay already has modules for all of these steps. We’ve connected voice commands to ecommerce purchases, FAQs, curbside pickups, and more. We’re expanding these capabilities into appointment scheduling, services, and deeper database lookups. If you would like to add voice to your apps, let us know; it could be faster and easier than you think.
