Using the Android Speech Recognition APIs

In my most recent project, I put together a voice-controlled iRobot Create using the Android ADK and my Nexus S.  The Android speech recognition API takes care of listening for speech, determining when the speech input has ended, and sending the resulting recording off to “the cloud” for processing.  What you get back is a list of possible matches (this isn’t an exact science, after all).

There are two ways to incorporate speech recognition into an application.  In the first approach, your application sends an ACTION_RECOGNIZE_SPEECH intent using startActivityForResult, and obtains the results by defining an onActivityResult method in your class.  As you can see in the Voice Recognition API demo, it is very simple to write an application using this interface!  The problem I had with this approach is that it gave me too little control over the speech recognition error handling.  Also, I really wanted the speech recognition to be running all of the time.  So, in the end I decided to use the second approach, using the SpeechRecognizer directly in my code.  This actually didn’t make the code all that much more complicated.  As an added bonus, your activity is not paused and resumed in order to get the results from the speech recognition activity.

With the mechanics out of the way, the next thing I did was to create a list of voice commands.  The list of speech recognition matches was compared against the command list.  If one of the matches was a command, I added every entry in the list of matches to a hash table as a key, storing the actual command as the value.  Thus, any time a close match came up later, it would be found in the hash table, with the entry being the (hopefully) intended command.
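Here is a minimal sketch of that matching idea in plain Java.  It is not the code from the project: the CommandMatcher class, its method names, and the lower-casing of phrases are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not the class used in the project. */
final class CommandMatcher {
    private final Set<String> commands = new HashSet<>();
    // Maps each "close match" phrase seen alongside a real command to that command.
    private final Map<String, String> learned = new HashMap<>();

    CommandMatcher(String... commandNames) {
        for (String name : commandNames) {
            commands.add(normalize(name));
        }
    }

    /** Returns the command the matches point to, or null if none is known. */
    String resolve(List<String> matches) {
        // First pass: does any candidate equal a real command?
        for (String match : matches) {
            String phrase = normalize(match);
            if (commands.contains(phrase)) {
                remember(matches, phrase);
                return phrase;
            }
        }
        // Second pass: fall back to phrases learned from earlier recognitions.
        for (String match : matches) {
            String command = learned.get(normalize(match));
            if (command != null) {
                return command;
            }
        }
        return null;
    }

    private void remember(List<String> matches, String command) {
        for (String match : matches) {
            String phrase = normalize(match);
            // Never alias one real command to a different one.
            if (!commands.contains(phrase)) {
                learned.put(phrase, command);
            }
        }
    }

    // Assumption: trimming and lower-casing is enough to compare phrases.
    private static String normalize(String phrase) {
        return phrase.trim().toLowerCase(Locale.ROOT);
    }
}
```

With a matcher built from "forward" and "stop", a result list such as "for word", "forward", "four words" resolves to forward and teaches the table the other two phrases, so a later result containing only "for word" still resolves to forward.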

Now we have the name of a voice command.  We could write another if/else statement to perform the appropriate function call for each of the commands, or we could do something a little fancier.  Using reflection, I turned the command name into a method call.  So, to implement the command “forward,” you simply have to add a method called forward to the class!

Now, it isn’t quite that slick.  I still keep an if/else statement in order to get a match on the speech recognition results, and to store close matches in the hash table.  I’ll have to experiment with removing that code to see how it fares.
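The reflection trick can be sketched in plain Java as follows.  Again, this is an illustration rather than the project's code: RobotCommands, CommandDispatcher and the lastAction field are hypothetical names, and the sketch assumes every command maps to a public method that takes no arguments.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical stand-in for the robot controller; method names double as voice commands. */
class RobotCommands {
    String lastAction = "";  // recorded so the effect is visible without hardware

    public void forward() { lastAction = "forward"; }
    public void stop()    { lastAction = "stop"; }
}

/** Hypothetical dispatcher that turns a command name into a method call. */
final class CommandDispatcher {
    /** Invokes the public no-argument method named after the command; returns false if there is none. */
    static boolean dispatch(Object target, String command) {
        try {
            Method method = target.getClass().getMethod(command);
            // getMethod also finds methods inherited from Object, so skip those:
            // saying "wait" or "notify" should not reach Object.wait() or Object.notify().
            if (method.getDeclaringClass() == Object.class) {
                return false;
            }
            method.invoke(target);
            return true;
        } catch (NoSuchMethodException e) {
            return false;  // no method with that name, so not a command
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Command failed: " + command, e);
        }
    }
}
```

Calling CommandDispatcher.dispatch(robot, "forward") runs robot.forward(), while an unknown word simply returns false.  Adding a new voice command then means adding one more public method to the class.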

To browse through the code yourself, check out http://code.google.com/p/adk-moto/source/browse/src/com/jmoyer/adk_moto/ADKMoto.java.  It’s GPLv3 licensed, so cut-n-paste away into your open source projects!

5 thoughts on “Using the Android Speech Recognition APIs”

  1. Thanks for this example! I used it in my app and it works great.

    Do you know if this Speech recognition method works on phones that are using Froyo instead of Gingerbread?

    Mark

  2. Thank you for the explanation 🙂
    I am actually using the same approach in my application. I wonder if there is a way to customize the grammar: looking for the matches in a local list of expected sentences/words.

    • Unfortunately, I don’t believe you have the ability to customize the search grammar beyond the two options given: LANGUAGE_MODEL_FREE_FORM and LANGUAGE_MODEL_WEB_SEARCH.
