Usability Studies

Traditional GUI applications are typically used often, and for long periods at a time. Any confusing points in the user interface will eventually be overcome through repeated use and trial and error, and the user may also have a manual or help file to consult.

Telephony applications, by contrast, are often used only occasionally and only for short periods. A large portion of calls will likely come from first-time users. After all, how many people check the movie listings more than once a week, or for more than five minutes at a time?

Because the average user will have little experience with the application, the user interface must be as simple and self-explanatory as possible. To ensure this, the application must go through extensive usability testing.

The usability work for a telephony application typically proceeds as follows:

1. Application designers determine the tasks that users want to accomplish with the application.

2. A prototype is coded, and a small group of users is given hypothetical tasks to see whether they can accomplish them.

3. The application designers use feedback from the prototype to revise the application design.

4. The real application is coded with logging. The logs record the statistics listed below, which show how successful each call was.

5. If a new speech-enabled service is replacing an existing one, a small percentage of calls are diverted from the existing service to the speech-enabled application, and detailed logging information is kept.

6. Application designers review the statistics and make changes to improve performance.

7. Repeat steps five and six, gradually increasing the share of diverted calls, until the speech-enabled application is handling all of the phone calls.
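Diverting a small, growing percentage of calls can be sketched as follows. This is a minimal illustration, not any particular telephony platform's routing API: it assumes the routing layer sees a caller ID (or some other per-call identifier), and the function name `route_to_speech_app` is hypothetical. Hashing the caller ID, rather than choosing randomly on each call, keeps a given caller on the same system for the duration of a rollout stage.

```python
import hashlib


def route_to_speech_app(caller_id: str, rollout_percent: float) -> bool:
    """Return True if this call should go to the speech-enabled application.

    Hypothetical helper: rollout_percent is the share of callers (0-100)
    diverted away from the existing service.
    """
    if not 0.0 <= rollout_percent <= 100.0:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to one of 10,000 buckets.
    bucket = int.from_bytes(digest[:4], "big") % 10000
    # Each percentage point covers 100 buckets, so 5% -> buckets 0..499.
    return bucket < rollout_percent * 100
```

Because the bucket threshold only grows, raising `rollout_percent` from, say, 1 to 5 to 25 to 100 adds new callers to the speech-enabled application without moving earlier callers back to the old service.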

During the initial stages of an application, extensive performance statistics should be kept. Statistics logging might be disabled once the application goes into full use, or it might be continued so that the application can keep being improved. Some statistics and data worth keeping are:

How many users, and which ones, actually complete their tasks? In many applications it's obvious when a user has completed a task. In a movie listing application, a caller who succeeds will hang up after hearing the movie time, while a caller who fails will hang up before the movie time is played. Some applications are harder to measure, and it may be necessary to poll a small sample of users about their satisfaction.
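For the movie listing example, the hang-up heuristic can be applied to call logs roughly as below. This is a sketch under assumed log fields: the `CallLog` record, its `showtime_heard` timestamp (when the movie-time prompt finished playing, or `None` if it never did), and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallLog:
    call_id: str
    hangup_time: float               # seconds from call start to hang-up
    showtime_heard: Optional[float]  # when the movie-time prompt finished; None if it never did


def call_succeeded(call: CallLog) -> bool:
    """Heuristic from the movie-listing example: the call succeeded if the
    caller heard the movie time before hanging up."""
    return call.showtime_heard is not None and call.hangup_time >= call.showtime_heard


def success_rate(calls: list[CallLog]) -> float:
    """Fraction of calls judged successful (0.0 for an empty log)."""
    if not calls:
        return 0.0
    return sum(call_succeeded(c) for c in calls) / len(calls)


def failed_calls(calls: list[CallLog]) -> list[str]:
    """IDs of unsuccessful calls, to show which callers had trouble."""
    return [c.call_id for c in calls if not call_succeeded(c)]
```

Keeping the failed call IDs, not just the rate, lets designers pull the detailed logs for exactly those calls.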

How long does it take successful users to complete a task? The less time a task takes, the happier users will be, and the fewer telephony servers will be needed, since shorter calls occupy lines for less time.
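The link between call length and server count can be made concrete with Little's law: the average number of simultaneous calls (the offered load, in Erlangs) equals the call arrival rate times the mean call duration. The sketch below uses hypothetical function names and traffic figures. It gives average load only; sizing for peak traffic would need more lines (for example, via the Erlang B formula).

```python
from statistics import mean, median


def summarize_durations(successful_durations_s: list[float]) -> tuple[float, float]:
    """Mean and median length, in seconds, of successful calls."""
    return mean(successful_durations_s), median(successful_durations_s)


def average_concurrent_calls(calls_per_hour: float, mean_duration_s: float) -> float:
    """Average number of simultaneous calls (Erlangs), by Little's law:
    load = arrival rate x mean holding time."""
    return calls_per_hour * mean_duration_s / 3600.0
```

For example, at a hypothetical 600 calls per hour, cutting the mean successful call from 90 to 60 seconds lowers the average load from 15 to 10 simultaneous calls.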

At each state or prompt, how many users, and which ones, speak a valid response? If a prompt verifies the data, how often is the recognized result wrong? How often does the speech recognizer return "unrecognized" for a prompt? How long does it take a user to get through a state or prompt? What are the most common user responses? Prompts that take too long to navigate, or that produce many unrecognized results or misrecognitions, need to be reworked. Features that aren't used can be hidden or removed. The most common responses should come first when the list of choices is read to the user.
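These per-prompt statistics can be aggregated from turn-level logs along the following lines. This is a sketch only. The `Turn` record and `prompt_report` function are hypothetical. It treats any recognized result as a valid response, and it counts a caller answering "no" to a verification prompt as a misrecognition. The rework thresholds are placeholder values, not recommendations.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Turn:
    state: str                      # dialog state / prompt name
    result: Optional[str]           # recognized phrase, or None for "unrecognized"
    seconds: float                  # time the caller spent in this state
    confirm_rejected: bool = False  # caller said "no" when the prompt verified the result


def prompt_report(turns, max_unrecognized=0.15, max_rejected=0.10, top_n=3):
    """Per-state usability statistics; thresholds are hypothetical defaults."""
    by_state = defaultdict(list)
    for t in turns:
        by_state[t.state].append(t)

    report = {}
    for state, ts in by_state.items():
        recognized = [t for t in ts if t.result is not None]
        unrecognized_rate = 1 - len(recognized) / len(ts)
        # Misrecognitions are measured only among turns that produced a result.
        rejected_rate = (sum(t.confirm_rejected for t in recognized) / len(recognized)
                         if recognized else 0.0)
        report[state] = {
            "turns": len(ts),
            "valid_rate": len(recognized) / len(ts),
            "unrecognized_rate": unrecognized_rate,
            "rejected_rate": rejected_rate,
            "mean_seconds": mean([t.seconds for t in ts]),
            "top_responses": Counter(t.result for t in recognized).most_common(top_n),
            "needs_rework": unrecognized_rate > max_unrecognized or rejected_rate > max_rejected,
        }
    return report
```

The `top_responses` list suggests which choices to read to the caller first, and `needs_rework` flags prompts for the designers to revisit.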

In the early stages of usability testing, applications should keep recordings of everything users say to the speech recognizer. Application designers can listen to these responses and use them to adjust the wording of prompts or the set of responses the recognizer accepts. If a developer is striving for accuracy at any cost and is willing to pay to have custom speech recognition models built, the speech recognition vendor can use audio from real users to improve accuracy.
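One way to keep such recordings reviewable is to save each utterance as a WAV file, next to an index of what the recognizer returned for it. Designers can then listen prompt by prompt, starting with the unrecognized ones. This sketch assumes the platform hands the application raw 8 kHz, 16-bit, mono linear PCM for each utterance. The function name `save_utterance` and the file naming scheme are hypothetical.

```python
import csv
import os
import wave

SAMPLE_RATE_HZ = 8000   # assumption: 8 kHz telephone audio
SAMPLE_WIDTH_BYTES = 2  # assumption: 16-bit linear PCM, mono


def save_utterance(out_dir, call_id, state, turn, pcm_bytes, recognized):
    """Write one utterance to a WAV file and append a row to index.csv.

    `recognized` is the recognizer's result, or None for "unrecognized".
    """
    os.makedirs(out_dir, exist_ok=True)
    filename = f"{call_id}_{state}_{turn:02d}.wav"
    path = os.path.join(out_dir, filename)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(pcm_bytes)

    # The index lets designers filter recordings by prompt or by result.
    index_path = os.path.join(out_dir, "index.csv")
    new_file = not os.path.exists(index_path)
    with open(index_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["file", "call_id", "state", "turn", "recognized"])
        writer.writerow([filename, call_id, state, turn, recognized or "UNRECOGNIZED"])
    return path
```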