Dependence on the Speaker

Speech-recognition engines may require training before they recognize a particular speaker well, or they may be able to adapt to the speaker to a greater or lesser extent. Engines can be grouped into three categories:

· Speaker-dependent. The engine requires the user to train it to recognize his or her voice. Training usually involves speaking a series of pre-selected phrases. Each new speaker must perform the same training.

Speaker-dependent engines can run without training, but their accuracy usually starts below 95 percent and does not improve until the user completes the training. This approach requires the least processing, but it frustrates most users because the training is tedious, taking anywhere from five minutes to several hours.

· Speaker-adaptive. The engine trains itself to recognize the user's voice while the user performs ordinary tasks. Accuracy usually starts at about 90 percent, but rises to more acceptable levels after a few hours of use.

Speaker-adaptive technology raises two considerations. First, the user must somehow inform the engine when it makes a mistake so that it does not learn from the mistake. Second, even though recognition improves for the individual user, other people who try to use the system will see higher error rates until they have used it for a while.

· Speaker-independent. The engine starts with an accuracy above 95 percent for most users (those who speak without a pronounced accent). Almost all speaker-independent engines have training or adaptive abilities that improve their accuracy by a few more percentage points, but they do not require such training. Speaker-independent systems require several times the computational power of speaker-dependent systems.
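
The two speaker-adaptive considerations described above can be illustrated with a minimal Python sketch. It is not the API of any real engine: the names SpeakerProfile, AdaptiveSession, report, and flagged_as_mistake are hypothetical, and the "adaptation" is reduced to counting confirmed phrases. The sketch only shows the bookkeeping: results the user flags as mistakes are never learned from, and each user has a separate profile, so a new user starts with no adaptation.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Hypothetical per-user adaptation state (a stand-in for real acoustic data)."""
    confirmed_counts: dict = field(default_factory=dict)

    def adapt(self, phrase: str) -> None:
        # Learn only from a phrase the user did not flag as wrong.
        self.confirmed_counts[phrase] = self.confirmed_counts.get(phrase, 0) + 1


class AdaptiveSession:
    """Hypothetical wrapper that keeps one profile per user."""

    def __init__(self) -> None:
        self.profiles = {}

    def report(self, user: str, recognized: str, flagged_as_mistake: bool) -> None:
        # Every user gets a separate profile, created empty on first use.
        profile = self.profiles.setdefault(user, SpeakerProfile())
        if not flagged_as_mistake:
            profile.adapt(recognized)

    def samples_for(self, user: str) -> int:
        # Number of confirmed results the engine has adapted on for this user.
        profile = self.profiles.get(user)
        return sum(profile.confirmed_counts.values()) if profile else 0


session = AdaptiveSession()
session.report("alice", "open file", flagged_as_mistake=False)
session.report("alice", "open vile", flagged_as_mistake=True)  # skipped
print(session.profiles["alice"].confirmed_counts)  # {'open file': 1}
print(session.samples_for("bob"))  # 0
```

In this sketch the flagged result leaves the profile untouched, and a second user ("bob") has no adaptation data until he or she has used the system.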