Other Problems

Some other problems crop up:

· Having a user spell out words is a bad idea with most recognizers, because most are too inaccurate at recognizing individual spoken letters.

· An engine cannot tell who is speaking, although some engines may be able to detect a change of speaker. Voice-recognition algorithms that identify a speaker do exist, but currently they cannot also determine what the speaker is saying.

· An engine cannot detect multiple speakers talking over each other in the same digital-audio stream. A dictation system used to transcribe a meeting will therefore not perform accurately whenever two or more people are talking at once.

· Unlike a human being, an engine cannot hear a new word and guess its spelling.

· Localization of a speech-recognition engine is time-consuming and expensive, requiring large amounts of speech data and the skills of a trained linguist. If a language has strong dialects that each represent a sizable market, the engine must also be localized for each dialect. Consequently, most engines support only five to ten major languages, for example European languages and Japanese, or possibly Korean.

· Speakers with accents, or those speaking in nonstandard dialects, can expect more misrecognitions until they train the engine to recognize their speech. Even then, the engine's accuracy will not be as high as it would be for someone with the expected accent or dialect. An engine can be designed to recognize different accents or dialects, but this requires almost as much effort as porting the engine to a new language.
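
The point above about people talking at once suggests a practical workaround for meeting transcripts: mark the stretches of overlapping speech so the dictated text there can be treated as unreliable. The following is a minimal sketch, not a feature of any particular engine. It assumes speaker-turn times come from some outside source (for example manual annotation or one microphone per participant), since the engine itself cannot supply them; the names Turn and overlapping_regions are hypothetical.

```python
from typing import List, Tuple

# Hypothetical speaker turn: (speaker label, start in seconds, end in seconds).
# Assumed to come from outside the engine, which cannot tell who is speaking.
Turn = Tuple[str, float, float]


def overlapping_regions(turns: List[Turn]) -> List[Tuple[float, float]]:
    """Return the time spans during which two or more turns are active at once."""
    # Sweep line: +1 when a turn starts, -1 when a turn ends.
    events = []
    for _speaker, start, end in turns:
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back turns are not reported as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    active = 0            # number of turns currently in progress
    overlap_start = None  # start time of the overlap currently open, if any
    for time, delta in events:
        active += delta
        if active >= 2 and overlap_start is None:
            overlap_start = time
        elif active < 2 and overlap_start is not None:
            regions.append((overlap_start, time))
            overlap_start = None
    return regions


if __name__ == "__main__":
    meeting = [("A", 0.0, 5.0), ("B", 4.0, 8.0), ("C", 8.0, 10.0)]
    print(overlapping_regions(meeting))  # prints [(4.0, 5.0)]
```

Text dictated during the reported spans could then be flagged for manual review rather than trusted as is.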