An application can perform speech recognition by using three different kinds of grammars: context-free, dictation, and limited-domain. Each grammar uses a different strategy for narrowing the set of sentences it will recognize:
· Context-free grammar. Uses rules that predict which words can follow the word just spoken, reducing the number of candidates the engine must evaluate to recognize the next word.
· Dictation grammar. Defines a context for the speaker by identifying the subject of the dictation, the expected style of language, and the text that has already been dictated.
· Limited-domain grammar. Does not provide strict syntax structures, but does provide a set of words to recognize. A limited-domain grammar is a hybrid of a context-free grammar and a full dictation grammar.
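The way a context-free grammar narrows the candidates can be sketched roughly as follows. This is only an illustration, not how any particular engine stores its rules: the `GRAMMAR` table, the `<start>` marker, and the `next_candidates` function are all hypothetical names invented for this example.

```python
# Hypothetical toy grammar (an assumption for illustration): each key is the
# word just spoken (or "<start>"), and the value is the set of words that the
# rules allow to follow it.
GRAMMAR = {
    "<start>": {"send", "open"},
    "send": {"mail"},
    "open": {"mail", "calendar"},
    "mail": {"to"},
    "to": {"fred", "anna"},
}


def next_candidates(spoken):
    """Return the words the toy grammar allows after the words spoken so far."""
    state = "<start>"
    for word in spoken:
        if word not in GRAMMAR.get(state, set()):
            # The word was not predicted by the rules, so no path remains.
            return set()
        state = word
    # Words with no entry in the table end a path, so nothing can follow them.
    return GRAMMAR.get(state, set())


if __name__ == "__main__":
    print(sorted(next_candidates([])))
    print(sorted(next_candidates(["send", "mail", "to"])))
```

After "send", only "mail" remains a candidate, so the recognizer in this sketch would compare the next sound against one word instead of the whole vocabulary. A limited-domain grammar, by contrast, would keep the full word set available at every step.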
The approach used by a speech-recognition engine to accept verbal input influences the approach used by the grammar to recognize a phrase. A speech-recognition engine accepts verbal input using one of the following technologies:
· Discrete speech. Each word must be isolated by pauses before and after the word.
· Word spotting. Only certain words are recognized in an utterance.
· Continuous speech. Given a continuous utterance with no pauses between words, the engine can recognize the words that were spoken.
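As a rough illustration of the discrete-speech requirement, the sketch below checks whether the words in an utterance are separated by pauses. The timing tuples, the `is_discrete` function, and the 0.25-second threshold are assumptions made up for this example; a real engine works on audio and chooses its own thresholds.

```python
from typing import List, Tuple

# (word, start_seconds, end_seconds); the timings are invented for illustration.
TimedWord = Tuple[str, float, float]


def is_discrete(words: List[TimedWord], min_pause: float = 0.25) -> bool:
    """True if every adjacent pair of words is separated by at least min_pause.

    Only the gaps between words are checked; silence before the first word and
    after the last word is assumed to be present.
    """
    return all(
        following[1] - current[2] >= min_pause
        for current, following in zip(words, words[1:])
    )


if __name__ == "__main__":
    isolated = [("send", 0.0, 0.4), ("mail", 0.8, 1.2)]
    blended = [("send", 0.0, 0.4), ("mail", 0.45, 0.9)]
    print(is_discrete(isolated), is_discrete(blended))
```

Under these assumptions, the second utterance would need a continuous-speech engine, because the words run together without a usable pause.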
Each of these technologies can influence how the grammar recognizes a phrase in the following ways:
Technology | Context-free grammar | Dictation or limited-domain grammar |
Discrete speech | A phrase is completed when one path of the grammar is traversed by the isolated words. A path is abandoned when an incorrect word is spoken or the delay between words is too long. | Each word is sent to the application as a phrase when the engine isolates it. |
Word spotting | The grammar paths specify what words are to be spotted in what sequence. For example, if one of the grammar paths is "mail" followed by "Fred," then "Send all of my MAIL to FRED Smith" would complete the path. | Keywords are spotted in the utterance and a phrase is sent when the engine determines that the keyword was spoken. |
Continuous speech | Words can be spoken in a fluid manner and can blend into one another, although slight pauses are tolerated. A path is abandoned when an incorrect word is spoken or the delay between words is too long. | The engine parses the continuous stream, and when it determines that a sequence of words has been spoken, it collects them into a phrase and sends the phrase to the application. |
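The word-spotting case for a context-free grammar can be sketched as an in-order search for the path's words inside the utterance. This is a minimal sketch that works on text rather than audio, and the `completes_path` function is a hypothetical helper, not part of any speech API.

```python
import string
from typing import Sequence


def completes_path(utterance: str, path: Sequence[str]) -> bool:
    """True if the path's words occur in the utterance in the given order."""
    # Normalize case and strip punctuation so "FRED" and "Fred," both match.
    words = iter(w.strip(string.punctuation).lower() for w in utterance.split())
    # `in` on an iterator consumes it up to the match, so each path word must
    # be found after the previous one; other words in between are ignored.
    return all(step.lower() in words for step in path)


if __name__ == "__main__":
    path = ["mail", "Fred"]
    print(completes_path("Send all of my MAIL to FRED Smith", path))
    print(completes_path("Tell FRED about the MAIL", path))
```

The first utterance completes the path because "mail" is followed later by "Fred". The second contains both keywords but in the wrong sequence, so the path is not completed.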