Porting an Existing Speech-Recognition Engine

Typically, an engine developer ports the code for an existing speech-recognition engine from 16-bit Windows, UNIX®, Macintosh®, or OS/2®, rather than developing a completely new engine. Porting a speech-recognition engine to the speech API usually involves the following steps:

1. Port the engine code to Win32, keeping the API that the code already uses. The port itself is usually straightforward, but it may expose latent bugs because Win32 is more stringent about memory usage.

2. Move the engine code into a dynamic-link library (DLL).

3. Implement the major interfaces for speech recognition first to get the engine running with the speech API as quickly as possible.

4. Redesign the engine's audio input and output architecture to use streams. Using streams makes the engine consistent with the multimedia audio-source and audio-destination objects, which also use streams. These objects are similar to multimedia wave-in and wave-out drivers.

5. Design the engine's interface architecture to support the interfaces defined for the speech-recognition engine object.

6. Implement the speech-recognition interfaces and functions for the engine.

7. Write additional code to enable the engine to support multiple instances. The time required to do this varies depending on the engine's internal architecture.
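
As an illustration of step 7, the following is a minimal sketch of one common way to make an engine safe for multiple instances: state that used to live in file-scope globals moves into a per-instance object. The names EngineState, RecognizerInstance, feedAudio, and endUtterance are hypothetical and are not part of the speech API; the "recognition" here is only a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-instance state. In a single-instance engine these
// fields would typically be file-scope globals shared by all callers.
struct EngineState {
    std::vector<std::int16_t> audioBuffer;  // samples waiting to be processed
    std::size_t utterancesSeen = 0;         // counter that used to be global
};

// Hypothetical engine class (not a speech API type). Each object owns its
// own EngineState, so two instances never touch each other's data.
class RecognizerInstance {
public:
    // Queue audio samples for this instance only.
    void feedAudio(const std::int16_t* samples, std::size_t count) {
        assert(samples != nullptr);
        state_.audioBuffer.insert(state_.audioBuffer.end(),
                                  samples, samples + count);
    }

    // Placeholder for real recognition: drop the buffered audio and
    // count the utterance.
    void endUtterance() {
        state_.audioBuffer.clear();
        ++state_.utterancesSeen;
    }

    std::size_t bufferedSamples() const { return state_.audioBuffer.size(); }
    std::size_t utterancesSeen() const { return state_.utterancesSeen; }

private:
    EngineState state_;  // no statics, no globals
};
```

With this layout, creating a second RecognizerInstance costs only the memory for another EngineState. How much of a real engine can be converted this way depends on how deeply its internal architecture relies on shared state.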

The engine must provide the engine enumerator, engine object, and grammar object, and may optionally provide a speech-recognition results object. The engine should support both ANSI and Unicode implementations of each interface.

The following table lists the required and optional interfaces for each object.

Object                                  Interface
--------------------------------------  ------------------------------------
Engine enumerator                       ISREnum
                                        ISRFind (optional)

Engine                                  ILexPronounce (optional)
                                        ISRAttributes
                                        ISRCentral
                                        ISRDialogs
                                        ISRSpeaker (optional)

Grammar                                 ISRGramCFG (context-free grammar)
                                        ISRGramCommon
                                        ISRGramDictation (dictation grammar)

Speech recognition results (optional)   ISRResAudio (optional)
                                        ISRResBasic (optional)
                                        ISRResCorrection (optional)
                                        ISRResEval (optional)
                                        ISRResGraph (optional)
                                        ISRResMemory (optional)
                                        ISRResMerge (optional)
                                        ISRResSpeaker (optional)
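
During a port, it can help to keep the table above as data and check an engine's progress against it. The following is a minimal sketch under stated assumptions: the interface names are transcribed from the table, an interface is treated as required whenever the table does not mark it optional, and InterfaceEntry, interfaceTable, and missingRequired are hypothetical helpers rather than speech API functions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One row of the table: the object that exposes the interface, the
// interface name, and whether the table marks it "(optional)".
struct InterfaceEntry {
    std::string object;
    std::string name;
    bool optional;
};

// Entries transcribed from the table. The results object is itself
// optional, so every interface on it is optional as well.
inline const std::vector<InterfaceEntry>& interfaceTable() {
    static const std::vector<InterfaceEntry> table = {
        {"Engine enumerator", "ISREnum", false},
        {"Engine enumerator", "ISRFind", true},
        {"Engine", "ILexPronounce", true},
        {"Engine", "ISRAttributes", false},
        {"Engine", "ISRCentral", false},
        {"Engine", "ISRDialogs", false},
        {"Engine", "ISRSpeaker", true},
        {"Grammar", "ISRGramCFG", false},
        {"Grammar", "ISRGramCommon", false},
        {"Grammar", "ISRGramDictation", false},
        {"Speech recognition results", "ISRResAudio", true},
        {"Speech recognition results", "ISRResBasic", true},
        {"Speech recognition results", "ISRResCorrection", true},
        {"Speech recognition results", "ISRResEval", true},
        {"Speech recognition results", "ISRResGraph", true},
        {"Speech recognition results", "ISRResMemory", true},
        {"Speech recognition results", "ISRResMerge", true},
        {"Speech recognition results", "ISRResSpeaker", true},
    };
    return table;
}

// Hypothetical helper: returns, in table order, every non-optional
// interface that does not appear in `implemented`.
inline std::vector<std::string> missingRequired(
        const std::vector<std::string>& implemented) {
    std::vector<std::string> missing;
    for (const InterfaceEntry& entry : interfaceTable()) {
        if (entry.optional) {
            continue;
        }
        if (std::find(implemented.begin(), implemented.end(), entry.name) ==
            implemented.end()) {
            missing.push_back(entry.name);
        }
    }
    return missing;
}
```

A check like this only tracks names. It says nothing about whether an interface is implemented correctly or in both its ANSI and Unicode forms.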


The engine should also support the following notification sinks to pass information back to the application.

· ISRNotifySink. Used by an engine object to notify the application of interference, noise, and other factors that affect speech recognition.

· ISRGramNotifySink. Used by a grammar object to notify the application about a particular recognition.
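
The notification-sink pattern described above can be sketched in portable C++ as follows. This is an illustration of the callback flow only: EngineNotifySink, GrammarNotifySink, Engine, Grammar, and every method name below are hypothetical stand-ins, and none of them reproduce the actual ISRNotifySink or ISRGramNotifySink method signatures, which are COM interfaces defined by the speech API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an engine-level sink (not the real ISRNotifySink).
class EngineNotifySink {
public:
    virtual ~EngineNotifySink() = default;
    virtual void onInterference(const std::string& kind) = 0;
};

// Hypothetical stand-in for a grammar-level sink (not the real ISRGramNotifySink).
class GrammarNotifySink {
public:
    virtual ~GrammarNotifySink() = default;
    virtual void onPhraseRecognized(const std::string& phrase) = 0;
};

// The engine object reports conditions that affect recognition.
class Engine {
public:
    void registerSink(EngineNotifySink* sink) {
        if (sink != nullptr) sinks_.push_back(sink);
    }
    void reportInterference(const std::string& kind) {
        for (EngineNotifySink* sink : sinks_) sink->onInterference(kind);
    }
private:
    std::vector<EngineNotifySink*> sinks_;  // not owned by the engine
};

// The grammar object reports individual recognitions.
class Grammar {
public:
    void registerSink(GrammarNotifySink* sink) {
        if (sink != nullptr) sinks_.push_back(sink);
    }
    void reportRecognition(const std::string& phrase) {
        for (GrammarNotifySink* sink : sinks_) sink->onPhraseRecognized(phrase);
    }
private:
    std::vector<GrammarNotifySink*> sinks_;  // not owned by the grammar
};

// Application-side sink that simply records what it is told.
class RecordingSink : public EngineNotifySink, public GrammarNotifySink {
public:
    void onInterference(const std::string& kind) override {
        interference.push_back(kind);
    }
    void onPhraseRecognized(const std::string& phrase) override {
        phrases.push_back(phrase);
    }
    std::vector<std::string> interference;
    std::vector<std::string> phrases;
};
```

In this sketch the application owns the sink and must keep it alive for as long as it is registered. A COM implementation would manage that lifetime through reference counting instead of raw pointers.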