Porting an Existing Text-to-Speech Engine

Typically, an engine developer ports the code for an existing text-to-speech engine from 16-bit Windows, UNIX, Macintosh, or OS/2 rather than developing a completely new engine. Porting a text-to-speech engine to the text-to-speech API usually involves these steps:

1. Port the engine code to Win32 by using the API that is already in the code. This should be straightforward, but it may uncover old bugs because Win32 is more stringent about memory.

2. Move the engine code into a DLL.

3. Implement the major interfaces for text-to-speech translation first to get the engine running with the speech API as quickly as possible.

4. Redesign the engine's audio input and output architecture to use streams. Using streams makes the engine consistent with the multimedia audio-source and audio-destination objects, which also use streams. These objects are similar to multimedia wave-in and wave-out drivers.

5. Design the engine's interface architecture to support the interfaces defined for the text-to-speech engine object.

6. Implement the text-to-speech interfaces and functions for the engine.

7. Write additional code to enable the engine to support multiple instances. The time required to do this varies depending upon the engine's internal architecture.
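Steps 4 and 7 can be sketched together in a few lines of C++. The sketch below is illustrative only: AudioDestination, MemoryDestination, and EngineInstance are hypothetical names, not interfaces defined by the speech API, and the "synthesis" is a placeholder. It assumes the original engine kept settings in global variables and wrote directly to a wave-out device; here the output goes to an abstract stream and every setting lives in the instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stream-style audio destination (step 4).
// This is NOT a speech API interface; it only illustrates the idea.
class AudioDestination {
public:
    virtual ~AudioDestination() = default;
    // Accepts a block of 16-bit PCM samples and returns how many were consumed.
    virtual std::size_t write(const std::int16_t* samples, std::size_t count) = 0;
};

// Simple destination that collects samples in memory.
class MemoryDestination : public AudioDestination {
public:
    std::size_t write(const std::int16_t* samples, std::size_t count) override {
        assert(samples != nullptr || count == 0);
        buffer_.insert(buffer_.end(), samples, samples + count);
        return count;
    }
    const std::vector<std::int16_t>& samples() const { return buffer_; }
private:
    std::vector<std::int16_t> buffer_;
};

// Per-instance engine state (step 7): nothing is stored in globals,
// so several instances can coexist without interfering.
class EngineInstance {
public:
    explicit EngineInstance(AudioDestination& dest) : dest_(dest) {}

    void setSpeed(int wordsPerMinute) { speed_ = wordsPerMinute; }
    int speed() const { return speed_; }

    // Placeholder for synthesis: emits one dummy sample per input character
    // and streams the block to this instance's destination.
    std::size_t speak(const std::string& text) {
        std::vector<std::int16_t> block;
        block.reserve(text.size());
        for (unsigned char ch : text) {
            block.push_back(static_cast<std::int16_t>(ch));
        }
        return dest_.write(block.data(), block.size());
    }

private:
    AudioDestination& dest_;
    int speed_ = 150;  // assumed default, chosen only for the example
};
```

With this layout, changing the speed of one EngineInstance or speaking through it leaves any other instance and its destination untouched, which is the property step 7 asks for.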

The engine must provide the engine enumerator and the engine object, and it should support both ANSI and Unicode implementations of each interface. The following table lists the required and optional interfaces for each object.

Class               Interface

Engine enumerator   ITTSEnum
                    ITTSFind (optional)

Engine              ILexPronounce (optional)
                    ITTSAttributes
                    ITTSCentral
                    ITTSDialogs

The engine should also support the following notification sinks to pass information back to the application.

· ITTSBufNotifySink. Notifies the application of changes to the buffer that contains the text being spoken.

· ITTSNotifySink. Notifies the application of the phoneme being spoken or that audio has started or stopped.
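The notification-sink pattern can be sketched as follows. This is a simplified illustration under stated assumptions: SpeechEventSink, LoggingSink, and SketchEngine are hypothetical names, and the callbacks are plain C++ virtual functions. The real ITTSNotifySink and ITTSBufNotifySink are COM interfaces whose method names and signatures differ from the ones shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified sink. The application implements it and the
// engine calls back into it as speech progresses.
class SpeechEventSink {
public:
    virtual ~SpeechEventSink() = default;
    virtual void onAudioStart() = 0;
    virtual void onPhoneme(char phoneme) = 0;
    virtual void onAudioStop() = 0;
};

// Example application-side sink that records each notification.
class LoggingSink : public SpeechEventSink {
public:
    void onAudioStart() override { log.push_back("start"); }
    void onPhoneme(char phoneme) override {
        log.push_back(std::string("phoneme:") + phoneme);
    }
    void onAudioStop() override { log.push_back("stop"); }

    std::vector<std::string> log;
};

// Stand-in engine: treats each input character as one "phoneme" and
// notifies every registered sink in order.
class SketchEngine {
public:
    void registerSink(SpeechEventSink* sink) {
        assert(sink != nullptr);
        sinks_.push_back(sink);
    }

    void speak(const std::string& phonemes) {
        for (SpeechEventSink* sink : sinks_) sink->onAudioStart();
        for (char phoneme : phonemes) {
            for (SpeechEventSink* sink : sinks_) sink->onPhoneme(phoneme);
        }
        for (SpeechEventSink* sink : sinks_) sink->onAudioStop();
    }

private:
    std::vector<SpeechEventSink*> sinks_;  // not owned by the engine
};
```

The application registers its sink before speech starts; the engine never needs to know what the application does with each notification.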