Engine Attributes

A text-to-speech engine has a set of attributes that control how it speaks, including the baseline pitch, the real-time setting, the average speaking speed, and the speaking volume. The engine initially sets each attribute to an optimal value of its own choosing. An application can keep these values or use the engine object's ITTSAttributes interface to query and change them. To get a pointer to the ITTSAttributes interface, call the ITTSCentral::QueryInterface member function with the IID_ITTSAttributes interface identifier.

Whenever an engine attribute changes, the engine notifies the application by calling the application's ITTSNotifySink::AttribChanged member function, passing a flag that identifies the attribute that changed. In the current version of the speech API, this notification is of limited use, because an application that changes an attribute already knows that it has done so. The notification will become more useful in future versions, in which multiple processes or applications will be able to share one text-to-speech engine through a text-to-speech sharing object (similar to the speech-recognition sharing object). An application will then need to know when another application changes an attribute of the shared engine.

Pitch

An application can set the baseline pitch (in hertz) of the speaking voice for a mode by using the ITTSAttributes::PitchSet member function, and can retrieve the current baseline pitch by using the ITTSAttributes::PitchGet member function.

If the text being spoken contains text-to-speech control tags, the actual pitch of the speaking voice typically rises above the baseline pitch but rarely falls below it. The Pit control tag changes the baseline pitch setting for the mode, just as calling PitchSet does.
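Because the Pit tag and PitchSet change the same baseline, an application can also set the pitch inline by prefixing the text it speaks with a tag. The sketch below assumes the backslash-delimited \Pit=number\ form, with the number in hertz; this section does not give the tag syntax, and MakeControlTag and WithBaselinePitch are hypothetical helpers, not part of the speech API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the speech API): builds a control tag.
// ASSUMPTION: tags use the backslash-delimited form \Name=value\ .
std::string MakeControlTag(const std::string& name, unsigned long value) {
    assert(!name.empty());
    return "\\" + name + "=" + std::to_string(value) + "\\";
}

// Hypothetical helper: prefixes text with a Pit tag that resets the
// baseline pitch (in hertz) for the mode, like a call to PitchSet.
std::string WithBaselinePitch(unsigned long hertz, const std::string& text) {
    return MakeControlTag("Pit", hertz) + text;
}
```

For example, WithBaselinePitch(110, "Hello") yields the string \Pit=110\Hello, which the application would pass to the engine as ordinary text to speak.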

Real-Time Setting

The real-time setting is the percentage of processor time that the engine expects to use during continuous text-to-speech translation. For example, with a real-time setting of 100, the engine takes one full minute of processor time to process one minute of speech; with a setting of 50, it takes 30 seconds of processor time to process the same minute of speech. Because this value is difficult to compute precisely, treat it as an estimate.
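The arithmetic behind the example above is processor time = speech duration × real-time setting / 100. A minimal sketch (the function name is illustrative, not part of the speech API):

```cpp
#include <cassert>

// Estimated processor time, in seconds, that the engine expects to use for a
// given duration of speech. realTimePercent is the real-time setting, i.e. the
// percentage of processor time used while translating continuously.
double EstimatedCpuSeconds(double speechSeconds, unsigned long realTimePercent) {
    assert(speechSeconds >= 0.0);
    return speechSeconds * static_cast<double>(realTimePercent) / 100.0;
}
```

With 60 seconds of speech, a setting of 100 gives 60 seconds of processor time and a setting of 50 gives 30 seconds. A setting above 100, such as 200, gives 120 seconds, which is only acceptable for non-real-time work. Like the setting itself, the result is an estimate.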

The real-time setting can be more than 100 for non-real-time applications (for example, applications that translate text from a file). For most engines, the amount of processor time required diminishes markedly during periods of silence.

If an application changes the real-time setting for an engine by calling the ITTSAttributes::RealTimeSet member function, the engine does its best to meet the new real-time expectation. However, a setting as low as 1 (1 percent of processor time) cannot be met on today's personal computers.

An application retrieves the current real-time setting for an engine by using the ITTSAttributes::RealTimeGet member function.

Average Speaking Speed

The text-to-speech engine can speak text at various speeds. An application can set the engine's average speaking speed by using the ITTSAttributes::SpeedSet member function and can retrieve the current average speed by using the ITTSAttributes::SpeedGet member function. The average speaking speed is measured in words per minute.
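Because the speed is expressed in words per minute, an application can roughly estimate how long a passage will take to speak from its word count. The sketch below is illustrative only (the function name is hypothetical). Control tags and pauses make the actual duration differ from this average-based figure.

```cpp
#include <cassert>

// Rough speaking time, in seconds, for wordCount words at an average speed of
// wordsPerMinute (the units used by SpeedSet and SpeedGet).
double EstimatedSpeakingSeconds(unsigned long wordCount, unsigned long wordsPerMinute) {
    assert(wordsPerMinute > 0);
    return 60.0 * static_cast<double>(wordCount) / static_cast<double>(wordsPerMinute);
}
```

For instance, 300 words at an average of 150 words per minute take about 120 seconds.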

If the engine plays text that contains text-to-speech control tags, the actual speed of the speaking voice fluctuates above and below the average. The Spd control tag changes the baseline speed setting for the mode, just as calling SpeedSet does.

Speaking Volume

An application can use the ITTSAttributes::VolumeSet member function to set the baseline speaking volume for a text-to-speech mode. Each channel's level ranges from 0x0000 (absolute silence) to 0xFFFF (the maximum volume for the mode).

The application specifies the levels for the left and right channels in a single 32-bit value: the left-channel level is in the low-order word, and the right-channel level is in the high-order word. The text-to-speech engine or the audio object may not support independent control of the left and right channels. In that case, the engine typically uses the left-channel value as the baseline volume.
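The packing described above can be sketched with plain bit operations. The helper names are hypothetical, and PercentToLevel is an illustrative convenience that maps 0 to 100 percent onto the 0x0000 to 0xFFFF range.

```cpp
#include <cassert>
#include <cstdint>

// Packs per-channel levels into one 32-bit volume value:
// left channel in the low-order word, right channel in the high-order word.
std::uint32_t PackVolume(std::uint16_t left, std::uint16_t right) {
    return static_cast<std::uint32_t>(left) | (static_cast<std::uint32_t>(right) << 16);
}

// Extracts the left-channel level (low-order word).
std::uint16_t LeftLevel(std::uint32_t volume) {
    return static_cast<std::uint16_t>(volume & 0xFFFFu);
}

// Extracts the right-channel level (high-order word).
std::uint16_t RightLevel(std::uint32_t volume) {
    return static_cast<std::uint16_t>(volume >> 16);
}

// Hypothetical convenience: maps 0-100 percent onto 0x0000 (silence) to
// 0xFFFF (maximum volume for the mode), truncating toward zero.
std::uint16_t PercentToLevel(unsigned percent) {
    assert(percent <= 100);
    return static_cast<std::uint16_t>((0xFFFFu * percent) / 100u);
}
```

For example, PackVolume(0x1234, 0xABCD) yields 0xABCD1234. Assuming VolumeSet takes this packed value as a DWORD, a call such as `pAttribs->VolumeSet(PackVolume(PercentToLevel(50), PercentToLevel(50)))` would request roughly half volume on both channels. Setting both channels to the same level avoids surprises on engines that honor only the left-channel value.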

If the text being played contains control tags, the actual volume of the speaking voice fluctuates above and below this baseline. The Vol control tag changes the baseline volume setting for the mode, just as calling VolumeSet does.

An application can retrieve the current baseline speaking volume for a mode by using the ITTSAttributes::VolumeGet member function.