Text-to-Speech Control Tags

A text-to-speech engine can usually translate individual words to speech successfully. However, as soon as the engine speaks a full sentence, the perceived quality of its translation decreases because the engine cannot correctly synthesize human prosody: the inflection, accent, and timing of human speech.

The prosody of translated speech can be improved by using text-to-speech control tags to better simulate human speech. Control tags can be embedded in the source text passed to an engine with the ITTSCentral::TextData member function, or they can be inserted into text that is currently playing by calling the ITTSCentral::Inject member function. Injecting tags allows an application to alter the prosody of text as it is spoken, without having to reconstruct the text-to-speech buffers.
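
As a rough illustration of embedding a tag in source text, the following sketch builds a tagged string before it would be handed to the engine. It does not call the engine. MakeTag and WithBookmark are hypothetical helper names, and the backslash-delimited \Name=value\ tag form is an assumption here; confirm the exact syntax against the individual tag descriptions.

```cpp
#include <string>

// Hypothetical helper: formats a single control tag.
// ASSUMPTION: tags are written as \Name=value\ (backslash-delimited);
// verify this against the tag descriptions for the engine in use.
std::string MakeTag(const std::string& name, const std::string& value)
{
    return "\\" + name + "=" + value + "\\";
}

// Hypothetical helper: places a bookmark (Mrk) tag in front of the text.
// The resulting string is what an application would place in the buffer
// it passes to the engine; the call to the engine is not shown here.
std::string WithBookmark(const std::string& text, unsigned long markId)
{
    return MakeTag("Mrk", std::to_string(markId)) + text;
}
```

Under these assumptions, WithBookmark("Hello.", 7) produces the string \Mrk=7\Hello. The same MakeTag helper could format a tag on its own for injection into text that is already playing.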

This appendix describes control tags that you can use to alter the prosody of text translated into speech. Support for each tag is optional for an engine, except for the bookmark (Mrk) tag, which must be supported.