An application wishing to use the Voice Dictation API will need to incorporate the following code.
When the application starts up, it will need to call:
CoCreateInstance (..., CLSID_VDct, ..., &gpIVoiceDictation);
gpIVoiceDictation->QueryInterface (..., IID_IVDCTTEXT, &gpIVDctText);
gpIVoiceDictation->QueryInterface (..., IID_IVDCTGUI, &gpIVDctGUI);
If the application has not created its own topic, and does not wish to use a common topic, it needs to generate one. It calls:
gpIVoiceDictation->TopicAddString ("My App's Topic", NULL,
"E-mail", NULL, NULL, NULL);
The application can provide more information, such as sample words or documents for the topic. More information might increase the initial accuracy of the topic.
Once the application knows a topic, it calls Register() so that the Voice Dictation object knows what topic to use:
gpIVoiceDictation->Register ("My Application", "My App's Topic",
NULL, NULL, &IVDctNotifySink, IID_IVDCTNOTIFYSINK, 0);
If an application were loading a saved dictation session, it would pass in an IStorage where the session information was saved.
The IVDctNotifySink is the notification sink code supplied by the application. Its member functions will be called to notify the application when the user speaks and when text is added to or removed from the virtual buffer. We will describe some of the interfaces as they're needed.
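To make the call flow concrete, here is a minimal sketch of a notification sink, modeled as a plain C++ class rather than a full COM implementation (a real sink would also implement IUnknown). The member names TextChanged and TextSelChanged come from this walkthrough; the dirty flags are illustrative, standing in for whatever state the application uses to schedule a resynchronization:

```cpp
#include <cassert>

// Illustrative sketch only: a real sink implements the COM
// IVDctNotifySink interface. Here it is a plain class so the
// notification flow is visible.
class MyNotifySink {
public:
    bool textDirty = false;  // set when the virtual buffer's text changes
    bool selDirty  = false;  // set when the virtual selection changes

    // Called by Voice Dictation when recognized text is added/removed.
    void TextChanged() { textDirty = true; }

    // Called by Voice Dictation when the virtual selection moves.
    void TextSelChanged() { selDirty = true; }
};
```

On either callback, the application would later run the Lock()/GetChanges()/UnLock() synchronization sequence described below and clear the flags.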
When the application wishes to listen, it calls:
gpIVoiceDictation->Activate (hWnd);
hWnd is the window handle of the application's window. The dictation engine might not actually be listening until the global enabled flag, set with EnabledSet(), is turned on. The application usually does not have to turn this on itself, since the system provides UI that allows the user to do so. From the user's perspective, this turns on speech recognition for the entire PC, not just one application at a time.
Now that the application is registered, it creates its own edit box, rich edit box, or custom control that accepts text entry. This will be intimately tied to the virtual text edit box supplied by Voice Dictation. Whenever the user speaks text that is entered into the virtual edit box, the application will be called with IVDctNotifySink.TextChanged() or TextSelChanged(). This means that something has changed and the application should synchronize its visible edit box with Voice Dictation's invisible one so that the user sees the changes. To synchronize, the application calls:
gpIVDctText->Lock();
gpIVDctText->GetChanges (&dwOldStart, &dwOldEnd, &dwNewStart, &dwNewEnd);
gpIVDctText->TextGet (dwNewStart, dwNewEnd - dwNewStart, pszTemp);
Replace the text from dwOldStart to dwOldEnd in the visible edit box with pszTemp;
gpIVDctText->TextSelGet (&dwSelStart, &dwSelSize);
Set the text selection from dwSelStart to (dwSelStart + dwSelSize) in the visible edit box;
gpIVDctText->UnLock();
Get the screen coordinates of the visible selection and put into rectSel;
gpIVDctGUI->SetSelRect (rectSel);
This code locks down the invisible text buffer so that it is not changed by inverse text normalization or new recognitions. It then finds out what text has changed since the last time the application synchronized and displays those changes to the user. The correction user-interface is then moved so that it's adjacent to the selected text.
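The synchronization step amounts to a range replacement: the characters from dwOldStart to dwOldEnd in the visible buffer are replaced by the text Voice Dictation now holds between dwNewStart and dwNewEnd. A self-contained sketch of that operation on a std::string (the function name is illustrative; a real application would operate on its edit control between Lock() and UnLock()):

```cpp
#include <string>
#include <cassert>

// Mirror a change reported by GetChanges() into the visible buffer.
// 'newText' is the text fetched with TextGet() over the new range;
// [oldStart, oldEnd) is the stale range in the visible copy.
std::string SyncVisibleBuffer(const std::string& visible,
                              size_t oldStart, size_t oldEnd,
                              const std::string& newText) {
    std::string result = visible;
    result.replace(oldStart, oldEnd - oldStart, newText);
    return result;
}
```

For example, if inverse text normalization rewrote "wrld" as "world", the application would splice the new spelling into the same slot of its visible buffer and then reposition the selection.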
Likewise, the application has to notify the Voice Dictation object when the user changes the selection or text.
If the user changes the selection with a mouse or keyboard then the application calls:
gpIVDctText->Lock();
gpIVDctText->GetChanges (&dwOldStart, &dwOldEnd, &dwNewStart, &dwNewEnd);
if the text has changed then the application should update the UI;
gpIVDctText->TextSelSet (dwSelStart, dwSelSize);
gpIVDctText->UnLock();
Get the screen coordinates of the visible selection and put into rectSel;
gpIVDctGUI->SetSelRect (rectSel);
If the user types in some new text, pastes in text, or does anything to change the text, the application should call:
gpIVDctText->Lock();
gpIVDctText->GetChanges (&dwOldStart, &dwOldEnd, &dwNewStart, &dwNewEnd);
if the text has changed then the application should update the UI;
gpIVDctText->TextSet (dwStart, dwNumCharsReplace, szNewText, reason);
gpIVDctText->TextSelSet (dwSelStart, dwSelSize);
gpIVDctText->UnLock();
Get the screen coordinates of the visible selection and put into rectSel;
gpIVDctGUI->SetSelRect (rectSel);
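The reverse direction can be modeled the same way. Below is a sketch of the invisible (shadow) buffer and the TextSet/TextSelSet semantics described above: replace dwNumCharsReplace characters at dwStart with the user's new text, then move the selection. The struct is illustrative, not the actual COM object:

```cpp
#include <string>
#include <cassert>

// Illustrative model of Voice Dictation's invisible buffer and the
// calls the application makes after a user edit.
struct ShadowBuffer {
    std::string text;
    size_t selStart = 0;
    size_t selSize  = 0;

    // Replace 'numCharsReplace' characters at 'start' with newText.
    void TextSet(size_t start, size_t numCharsReplace,
                 const std::string& newText) {
        text.replace(start, numCharsReplace, newText);
    }

    // Move the selection (a zero-size selection is a caret).
    void TextSelSet(size_t start, size_t size) {
        selStart = start;
        selSize  = size;
    }
};
```

In this model, a single character typed at caret position n becomes TextSet(n, 0, "x") followed by TextSelSet(n + 1, 0); a paste over a selection replaces the selected range in one TextSet call.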
When the application shuts down, it simply releases all of the interfaces.
That's all an application has to do for minimal functionality. Keeping the text buffers synchronized is tricky, but once that problem is solved, dictation is easy. The application doesn't have to implement any additional UI.
Some applications might want to take advantage of even more features such as:
Application-Specific Commands - An application might want to provide its own commands that are active during dictation. Good examples are "line-up" and "line-down"; these cannot be built-in commands because the virtual edit box doesn't know where a line begins or ends.
Topic-Specific Glossary Entries - An application might want to add glossary entries to the topic so that the user won't have to add them him/herself.
Saved Sessions - Save all of the audio and speech recognition correction information into a file so the user can reload it and do more correction.
Bookmarks - If an application wants to maintain tags or marks within the text, it can put "bookmarks" between characters and always be able to find their current position. This could be used, for example, to indicate changes in font.
Of course, there are many more options. Read the rest of the documentation for more information.