The synthesized voice produced by even the best text-to-speech engine sounds noticeably different from a human voice captured in a digital-audio recording. Mixing the two in the same utterance can be disturbing to the user, and it usually makes the text-to-speech voice sound worse by comparison.
For example, to have an application speak "The number is 56,738," you should not prerecord "The number is" and use text-to-speech to speak the number. Either prerecord everything or use text-to-speech for everything.
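One way to enforce this rule is a planning step that chooses a single audio source for the whole utterance before anything is played. The following is a minimal sketch that assumes the utterance has already been split into phrase-level segments. The `Recording` and `Synthesized` types, the `plan_utterance` function, and the clip file names are hypothetical illustrations, not part of any real speech API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Recording:
    clip: str  # hypothetical file name of a prerecorded clip


@dataclass(frozen=True)
class Synthesized:
    text: str  # the entire utterance, handed to the TTS engine in one request


Plan = Union[List[Recording], Synthesized]


def plan_utterance(segments: List[str], clips: Dict[str, str]) -> Plan:
    """Pick one audio source for the whole utterance; never mix the two.

    If every segment has a prerecorded clip, play only recordings.
    Otherwise, send the full text to text-to-speech, including segments
    that do have recordings, so the voice stays consistent.
    """
    if all(segment in clips for segment in segments):
        return [Recording(clips[segment]) for segment in segments]
    return Synthesized(" ".join(segments))


if __name__ == "__main__":
    # Only the carrier phrase is recorded, so the whole sentence goes to TTS.
    clips = {"The number is": "number_is.wav"}
    print(plan_utterance(["The number is", "56,738"], clips))
```

The important design choice is that the fallback covers the entire utterance, not just the missing segment: even though "The number is" has a recording here, it is still spoken by the text-to-speech engine so that the user hears one voice throughout.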