>>... understand that some of the really good speech synthesizers
>> "cheat" by concatenating human recordings of words, or otherwise
>> patching the synthesis on a word-by-word basis.
>>
>> I know that there's at least one TTS product using 60 MB.
>
> That doesn't sound like much. Festival takes up about 70MB on my system.
How much memory does it allocate when it runs?
How many different audio files does it have?
I haven't looked very closely at a speech synthesis program
since around 2000. Back then I was astonished to find that very
few people knew how to synthesize sound directly from a
spectrogram. The transform had been proven reversible, but the
published algorithms were doing a terrible job. At one point I
wasn't even sure that anyone knew how to do it in less than
something like O(N^2 log N) or O(N log N^2) time.
The O(N log N) phase vocoder that resulted is in:
http://www.bovik.org/fs.m.txt
http://www.bovik.org/af.m.txt
That should help any resynthesis system use a compressed audio
format and still do real-time resynthesis with arbitrary
spectral transforms. It can also be used to normalize data sets
so that they all have the same relative pitch, tempo, etc.
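
To make the tempo-normalization idea concrete, here is a minimal
sketch of a generic textbook-style phase vocoder time stretch. It is
not the code in fs.m or af.m. The names fft, ifft, hann, and
time_stretch, and the parameters rate, n_fft, and hop, are all
assumptions made up for this illustration. The FFT is what keeps the
per-frame cost at O(N log N).

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spec):
    n = len(spec)
    return [v / n for v in fft(spec, inverse=True)]

def hann(n):
    # Periodic Hann window (hypothetical choice for this sketch).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def time_stretch(signal, rate, n_fft=256, hop=64):
    """Change tempo by `rate` (>1 is faster) while keeping pitch."""
    win = hann(n_fft)
    # Analysis: windowed FFT of each hop-spaced frame.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(fft([signal[start + i] * win[i] for i in range(n_fft)]))
    # Expected phase advance per hop at each bin's centre frequency.
    omega = [2 * math.pi * k * hop / n_fft for k in range(n_fft)]
    n_out = int((len(frames) - 1) / rate)
    out = [0.0] * (n_out * hop + n_fft)
    scale = hop / sum(w * w for w in win)  # undo the window-squared overlap gain
    phase = [cmath.phase(v) for v in frames[0]]
    for j in range(n_out):
        pos = j * rate                     # fractional position in the input frames
        i = int(pos)
        frac = pos - i
        a, b = frames[i], frames[i + 1]
        spec = []
        for k in range(n_fft):
            mag = (1 - frac) * abs(a[k]) + frac * abs(b[k])
            spec.append(cmath.rect(mag, phase[k]))
            # Measured deviation from the expected advance, wrapped to [-pi, pi].
            d = cmath.phase(b[k]) - cmath.phase(a[k]) - omega[k]
            d -= 2 * math.pi * round(d / (2 * math.pi))
            phase[k] += omega[k] + d       # accumulate the synthesis phase
        seg = ifft(spec)
        # Synthesis: window again and overlap-add at the output hop.
        for n in range(n_fft):
            out[j * hop + n] += seg[n].real * win[n] * scale
    return out
```

Calling time_stretch(samples, 2.0) on a list of floats should give
a signal roughly half as long at the same pitch. Pitch normalization
would additionally need a resampling step, which is left out here.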
Thanks are due to Malcolm Slaney, Miller Puckette, Mark Dolson,
Vaughan Pratt, and 1977 IEEE Medal of Honor recipient Michael Portnoff.
I wonder how much switching from a Hamming window to a Portnoff
window helps ordinary speech recognition, all other things being equal.
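
For anyone who wants to try the comparison, here is a rough sketch
of the two windows. It assumes that "Portnoff window" means a sinc
lowpass response with zero crossings every FFT length, tapered by a
Hamming window over several FFT lengths and then folded
(time-aliased) down to the FFT length before the transform. That
reading, the function names, and the spans parameter are my
assumptions, so check them against Portnoff's papers before relying
on them.

```python
import math

def hamming(n):
    """Symmetric Hamming window of n samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def portnoff_style(n_fft, spans=4):
    """Hamming-tapered sinc spanning spans * n_fft samples (assumed form)."""
    length = spans * n_fft
    taper = hamming(length + 1)   # odd length, so the taper peaks at `centre`
    centre = length // 2
    # The sinc has zero crossings every n_fft samples away from the centre.
    return [taper[i] * sinc((i - centre) / n_fft) for i in range(length)]

def fold(frame, n_fft):
    """Time-alias a long windowed frame down to n_fft samples for the FFT."""
    out = [0.0] * n_fft
    for i, v in enumerate(frame):
        out[i % n_fft] += v
    return out
```

To use it, multiply spans * n_fft input samples by portnoff_style(n_fft),
pass the product through fold, and feed the result to the same FFT
that a plain hamming(n_fft) frame would go to.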
CMU Drs. Mostow and Aist have a patent on the use of speech
synthesis in my line of work. I'm glad they took that one out.
Sincerely,
James
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
400 MHz PDA included: $499 --
http://www.readsay.com/PROnounce.html