As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?"?
I'm interested in speech signals as input to speech recognition software.
I get the impression that minimum acceptable sample rates begin at 8 kHz
( or above ). I assume this is based on which formants are considered
"significant". I have somewhat arbitrarily chosen 44.1 kHz. The data I
have available is a studio quality CD.
From another thread, I assume that some characteristic time of a
phoneme is somewhere between .01 and .1 seconds (+- xx %).
Assuming whatever analysis I do is based on samples of width mm seconds
taken every nn seconds ( nn presumed < mm ), what are appropriate values
from a DSP point of view?
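A minimal sketch of the framing described above: windows of width mm seconds taken every nn seconds. The values mm = 20 ms and nn = 10 ms are placeholders chosen for illustration, not values given in this thread, and the function name `frame_layout` is hypothetical.

```python
def frame_layout(sample_rate_hz, width_s, hop_s, total_s):
    """Return (samples per frame, samples per hop, number of full frames).

    width_s is the window width ("mm") and hop_s is the spacing between
    window starts ("nn"); both are in seconds.
    """
    width = round(sample_rate_hz * width_s)
    hop = round(sample_rate_hz * hop_s)
    total = round(sample_rate_hz * total_s)
    if total < width:
        return width, hop, 0
    # Count only frames that fit entirely inside the recording.
    n_frames = 1 + (total - width) // hop
    return width, hop, n_frames


# Assumed illustrative values: 44.1 kHz data, mm = 20 ms, nn = 10 ms, 1 s of audio.
width, hop, n_frames = frame_layout(44100, 0.020, 0.010, 1.0)
print(width, hop, n_frames)
```

With nn < mm, successive windows overlap, so each stretch of signal is seen by more than one frame.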
[ For perspective see my previous thread titled 'Low freq "analog" of
Nyquist? ( possibly naive question )' . I'm hoping I've learned enough
to better phrase my question ]
My ultimate goal is to reduce dependence of speech recognition's
accuracy on "good mikes" and "good acoustic environment". Primarily the
latter.
[ for those of you old enough, "this ram keeps butting the dam" ]
Re: As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?"?
Richard Owlett wrote:
> I'm interested in speech signals as input to speech recognition software.
>
> I get the impression that minimum acceptable sample rates begin at 8 kHz
> ( or above ). I assume this is based on which formants are considered
> "significant". I have somewhat arbitrarily chosen 44.1 kHz. The data I
> have available is a studio quality CD.
>
> From another thread, I assume that some characteristic time of a
> phoneme is somewhere between .01 and .1 seconds (+- xx %).
>
> Assuming whatever analysis I do is based on samples of width mm seconds
> taken every nn seconds ( nn presumed < mm ) what are appropriate values
> from a DSP point of view.
>
> [ For perspective see my previous thread titled 'Low freq "analog" of
> Nyquist? ( possibly naive question )' . I'm hoping I've learned enough
> to better phrase my question ]
>
> My ultimate goal is to reduce dependence of speech recognition's
> accuracy on "good mikes" and "good acoustic environment". Primarily the
> latter.
>
> [ for those of you old enough, "this ram keeps butting the dam" ]
As "Nyquist" is to "sample rate", "frequency resolution" is to "sample
set duration".
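As a rough sketch of that relationship: a block of samples lasting T seconds gives DFT bins spaced 1/T Hz apart, whatever the sample rate. The function names below are hypothetical, the two durations are the 0.01 s and 0.1 s phoneme time scales mentioned in the question, and window shape (which broadens the effective resolution somewhat) is ignored.

```python
def frequency_resolution_hz(duration_s):
    """Approximate frequency resolution of a block lasting duration_s seconds."""
    return 1.0 / duration_s


def dft_bin_spacing_hz(sample_rate_hz, n_samples):
    """Spacing between adjacent DFT bins for n_samples taken at sample_rate_hz."""
    return sample_rate_hz / n_samples


sample_rate_hz = 44100
for duration_s in (0.010, 0.100):
    n_samples = round(sample_rate_hz * duration_s)
    spacing = dft_bin_spacing_hz(sample_rate_hz, n_samples)
    print(f"{duration_s} s -> {n_samples} samples, bins {spacing} Hz apart")
```

So a longer block buys finer frequency resolution at the cost of coarser time resolution, which is the trade-off behind choosing the window width.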
Re: As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?" ?
On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett
<[email protected]> wrote:
>I'm interested in speech signals as input to speech recognition software.
>
>I get the impression that minimum acceptable sample rates begin at 8 kHz
>( or above ). I assume this is based on which formants are considered
>"significant". I have somewhat arbitrarily chosen 44.1 kHz. The data I
>have available is a studio quality CD.
>
> From another thread, I assume that some characteristic time of a
>phoneme is somewhere between .01 and .1 seconds (+- xx %).
>
>Assuming whatever analysis I do is based on samples of width mm seconds
>taken every nn seconds ( nn presumed < mm ) what are appropriate values
>from a DSP point of view.
>
>[ For perspective see my previous thread titled 'Low freq "analog" of
>Nyquist? ( possibly naive question )' . I'm hoping I've learned enough
>to better phrase my question ]
>
>My ultimate goal is to reduce dependence of speech recognition's
>accuracy on "good mikes" and "good acoustic environment". Primarily the
>latter.
>
>[ for those of you old enough, "this ram keeps butting the dam" ]
Hi,
I'm responding to the Subject text;
that is I'm responding to the words:
"Nyquist" is to "sample rate".
Please know that I have no clue whatsoever as
to the meaning of that single word "Nyquist".
However, I do have a rough notion of the
meaning of the two words "sample rate".
I'm not in the audio business, but here's what
I've heard. In telephones, the microphone signal
is filtered so its frequency bandwidth is just
less than 4 kHz. Then that analog signal is
digitized at a sample rate of 8 kHz, which
satisfies the "Nyquist Criterion".
Prepare for rant: I don't think people should
use the phrase "Nyquist frequency". That
phrase means different things to different people,
and this leads to confusion. I think we
should use the phrase "sample rate" when we mean
the "sample rate" and we should use the phrase "half
the sample rate" when we mean "half the sample rate"
Simple!!
Back to sampling human speech: As it turns out,
for good fidelity a human voice signal should
have a wider bandwidth than 4 kHz. But to reduce the
cost of telephone systems (so they can process as many
simultaneous speech signals as possible) early
telephone designers realized that you could limit a
human speech signal to a bandwidth as low as
(roughly) 4 kHz and people (their brains) could
still understand the speech signal.
Audio fanatics know that human hearing goes up to
(roughly) 18-20 kHz, so they want their systems
to cover that full frequency range in their
"high-fidelity" audio systems. Well, if you have
an analog signal whose bandwidth is 20 kHz, then your
A/D sample rate must be greater than twice that
frequency (Nyquist Criterion, again) which leads
to the "studio quality" sample rate of 44.1 kHz.
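A small sketch of the criterion as stated above (sample rate greater than twice the bandwidth), plus what happens to a tone that violates it. The function names are hypothetical, and 3900 Hz is only an assumed stand-in for "just less than 4 kHz".

```python
def satisfies_nyquist(sample_rate_hz, bandwidth_hz):
    """True if the sample rate is strictly greater than twice the bandwidth."""
    return sample_rate_hz > 2 * bandwidth_hz


def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a real tone after sampling, folded into 0..fs/2."""
    f = tone_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


print(satisfies_nyquist(8000, 3900))    # telephone-style band at 8 kHz
print(satisfies_nyquist(44100, 20000))  # 20 kHz audio band at 44.1 kHz
print(alias_frequency_hz(5000, 8000))   # a 5 kHz tone sampled at 8 kHz
```

The last line shows why the analog signal has to be filtered before digitizing: a 5 kHz tone sampled at 8 kHz is indistinguishable from a 3 kHz tone.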
Sorry I can't be of more help. I wouldn't know
a "formant", or a "phoneme", if I found one dead
in my lunchbox.
Re: As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?" ?
"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
news:[email protected]..
> On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett
> <[email protected]> wrote:
>
> Back to sampling human speech: As it turns out,
> for good fidelity a human voice signal should
> have a wider bandwidth than 4 kHz. But to reduce the
> cost of telephone systems (so they can process as many
> simultaneous speech signals as possible) early
> telephone designers realized that you could limit a
> human speech signal to a bandwidth as low as
> (roughly) 4 kHz and people (their brains) could
> still understand the speech signal.
Right. On the phone, it is generally quite easy to understand normal
conversation speech even with the limited frequency response. However, if
someone tries to read a string of random letters, it is quite a bit more
difficult to understand them on the other end. Losing those high frequencies
makes consonants difficult to differentiate. The brain normally does a good job
of compensating for the loss of high frequencies by using context clues. But
since very few context clues exist with a string of random letters, it becomes
difficult to understand.
So saying that a 4 kHz bandwidth is adequate for speech is a bit misleading.
Consonant sounds have some frequency content up to close to 20kHz, though there
is limited benefit to increasing to anything more than 10kHz IMO.
Re: As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?" ?
"Richard Owlett" schrieb
> As "Nyquist" is to "sample rate"
> "????" is to "sample period/duration/width/?" ?
As "sample rate" is "1 / (sample period)"
"sample period" is to "1/Nyquist"
This may be an answer to your question, but not of much help.
I think you are mixing up two domains here: the one of
strict mathematics and signal processing and the other one
- much fuzzier - about the human perception of hearing and
the generation of speech. While human hearing is obviously
based on the same mathematics and physics of acoustics, there
are many tricks that evolution has come up with.
You might want to check the "Scientist's and Engineer's Guide
to Digital Signal Processing": http://www.analog.com/processors/res...brary/manuals/
training/materials/pdf/dsp_book_frontmat.pdf
especially chapter 22, "Audio Processing".
>"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
>news:[email protected]. .
>> On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett
>> <[email protected]> wrote:
>>
>> Back to sampling human speech: As it turns out,
>> for good fidelity a human voice signal should
>> have a wider bandwidth than 4 kHz. But to reduce the
>> cost of telephone systems (so they can process as many
>> simultaneous speech signals as possible) early
>> telephone designers realized that you could limit a
>> human speech signal to a bandwidth as low as
>> (roughly) 4 kHz and people (their brains) could
>> still understand the speech signal.
>
>Right. On the phone, it is generally quite easy to understand normal
>conversation speech even with the limited frequency response. However, if
>someone tries to read a string of random letters, it is quite a bit more
>difficult to understand them on the other end. Losing those high frequencies
>makes consonants difficult to differentiate. The brain normally does a good job
>of compensating for the loss of high frequencies by using context clues. But
>since very few context clues exist with a string of random letters, it becomes
>difficult to understand.
>
>So saying that a 4 kHz bandwidth is adequate for speech is a bit misleading.
>Consonant sounds have some frequency content up to close to 20kHz, though there
>is limited benefit to increasing to anything more than 10kHz IMO.
Yes yes. You're right! I hadn't thought
about the consonants.
That's why, over the phone to say "FFT",
we'd say "foxtrot" "foxtrot" "tango".
Rick Lyons wrote:
> That's why, over the phone to say "FFT",
> we'd say "foxtrot" "foxtrot" "tango".
>
When I was at Raytheon, we had an operator/receptionist who made up her own
phonetic alphabet. She used it to announce license plate numbers when a driver
forgot to turn off the headlights.
She generally made up her phonetic alphabet on the spot as needed. My favorite
was "F as in Fun. L as in Love. And N as in... NEVER!" She sounded a lot like
Aretha Franklin in the Blues Brothers.
One day she paged a license plate by saying "Y as in You." That threw everyone
for a loop, because we all heard it as "Y as in U."
She inspired my coworkers and me to formulate a phonetic alphabet whose purpose
was to obfuscate rather than clarify. We favored the names of letters,
homophones that start with different letters (gnu, knew, new), names that didn't
add information (T as in tea), or words that sound like they start with a
different letter than they really do.
A as in aye
B as in bdellium
C as in cue
D as in Djibouti
E as in eye
F as in Fun (a nod to our operator)
G as in gnu
H as in hour
I as in inn
J as in jalapeno
K as in knew
L as in llama
M as in Mnemonic
N as in new
O as in ofal
P as in pea
Q as in Quay
R as in ... never found a good one for R
S as in sea
T as in tea
U as in ... oops. forgot that one
V as in vee
W as in why
Y as in you
Z as in zee (or zed)
--
Jim Thomas Principal Applications Engineer Bittware, Inc
[email protected] http://www.bittware.com (603) 226-0404 x536
Nothing is ever so bad that it can't get worse. - Calvin
Cute. I've done similar things, and I like that you overloaded the
"new" and "eye" sounds, which completely defeats the purpose of a
phonetic alphabet.
Overloading similar sounds works, too, like B = boy and T = toy. A
low SNR connection creates ambiguities. So I used to work on rhyming
phonetic alphabets that were similarly useless.
I think you cheated on V and Z, though.
On Tue, 21 Sep 2004 09:32:36 -0400, Jim Thomas <[email protected]>
wrote:
>Rick Lyons wrote:
>> That's why, over the phone to say "FFT",
>> we'd say "foxtrot" "foxtrot" "tango".
>>
>
>When I was at Raytheon, we had an operator/receptionist who made up her own
>phonetic alphabet. She used it to announce license plate numbers when a driver
>forgot to turn off the headlights.
>
>She generally made up her phonetic alphabet on the spot as needed. My favorite
>was "F as in Fun. L as in Love. And N as in... NEVER!" She sounded a lot like
>Aretha Franklin in the Blues Brothers.
>
>One day she paged a license plate by saying "Y as in You." That threw everyone
>for a loop, because we all heard it as "Y as in U."
>
>She inspired my coworkers and me to formulate a phonetic alphabet whose purpose
>was to obfuscate rather than clarify. We favored the names of letters,
>homophones that start with different letters (gnu, knew, new), names that didn't
>add information (T as in tea), or words that sound like they start with a
>different letter than they really do.
>
>A as in aye
>B as in bdellium
>C as in cue
>D as in Djibouti
>E as in eye
>F as in Fun (a nod to our operator)
>G as in gnu
>H as in hour
>I as in inn
>J as in jalapeno
>K as in knew
>L as in llama
>M as in Mnemonic
>N as in new
>O as in ofal
>P as in pea
>Q as in Quay
>R as in ... never found a good one for R
>S as in sea
>T as in tea
>U as in ... oops. forgot that one
>V as in vee
>W as in why
>Y as in you
>Z as in zee (or zed)
>
>--
>Jim Thomas Principal Applications Engineer Bittware, Inc
>[email protected] http://www.bittware.com (603) 226-0404 x536
>Nothing is ever so bad that it can't get worse. - Calvin
Eric Jacobsen
Minister of Algorithms, Intel Corp.
My opinions may not be Intel's opinions. http://www.ericjacobsen.org
One time, I overheard someone spelling something over the phone saying "C as in
cat, M as in mat, and B as in bat". I got a good chuckle out of that, as did
they when I explained how the phonetics chosen didn't really help much! :-)
"Eric Jacobsen" <[email protected]> wrote in message
news:[email protected]..
> Cute. I've done similar things, and I like that you overloaded the
> "new" and "eye" sounds, which completely defeats the purpose of a
> phonetic alphabet.
>
> Overloading similar sounds works, too, like B = boy and T = toy. A
> low SNR connection creates ambiguities. So I used to work on rhyming
> phonetic alphabets that were similarly useless.
Re: As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?" ?
"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
news:[email protected]..
> On Mon, 20 Sep 2004 10:30:42 -0700, "Jon Harris"
> <[email protected]> wrote:
>
> >"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
> >news:[email protected]. .
> >> On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett
> >> <[email protected]> wrote:
> >>
> >> Back to sampling human speech: As it turns out,
> >> for good fidelity a human voice signal should
> >> have a wider bandwidth than 4 kHz. But to reduce the
> >> cost of telephone systems (so they can process as many
> >> simultaneous speech signals as possible) early
> >> telephone designers realized that you could limit a
> >> human speech signal to a bandwidth as low as
> >> (roughly) 4 kHz and people (their brains) could
> >> still understand the speech signal.
> >
> >Right. On the phone, it is generally quite easy to understand normal
> >conversation speech even with the limited frequency response. However, if
> >someone tries to read a string of random letters, it is quite a bit more
> >difficult to understand them on the other end. Losing those high frequencies
> >makes consonants difficult to differentiate. The brain normally does a good job
> >of compensating for the loss of high frequencies by using context clues. But
> >since very few context clues exist with a string of random letters, it becomes
> >difficult to understand.
> >
> >So saying that a 4 kHz bandwidth is adequate for speech is a bit misleading.
> >Consonant sounds have some frequency content up to close to 20kHz, though there
> >is limited benefit to increasing to anything more than 10kHz IMO.
>
> Yes yes. You're right! I hadn't thought
> about the consonants.
>
> That's why, over the phone to say "FFT",
> we'd say "foxtrot" "foxtrot" "tango".
Exactly! The military phonetic alphabet is designed to minimize ambiguity with
a poor quality communication link (unlike the fun ones we've been posting here).
Eric Jacobsen wrote:
> I think you cheated on V and Z, though.
>
Yup. Suggestions are welcome.
V as in vee
Z as in zee (or zed)
Any help with R and U would also be appreciated, as this is an incomplete work.
BTW, when one of the old Navy guys in our group sent the military phonetic
alphabet to the operator, she flew off the handle. He was blackballed, and
could no longer be paged. I wish *I* had done that instead of him, because
paging was WAY overused. Most of the time when I'd answer a page, it would turn
out to be for "Kim Thomas" rather than "Jim Thomas."
I guess over-paging eventually got to the execs too, because they made the rule
that no one could be paged unless it was an "emergency." We got around that by
devising a code. To page someone, all we had to do was tell the operator that
someone left his lights on, describe the pagee's car, and give his
initials+extension-to-dial as the plate number.
We used it exactly once to try it out (and it worked). But we never needed to
page one another, so the no-paging-except-in-an-emergency rule was actually
quite nice.
--
Jim Thomas Principal Applications Engineer Bittware, Inc
[email protected] http://www.bittware.com (603) 226-0404 x536
Nothing is ever so bad that it can't get worse. - Calvin
Oops! should have been "offal" which is pronounced pretty much the same way as
"awful."
--
Jim Thomas Principal Applications Engineer Bittware, Inc
[email protected] http://www.bittware.com (603) 226-0404 x536
Nothing is ever so bad that it can't get worse. - Calvin
"Jim Thomas" <[email protected]> wrote in message
news:[email protected]..
> Eric Jacobsen wrote:
> > I think you cheated on V and Z, though.
> >
>
> Yup. Suggestions are welcome.
>
> V as in vee
> Z as in zee (or zed)
Re: As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?"?
Martin Blume wrote:
> "Richard Owlett" schrieb
>
>>As "Nyquist" is to "sample rate"
>> "????" is to "sample period/duration/width/?" ?
>
>
> As "sample rate" is "1 / (sample period)"
> "sample period" is to "1/Nyquist"
>
Well, "YES" and "NO"
This, and the post which mentioned the connection between "sample duration"
(a poor word) and "frequency resolution", point at something which in a
sense I "knew" but which has yet to become intuitive.
I think the responses not only show me how poorly stated my question
was, but also give me hints on how to restate it.
I'll rethink and repost, probably under a different subject line.
> This may be answer to your question, but not of much help.
>
> I think you are mixing up two domains here: the one of
> strict mathematics and signal processing and the other one
> - much fuzzier - about the human perception of hearing and
> the generation of speech. While human hearing is obviously
> based on the same mathematics and physics of acoustics, there
> are many tricks that evolution has come up with.
Two domains -- quite probably ( though not those mentioned ).
More like confusing two basis sets that are not related, although both may
have a vector with the same units of measure.
> You might want to check the "Scientist's and Engineer's Guide
> to Digital Signal Processing":
> http://www.analog.com/processors/res...brary/manuals/
> training/materials/pdf/dsp_book_frontmat.pdf
> especially chapter 22, "Audio Processing".
Downloaded Chapter 22. The subheadings look relevant to my overall 'problem'.
[ Why do I not think my yard will get attention this weekend? ]
On Tue, 21 Sep 2004 09:32:36 -0400, Jim Thomas <[email protected]>
wrote:
(snipped)
Hi Jim,
that was a fun post.
There's a radio commercial here in northern California for some sort
of mortgage company. The company name is Vitech, or Bitech, or
Ditech. I can't tell.
The commercial ends with "That's the Vitech with a 'V'", or it
could be "That's the Bitech with a 'B'", or it
could be "That's the Ditech with a 'D'". Those knuckle-headed
advertisers don't know what they're doing!!
Rick Lyons wrote:
> On Tue, 21 Sep 2004 09:32:36 -0400, Jim Thomas <[email protected]>
> wrote:
>
> (snipped)
>
> Hi Jim,
> that was a fun post.
>
> There's a radio commercial here in northern California for some sort
> of mortgage company. The company name is Vitech, or Bitech, or
> Ditech. I can't tell.
>
> The commercial ends with "That's the Vitech with a 'V'", or it
> could be "That's the Bitech with a 'B'", or it
> could be "That's the Ditech with a 'D'". Those knuckle-headed
> advertisers don't know what they're doing!!
>
> Ha ha,
>
> [-Rick-]
>
It's a Dog Ditech. Arkansas Ozark hillbillies must have better
enunciation than the left coast ( Don't hear that ad, but another ad
nauseam )
The loan ad that gets me is Wells Fargo Home Mortgage advertising their
*interest only* home loan.
The tag line is that they want you as a customer for life.
They also claim their mortgages have a shorter term than competitors'.
The first is a problem of a slogan attached to an ad without thinking.
The second is actually a punctuation error. But if you are just listening,
you just go "WHAT?!!!"