Human hearing instataneous dynamic rage?


Richard Owlett
08-22-2008, 03:14 AM
Subject line probably poorly stated.
When dynamic range of human ear is discussed it's usually comparing
threshold of pain to weakest detectable sound.

I'm more interested in comparing a loud and soft sound being
distinguished at the same time. Perhaps it might also come from
tests to determine when amplifier distortion becomes detectable.

Suggested search terms?

08-22-2008, 05:01 AM
On Aug 21, 10:14 pm, Richard Owlett <[email protected]> wrote:

> I'm more interested in comparing a loud and soft sound being
> distinguished at the same time. Perhaps it might have also come from
> test to determine when amplifier distortion becomes detectable.
>
> Suggested search terms?

This may get into 'psychoacoustics', since it's as much about what can
register as anything else. Another search term would be masking.
Audio compression schemes (MP3, etc.) apparently leverage some
knowledge of this to throw away components that aren't likely to be
noticed in the psychoacoustic shadow of other, more obvious ones, so
you might pick up some information from reading about the inner
workings of those.
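
For a rough feel of the idea, here is a toy sketch in Python -- nothing
like a real codec's psychoacoustic model; the roll-off and margin numbers
below are simply made up for illustration:

import numpy as np

fs = 8000                                  # sample rate in Hz (assumed)
n = 2048
t = np.arange(n) / fs
x = np.sin(2*np.pi*1000*t) + 1e-3*np.sin(2*np.pi*1300*t)  # loud tone + soft tone

level_db = 20*np.log10(np.abs(np.fft.rfft(x * np.hanning(n))) + 1e-12)

rolloff = 0.5   # dB of "shadow" decay per bin away from a strong component (made up)
margin = 20     # a bin this far under the shadow is treated as inaudible (made up)

bins = np.arange(level_db.size)
shadow = np.max(level_db[None, :] - rolloff*np.abs(bins[:, None] - bins[None, :]), axis=1)
audible = level_db > shadow - margin       # crude stand-in for a masking threshold
print(f"{audible.sum()} of {audible.size} bins kept")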

Richard Owlett
08-22-2008, 11:04 AM
[email protected] wrote:
> On Aug 21, 10:14 pm, Richard Owlett <[email protected]> wrote:
>
>
>>I'm more interested in comparing a loud and soft sound being
>>distinguished at the same time. Perhaps it might have also come from
>>test to determine when amplifier distortion becomes detectable.
>>
>>Suggested search terms?
>
>
> This may get into 'psychoacoustics' since it's as much about what can
> register as anything else.

I think that's the vein I was thinking in. In a noisy room you can focus
on the conversation of interest.

> Another search term would be masking.
> Audio compression schemes (mp3, etc) apparently leverage some
> knowledge of this to throw away components that aren't likely to be
> noticed in the psychoacoustic shadow of other more obvious ones, so
> you might pick up some information form reading about the innner
> workings of those.

Now to Google.

Scott Seidman
08-22-2008, 01:31 PM
Richard Owlett <[email protected]> wrote in
news:[email protected] m:

> Subject line probably poorly stated.
> When dynamic range of human ear is discussed it's usually comparing
> threshold of pain to weakest detectable sound.
>
> I'm more interested in comparing a loud and soft sound being
> distinguished at the same time. Perhaps it might have also come from
> test to determine when amplifier distortion becomes detectable.
>
> Suggested search terms?

http://books.google.com/books?id=q-ZeAAAACAAJ&dq=auditory+inauthor:pickles&lr=&as_brr=0

Try to interlibrary loan that.

--
Scott
Reverse name to reply

Richard Owlett
08-22-2008, 02:21 PM
Scott Seidman wrote:
> Richard Owlett <[email protected]> wrote in
> news:[email protected] m:
>
>
>>Subject line probably poorly stated.
>>When dynamic range of human ear is discussed it's usually comparing
>>threshold of pain to weakest detectable sound.
>>
>>I'm more interested in comparing a loud and soft sound being
>>distinguished at the same time. Perhaps it might have also come from
>>test to determine when amplifier distortion becomes detectable.
>>
>>Suggested search terms?
>
>
> http://books.google.com/books?id=q-
> ZeAAAACAAJ&dq=auditory+inauthor:pickles&lr=&as_brr=0
>
> Try to interlibrary loan that.
>

That appears to focus on the mechanism rather than the end result.
The suggestion to search for psychoacoustics was fruitful. The
Wikipedia article on auditory masking seems to answer my immediate
question.

My original question was vague, but Google and Wikipedia helped cut out
the underbrush. I think searching for when distortion can be noticed
will be fruitful.

Thanks.

Scott Seidman
08-22-2008, 02:47 PM
Richard Owlett <[email protected]> wrote in
news:[email protected] m:

> Scott Seidman wrote:
>> Richard Owlett <[email protected]> wrote in
>> news:[email protected] m:
>>
>>
>>>Subject line probably poorly stated.
>>>When dynamic range of human ear is discussed it's usually comparing
>>>threshold of pain to weakest detectable sound.
>>>
>>>I'm more interested in comparing a loud and soft sound being
>>>distinguished at the same time. Perhaps it might have also come from
>>>test to determine when amplifier distortion becomes detectable.
>>>
>>>Suggested search terms?
>>
>>
>> http://books.google.com/books?id=q-
>> ZeAAAACAAJ&dq=auditory+inauthor:pickles&lr=&as_brr=0
>>
>> Try to interlibrary loan that.
>>
>
> That appears to focus on the mechanism rather than the end result.
> The suggestions to search for psychoacoustics was fruitful. The
> Wikipedia article on and auditory masking seems to answer my immediate
> question.
>
> My original question was vague but Google and Wikipedia help cut out the
> underbrush. I think searching for when distortion can be noticed will be
> fruitful.
>
> Thanks.
>
>

You'll find an awfully useful bibliography. Grab the most appropriate
references and search them forward to see who cites them. Should zero
you in fast.

--
Scott
Reverse name to reply

SteveSmith
08-22-2008, 09:51 PM
Here's what I found when doing the research for my book -- 15 years ago.
Sorry, I don't have the reference anymore. This also gives you another
topic to look under: companding.
Steve



http://www.dspguide.com/ch22/5.htm
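
For what it's worth, the usual textbook companding example is the mu-law
curve used in North American telephony (mu = 255). A quick sketch of the
compress/expand pair, just to show the shape of the idea:

import numpy as np

MU = 255.0  # mu-law constant used in North American telephony

def mu_compress(x, mu=MU):
    # x in [-1, 1] -> companded value in [-1, 1]; small signals get boosted
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=MU):
    # inverse of mu_compress
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.array([0.001, 0.01, 0.1, 1.0])
print(mu_compress(x))                # small inputs map to much larger fractions of full scale
print(mu_expand(mu_compress(x)))     # the round trip recovers the original values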

Richard Owlett
08-22-2008, 11:11 PM
SteveSmith wrote:
> Here's what I found when doing the research for my book-- 15 years ago.
> Sorry,
> I don't have the reference anymore. This also gives you another topic to
> look under, Companding.
> Steve
>
> http://www.dspguide.com/ch22/5.htm

Hmmmm, are you and Mr. Avins in partnership to whack dumkoffs (sp?)
upside the head until they actually begin to *THINK*? ROFL

The first paragraph of your reference was an appropriately weighty 2x4 ;)
It demonstrated that solving my immediate problem had little to do with
my specific question.

HOWEVER, I'm still fascinated by the answers to my questions:
1. How loud must signal B be to mask signal A?
2. What is the minimum amount of distortion that is detectable?

Ben Bradley
08-23-2008, 07:28 PM
On Fri, 22 Aug 2008 17:11:24 -0500, Richard Owlett
<[email protected]> wrote:

>SteveSmith wrote:
>> Here's what I found when doing the research for my book-- 15 years ago.
>> Sorry,
>> I don't have the reference anymore. This also gives you another topic to
>> look under, Companding.
>> Steve
>>
>> http://www.dspguide.com/ch22/5.htm
>
>Hmmmm, Are you and Mr. Avins in partnership to whack dumkoffs(sp?) up
>beside head until they actually begin to *THINK* ROFL
>
>The first paragraph of your reference was an appropriate weight 2x4 ;)
>It demonstrated that solving my immediate problem had little to do with
>my specific question.
>
>HOWEVER, I'm still fascinated by the answer to my question:
> 1. how loud must signal B be to mask signal A?
> 2. what is minimum amount of distortion that is detectable.

These are indeed psychoacoustic questions, but I've seen them
discussed in relation to amplifier distortion, and especially the
"tubes vs. transistors" debate that's been going on ever since
transistor amplifiers were commercially available.
I recall that the 4th edition Radiotron Designer's Handbook (the
venerable 1000+ page vacuum-tube electronics design reference from the
1950s) may have something on this. One more resource might be the
Journal of the Audio Engineering Society (http://aes.org), or some of
their other publications.
There's also Google's Usenet archives, where I've seen this sort of
thing discussed on rec.audio.pro and perhaps also
sci.electronics.design as well as comp.dsp.
Here's a general answer that you might already know (I now look at
what I wrote below, and some may say I'm wrong about something, but
whatever, it may spur some good discussion).
The first thing to keep in mind is that "percent harmonic
distortion" figure used for amplifiers is a near-meaningless number
for audibility of distortion. It's a conglomerate figure for all
distortion, and some types of distortion are more audible than others.
Lower harmonics (say, second and third) are more easily masked by the
fundamental and have to be at a higher volume to be audible than do
higher harmonics (just to pick numbers, maybe fifth and up). Also,
when they ARE audible, lower harmonics are more "pleasing" and
"euphonic" than higher harmonics, and might be mistaken for part of
the original signal rather than perceived as distortion (so it could
be important to compare with an "undistorted" original signal, with
something like the PCABX I mention below).
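
To put a number on "conglomerate figure": THD is normally computed as the
RMS sum of the harmonic amplitudes relative to the fundamental, so very
different harmonic mixes can report the same percentage. A sketch with
made-up amplitudes:

import math

def thd(fundamental, harmonics):
    # total harmonic distortion: RMS of the harmonic amplitudes over the fundamental
    return math.sqrt(sum(h*h for h in harmonics)) / fundamental

case_a = thd(1.0, [0.01])                   # everything in a (relatively benign) 2nd harmonic
case_b = thd(1.0, [0.01/math.sqrt(5)]*5)    # the same figure spread over five high harmonics
print(f"case A: {100*case_a:.2f}% THD, case B: {100*case_b:.2f}% THD")
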
The transfer characteristic of vacuum tube amplifiers is generally
in the shape of an 'S' curve, as tubes gradually go into cutoff and
saturation. Even in their "linear region" it's a slight curve, so it
can generate distortion well below the extremes. This generates mostly
lower-order harmonics. Transistor circuits tend to have transfer
curves that look like a stretched-out "Z" due to their hard turn-on and
turn-off characteristics, generating higher harmonics.
Also, tubes are operated at "full gain" with little or no negative
feedback (perhaps one reason is they are, or certainly were,
relatively expensive, and it takes more tubes to make up for the
lowered gain when using high negative feedback), whereas transistor
circuits are designed for high gain and high negative feedback.
While negative feedback DOES lower distortion, it also changes the
characteristics of the distortion that remains. One interesting point
is crossover distortion in push-pull transistor amplifiers (circuits
are usually biased on enough to make this insignificant, but let's
pretend not for the moment). This is generally regarded as a very
audible and very bad sounding distortion (it's also "counterintuitive"
in that it DECREASES a percentage of the signal as the signal level
increases, the opposite of most types of distortion). Negative
feedback will reduce the percentage distortion in this case, but will
force the remaining distortion products into higher harmonics, perhaps
even making the problem MORE audible than with less negative feedback.
rec.audio.pro poster Arny Kruger used to have a "PCABX" site with
various .wav files with different amounts and types of distortion you
could download, along with software that would pick randomly between
two .wav files - you could play A, B, or the one the program picked,
X, and the idea was to determine by ear whether X was A or B. You
would push a button to indicate your decision of X=A or X=B, and the
program would then tell you if you're right. By doing several trials,
the program determines statistically whether there's an audible
difference (to that listener!) between the two sound files.
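
A statistic like that usually boils down to a coin-flip (binomial) test:
if the listener truly can't hear a difference, each trial is a 50/50
guess, and you ask how unlikely that many correct answers would be by
chance. A quick sketch of that kind of check (the 16-trial count is
arbitrary, and whatever the PCABX software actually computed may have
differed):

from math import comb

def p_by_chance(correct, trials):
    # probability of getting at least `correct` right out of `trials` by pure guessing
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

print(p_by_chance(12, 16))   # ~0.038 -- usually taken as evidence of an audible difference
print(p_by_chance(10, 16))   # ~0.23  -- easily explained by guessing
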
The site pcabx.com is no longer available for reasons I don't know
(it went away while I wasn't keeping up with Usenet), but I recall he
did have a large number of .wav files, and the hosting costs may have
been significant.
But I suppose you just want to find results and not do your own
original research (especially when it surely has already been done in
this area). What's your application for this? How much time and money
are you willing to spend to answer these questions?

Richard Owlett
08-24-2008, 11:58 AM
Ben Bradley wrote:
> On Fri, 22 Aug 2008 17:11:24 -0500, Richard Owlett
> <[email protected]> wrote:
>
>
>>SteveSmith wrote:
>>
>>>Here's what I found when doing the research for my book-- 15 years ago.
>>>Sorry,
>>>I don't have the reference anymore. This also gives you another topic to
>>>look under, Companding.
>>>Steve
>>>
>>>http://www.dspguide.com/ch22/5.htm
>>
>>Hmmmm, Are you and Mr. Avins in partnership to whack dumkoffs(sp?) up
>>beside head until they actually begin to *THINK* ROFL
>>
>>The first paragraph of your reference was an appropriate weight 2x4 ;)
>>It demonstrated that solving my immediate problem had little to do with
>>my specific question.
>>
>>HOWEVER, I'm still fascinated by the answer to my question:
>> 1. how loud must signal B be to mask signal A?
>> 2. what is minimum amount of distortion that is detectable.
>
>
> These are indeed psychoacoustic questions, but I've seen them
> discussed in relation to amplifier distortion, and especially the
> "tubes vs. transistors" debate that's been going on ever since
> transistor amplifiers were commercially available.
> I recall that the 4th edition Radiotron Designer's Handbook (the
> venerable 1000+ page vacuum-tube electronics design reference from the
> 1950's may have someething on this.

Now that may be motivation to dig thru a stack of boxes I haven't really
looked at for 20 years.

> One more resource might be the
> Journal of the Audio Engineering Society (http://aes.org), or some of
> their other publications.
> There's also Google's Usenet archives, where I've seen this sort of
> thing discussed on rec.audio.pro and perhaps also
> sci.electronics.design as well as comp.dsp.
> Here's a general answer that you might already know (I now look at
> what I wrote below, and some may say I'm wrong about something, but
> whatever, it may spur some good discussion).
> The first thing to keep in mind is that "percent harmonic
> distortion" figure used for amplifiers is a near-meaningless number
> for audibility of distortion. It's a conglomerate figure for all
> distortion, and some types of distortion are more audible than others.
> Lower harmonics (say, second and third) are more easily masked by the
> fundamental and have to at a higher volume to be audible than do
> higher (just to pick numbers, maybe fifth and up) harmonics. Also,
> when they ARE audible, lower harmonics are more "pleasing" and
> "euphonic" than higher harmonics, and might be mistaken for part of
> the original signal rather than percieved as distortion (so it could
> be important to compare with an "undistorted" original signal, with
> something like the PCABX I mention below).
> The transfer characteristic of vacuum tube amplifiers is generally
> in the shape of an 'S' curve, as tubes gradually go into cutoff and
> saturation. Even in their "linear region" it's a slight curve, so it
> can generate distortion well below the extremes. This generates mostly
> lower-order harmonics. Transistor circuits tend to have transfer
> curves that look like a stretched out "Z" due to their hard turnon and
> turnoff characteristics, generating higher harmonics.
> Also, tubes are operated at "full gain" with little or no negative
> feedback (perhaps one reason is they are, or certainly were,
> relatively expensive, and it takes more tubes to make up for the
> lowered gain when using high negative feedbback), whereas transistor
> circuits are designed for high gain and high negative feedback.
> While negative feedback DOES lower distortion, it also changes the
> characteristics of the distortion that remains. One interesting point
> is crossover distortion in push-pull transistor amplifiers (circuits
> are usually biased on enough to make this insignificant, but let's
> pretend not for the moment). This is generally regarded as a very
> audible and very bad sounding distortion (it's also "counterintuitive"
> in that it DECREASES a percentage of the signal as the signal level
> increases, the opposite of most types of distortion). Negative
> feedback will reduce the percentage distortion in this case, but will
> force the remaining distortion products into higher harmonics, perhaps
> even making the problem MORE audible than with less negative feedback.
> rec.audio.pro poster Arny Kruger used to have a "PCABX" site with
> various .wav files with different amounts and types of distortion you
> could download, along with software that would pick randomly between
> two .wav files - you could play A, B, or the one the program picked,
> X, and the idea was to determine by ear whether X was A or B. You
> would push a button to indicate your decision of X=A or X=B, and the
> program would then tell you if you're right. By doing several trials,
> the progra determines statistically whether there's an audible
> difference (to that listener!) between the two sound files.
> The site pcabx.com is no longer available for reasons I don't know
> (it went away while I wasn't keeping up with Usenet), but I recall he
> did have a large number of .wav files, and the hosting costs may have
> been significant.
> But I suppose you just want to find results and not do your own
> original research (especially when it surely has already been done in
> this area). What's your application for this? How much time and money
> are you willing to spend to answer these questions?
>

This is an outgrowth of an offshoot of a general interest in problems
related to speech recognition. The only budget I have for this is my time
and access to the Web. My current project is representing
time/frequency/intensity of sound in 3D - the spectrograms that are
typically used just don't "work" for me.

The purpose of this round of questions was to get some idea of how to
scale the plot to be both "pleasing" and useful. My current idea is to
experiment with plotting the data on a linear scale with contours
displayed at logarithmic intervals.
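
A minimal sketch of that last idea -- contour levels spaced
logarithmically while the data stays on a linear scale -- using
NumPy/matplotlib here purely as a stand-in for whatever plotting tool is
actually in use:

import numpy as np
import matplotlib.pyplot as plt

# fake "spectrogram" data, normalized so the maximum is 1.0
rng = np.random.default_rng(0)
Z = rng.random((64, 64))**4
Z /= Z.max()

# contour levels at logarithmic intervals (factors of sqrt(10), i.e. 10 dB apart)
levels = np.logspace(-3, 0, 7)
plt.contour(Z, levels=levels)
plt.title("linear data, logarithmically spaced contour levels (illustrative)")
plt.show()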

jim
08-24-2008, 03:14 PM
Richard Owlett wrote:

> >
>
> This outgrowth of an offshoot of a general interest in problems related
> to speech recognition. The only budget I have for this is my time and
> access to the Web. My current project is representing
> time/frequency/intensity of sound in 3D - the spectrograms that are
> typically used just don't "work" for me.

I don't know how you define "work for me" exactly. You are interested
in making a visual display of your frequency vs time data. So the issue
is the dynamic range of your visual capabilities more than the range of
hearing.

You could display it as a greyscale image where light and dark are used
to represent magnitudes. If you did that you would be converting the
data to 8 bits, but in reality your eyes can only distinguish about 100
levels of gray at the most. So storing the data as 16 bits is way more
than you need for that type of display.
If you plot the data as a 3D surface, that certainly will increase the
range of what you perceive. A dynamic range of 16 bits would mean that
if the largest feature in your display were bigger than a house, then the
smallest could be smaller than a grain of sand.
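
Rough numbers behind that comparison (taking the "house" to be about
10 m, just for scale):

import math

ratio = 2**16                     # largest-to-smallest for 16-bit data
print(20*math.log10(ratio))       # ~96 dB
print(10.0 / ratio * 1000)        # a 10 m house scaled down by 2**16: ~0.15 mm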

-jim

>
> The purpose of this round of questions was to get some idea of how to
> scale the plot to be both "pleasing" and useful. My current idea is to
> experiment with plotting the data on a linear scale with contours
> displayed at logarithmic intervals.



Richard Owlett
08-24-2008, 08:45 PM
jim wrote:
>
> Richard Owlett wrote:
>
>
>>This outgrowth of an offshoot of a general interest in problems related
>>to speech recognition. The only budget I have for this is my time and
>>access to the Web. My current project is representing
>>time/frequency/intensity of sound in 3D - the spectrograms that are
>>typically used just don't "work" for me.
>
>
> I don't know how you define "work for me" exactly. You are interested
> in making a visual display of your frequency vs time data. So the issue
> is the dynamic range of your visual capabilities more than the range of
> hearing.

NO

>
> You could display it as greyscale

*NO* That's the major problem I have with spectrograms.


> image where light and dark are used
> to represzent magnitudes. If you did that you would be converting the
> data to 8 bits, but in reality your eyes can only distinguish about 100
> levels of gray at the most. So storing the data as 16 bits is way more
> than you need for that type of display.
> If you plot the data as a 3d surface

Who said anything about a _surface_?

> that certainly will increase the
> range of what you perceive. A dynamic range of 16 bits

So just where did "16 bits" magically come from?
Subject is "Human hearing dynamic range".

Quoting my original post:
"Subject line probably poorly stated.
When dynamic range of human ear is discussed it's usually comparing
threshold of pain to weakest detectable sound.

I'm more interested in comparing a loud and soft sound being
distinguished at the same time."

> would mean that
> if the largest feature in your display were bigger than a house then the
> smallest could be smaller than a grain of sand.
>
> -jim
>
>
>>The purpose of this round of questions was to get some idea of how to
>>scale the plot to be both "pleasing" and useful. My current idea is to
>>experiment with plotting the data on a linear scale with contours
>>displayed at logarithmic intervals.
>
>
>

jim
08-24-2008, 10:25 PM
Richard Owlett wrote:
>
> jim wrote:
> >
> > Richard Owlett wrote:
> >
> >
> >>This outgrowth of an offshoot of a general interest in problems related
> >>to speech recognition. The only budget I have for this is my time and
> >>access to the Web. My current project is representing
> >>time/frequency/intensity of sound in 3D - the spectrograms that are
> >>typically used just don't "work" for me.
> >
> >
> > I don't know how you define "work for me" exactly. You are interested
> > in making a visual display of your frequency vs time data. So the issue
> > is the dynamic range of your visual capabilities more than the range of
> > hearing.
>
> NO
>
> >
> > You could display it as greyscale
>
> *NO* That's the major problem I have spectrograms.

What is?

>
> image where light and dark are used
> > to represzent magnitudes. If you did that you would be converting the
> > data to 8 bits, but in reality your eyes can only distinguish about 100
> > levels of gray at the most. So storing the data as 16 bits is way more
> > than you need for that type of display.
> > If you plot the data as a 3d surface
>
> Who said anything about a _surface_?

Yes, right, you said contours this time. The point remains the same: 16
bits is an enormous range for visualization. It's generally going to look
not much different from 8 bits.

>
> > that certainly will increase the
> > range of what you perceive. A dynamic range of 16 bits
>
> So just where did "16 bits" magically come from?
> Subject is "Human hearing dynamic range".

Same place 8 bits came from - these are standard sizes for computer
data. I assumed you are using standard computer equipment, making it
extremely unlikely you will be viewing the spectrum with some sort of
display format of 5 or 11 bits or whatever. It doesn't seem like it
matters how great the dynamic range of the audio equipment is if you are
viewing it in 8 bits, which is already more than your eyes can distinguish.

Also, speech recordings in 8 bits can be quite clear provided the sample
rate is not too low. I have heard pretty good 4-bit recordings of human
speech.
-jim


>
> Quoting my original post:
> "Subject line probably poorly stated.
> When dynamic range of human ear is discussed it's usually comparing
> threshold of pain to weakest detectable sound.
>
> I'm more interested in comparing a loud and soft sound being
> distinguished at the same time."
>
> > would mean that
> > if the largest feature in your display were bigger than a house then the
> > smallest could be smaller than a grain of sand.
> >
> > -jim
> >
> >
> >>The purpose of this round of questions was to get some idea of how to
> >>scale the plot to be both "pleasing" and useful. My current idea is to
> >>experiment with plotting the data on a linear scale with contours
> >>displayed at logarithmic intervals.
> >
> >
> >



Richard Owlett
08-24-2008, 11:09 PM
jim wrote:
>
> Richard Owlett wrote:
>
>>jim wrote:
>>
>>>Richard Owlett wrote:
>>>
>>>
>>>
>>>>This outgrowth of an offshoot of a general interest in problems related
>>>>to speech recognition. The only budget I have for this is my time and
>>>>access to the Web. My current project is representing
>>>>time/frequency/intensity of sound in 3D - the spectrograms that are
>>>>typically used just don't "work" for me.
>>>
>>>
>>> I don't know how you define "work for me" exactly. You are interested
>>>in making a visual display of your frequency vs time data. So the issue
>>>is the dynamic range of your visual capabilities more than the range of
>>>hearing.
>>
>>NO
>>
>>
>>> You could display it as greyscale
>>
>>*NO* That's the major problem I have spectrograms.
>
>
> What is?


*GREYSCALE DISPLAY!!!!!*

>
>
>> image where light and dark are used
>>
>>>to represzent magnitudes. If you did that you would be converting the
>>>data to 8 bits, but in reality your eyes can only distinguish about 100
>>>levels of gray at the most. So storing the data as 16 bits is way more
>>>than you need for that type of display.
>>> If you plot the data as a 3d surface
>>
>>Who said anything about a _surface_?
>
>
> Yes right you said contours this time. The point remains the same 16
> bits is enormous range for visualization. It's going to generally look
> not much different than 8 bits.

You keep reading in things that aren't there.
Or perhaps you come equipped with a bionic ear.
The subject is *human resolution* *NOT* machine representation!

>
>
>>>that certainly will increase the
>>>range of what you perceive. A dynamic range of 16 bits
>>
>>So just where did "16 bits" magically come from?
>>Subject is "Human hearing dynamic range".
>
>
> Same place 8 bits came from - these are standard sizes for computer
> data. I assumed you are using standard computer equipment making it
> extremely unlikely you will be viewing the spectrum with some sort of
> display format of 5 or 11 bits or whatever. It doesn't seem like it
> matter how great the dynamic range of the audio equipment if you are
> viewing in 8 bits which is already more than your eyes can see.
>
> Also speech recordings in 8 bits can be quite clear provided the sample
> rate is not too slow. I have heard pretty good 4 bit recordings of human
> speech.
> -jim
>
>
>
>>Quoting my original post:
>>"Subject line probably poorly stated.
>>When dynamic range of human ear is discussed it's usually comparing
>>threshold of pain to weakest detectable sound.
>>
>>I'm more interested in comparing a loud and soft sound being
>>distinguished at the same time."
>>
>>
>>>would mean that
>>>if the largest feature in your display were bigger than a house then the
>>>smallest could be smaller than a grain of sand.
>>>
>>>-jim
>>>
>>>
>>>
>>>>The purpose of this round of questions was to get some idea of how to
>>>>scale the plot to be both "pleasing" and useful. My current idea is to
>>>>experiment with plotting the data on a linear scale with contours
>>>>displayed at logarithmic intervals.
>>>
>>>
>>>
>
>
>

Ben Bradley
08-25-2008, 12:55 AM
On Sun, 24 Aug 2008 05:58:25 -0500, Richard Owlett
<[email protected]> wrote:

> ...

>This outgrowth of an offshoot of a general interest in problems related
>to speech recognition.

Speech recognition by machine, or by human?

> The only budget I have for this is my time and
>access to the Web. My current project is representing
>time/frequency/intensity of sound in 3D - the spectrograms that are
>typically used just don't "work" for me.

The common name for this (if for some strange reason you haven't
heard it) is a waterfall plot. There should be plenty of info on that.
IIRC, Microsoft Excel can create such a plot.

>
>The purpose of this round of questions was to get some idea of how to
>scale the plot to be both "pleasing" and useful. My current idea is to
>experiment with plotting the data on a linear scale with contours
>displayed at logarithmic intervals.

Richard Owlett
08-25-2008, 01:37 AM
Ben Bradley wrote:
> On Sun, 24 Aug 2008 05:58:25 -0500, Richard Owlett
> <[email protected]> wrote:
>
>
>>...
>
>
>>This outgrowth of an offshoot of a general interest in problems related
>>to speech recognition.
>
>
> Speech recognition by machine, or by human?

The fascination covers both. Back in the day I took an introductory
linguistics course. The class was primarily Yanks. There were two guys
from Dixie. The instructor had them say "pin" and "pen". They and the
instructor were the only ones in the room who could distinguish the difference.

My current interest is the signal path (including physical environment)
from vocal tract to data bus. I don't get into the semantic decoding at all.

>
>
>>The only budget I have for this is my time and
>>access to the Web. My current project is representing
>>time/frequency/intensity of sound in 3D - the spectrograms that are
>>typically used just don't "work" for me.
>
>
> The common name for this (if for some strange reason you haven't
> heard it) is a waterfall plot. There should be plenty of info on that.
> IIRC, Microsoft Excel can create such a plot.

Actually that's where I started. I spent hours in front of an RF spectrum
analyzer in the '70s. It's now the only way I think of a spectrum.

Waterfall plots I've seen don't allow rotating to view whichever feature
catches my interest. I use Scilab's param3d1(). It allows plotting with
points rather than lines. I've been experimenting with using contour()
in conjunction with it. The result is contours of equal amplitude
hanging in 3D space. It has the advantage that I can look at it in 3D
while someone used to spectrograms can rotate it and look down on the
time-frequency plane and see a color spectrogram.

I normalize my data to the max of all the FFTs in that experiment. That
makes the largest features clear when plotted on a linear scale.
Plotting on a log scale makes the small features also visible at the
cost of *CLUTTER*. The purpose of my question was to try to come up with
a threshold below which not to plot a value. Scilab allows setting a
variable to %nan ("Not A Number") and all plot routines will ignore it.
If an element of a vector is %nan, all calculations using that
element are set to %nan without causing errors.
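
The same thresholding idea expressed in NumPy terms, just as a sketch
(the floor value is a placeholder; per the above, Scilab's %nan plays the
same role in its own plot routines):

import numpy as np

Z = np.abs(np.random.default_rng(1).standard_normal((50, 200)))  # stand-in for |FFT| data
Z /= Z.max()                  # normalize to the max over the whole experiment

floor_db = -30                # placeholder: anything this far below the max is "clutter"
Z_plot = Z.copy()
Z_plot[20*np.log10(Z_plot + 1e-300) < floor_db] = np.nan  # NaN is treated as missing by most plot layers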

>
>
>>The purpose of this round of questions was to get some idea of how to
>>scale the plot to be both "pleasing" and useful. My current idea is to
>>experiment with plotting the data on a linear scale with contours
>>displayed at logarithmic intervals.
>
>

jim
08-25-2008, 01:49 AM
Richard Owlett wrote:

> >>> If you plot the data as a 3d surface
> >>
> >>Who said anything about a _surface_?
> >
> >
> > Yes right you said contours this time. The point remains the same 16
> > bits is enormous range for visualization. It's going to generally look
> > not much different than 8 bits.
>
> You keep reading in things that aren't there.
> Or perhaps you come equipped with a bionic ear.
> The subject is *human resolution* *NOT* machine representation!
>
>

The subject line didn't make any sense to me.
The body of the post I responded to asked specifically
about machine representation.

Here is what I read that I was responding to:

  "My current project is representing time/frequency/intensity of
  sound in 3D -"

  "My current idea is to experiment with plotting the data on a
  linear scale with contours displayed at logarithmic intervals."

I assumed the representation to which you referred would be done on a
computer. Are you saying that isn't true?

-jim



Ben Bradley
08-28-2008, 05:05 AM
On Sun, 24 Aug 2008 14:45:10 -0500, Richard Owlett
<[email protected]> wrote:

>jim wrote:
>>
>> Richard Owlett wrote:
>>
>>
>>>This outgrowth of an offshoot of a general interest in problems related
>>>to speech recognition. The only budget I have for this is my time and
>>>access to the Web. My current project is representing
>>>time/frequency/intensity of sound in 3D - the spectrograms that are
>>>typically used just don't "work" for me.
>>
>>
>> I don't know how you define "work for me" exactly. You are interested
>> in making a visual display of your frequency vs time data. So the issue
>> is the dynamic range of your visual capabilities more than the range of
>> hearing.
>
>NO
>
>>
>> You could display it as greyscale
>
>*NO* That's the major problem I have spectrograms.

Okay, you could display each successive increase in volume as a
different color, perhaps using the standard resistor color code:

0 black
1 brown
2 red
3 orange
4 yellow
5 green
6 blue
7 violet
8 gray (or grey for the other side of the pond)
9 white

And for values of 10 or above it would just "wrap around." This
would make it easy to distinguish between adjacent levels, but you'll
misinterpret things when levels change in larger steps between
adjacent FFT bins such as 27, 38, 54.
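
Sketched out, just to make the wrap-around explicit (how a raw magnitude
gets scaled to an integer level is left open):

COLORS = ["black", "brown", "red", "orange", "yellow",
          "green", "blue", "violet", "gray", "white"]

def level_color(level):
    # map an integer level to a color, wrapping around every 10 steps
    return COLORS[level % 10]

print([level_color(n) for n in (27, 38, 54)])  # large jumps land on unrelated-looking colors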

You're looking at frequency vs. amplitude vs. time. I suspect one
reason you're not seeing what you want is the length of the FFT. If
it's long then it will smear higher frequency transients. Now that I
think about it, it will smear all transients. The FFT displays things
as if they were steady-state signals present for the whole duration of
the window.

Another thing is the window used for the FFT. I recall that
different windowing functions optimize for different things (for
example, more accurate amplitude measurement vs. more accurate frequency
measurement). You apparently want to optimize distinguishing between
different frequencies (have a sharper slope for a displayed frequency;
I forget what that's called, perhaps a "steeper skirt"). I forget which
windows do what, but choosing the right window can make a dramatic
difference over the wrong one.
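
One concrete handle on both points: the bin spacing of an N-point FFT at
sample rate fs is fs/N, and the window sets how far a strong component's
skirts spread into neighboring bins. A small sketch (numbers picked
arbitrarily):

import numpy as np

fs, N = 44100, 4096
print("bin spacing:", fs / N, "Hz")   # ~10.8 Hz here; a 10 ms window gives ~100 Hz per bin

t = np.arange(N) / fs
x = np.sin(2 * np.pi * 1000 * t)      # one steady tone at 1 kHz

k = int(round(1000 / (fs / N)))       # FFT bin nearest the tone
for name, w in [("rectangular", np.ones(N)), ("hann", np.hanning(N))]:
    spec_db = 20 * np.log10(np.abs(np.fft.rfft(x * w)) + 1e-12)
    spec_db -= spec_db.max()
    # leakage 40-60 bins above the tone: the window decides how far the skirts reach
    print(name, round(spec_db[k + 40:k + 60].max(), 1), "dB")
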
>
>
> image where light and dark are used
>> to represzent magnitudes. If you did that you would be converting the
>> data to 8 bits, but in reality your eyes can only distinguish about 100
>> levels of gray at the most. So storing the data as 16 bits is way more
>> than you need for that type of display.
>> If you plot the data as a 3d surface
>
>Who said anything about a _surface_?

It's a 3d image, so it effectively has a surface. But then I'm not
sure if you want color or vertical height to represent amplitude or
what.

>
>> that certainly will increase the
>> range of what you perceive. A dynamic range of 16 bits
>
>So just where did "16 bits" magically come from?
>Subject is "Human hearing dynamic range".

Try to work with us; both Jim and I are trying to help you, and
you're being a bit cantankerous.

Over what range of values would you be displaying for amplitude?
20? 500? 50,000?

>
>Quoting my original post:
>"Subject line probably poorly stated.
>When dynamic range of human ear is discussed it's usually comparing
>threshold of pain to weakest detectable sound.
>
>I'm more interested in comparing a loud and soft sound being
>distinguished at the same time."

And especially since these different sounds might be near in
frequency, with one much louder than the other, the FFT length and
windowing function are critically important.

Richard Owlett
08-28-2008, 04:00 PM
caveat lector
lingua in letifico ;)

Engineers read? I doubt it.
Let's see if they can be led on a parsing trail, even if English is only ~BNF.

Let's parse the original subject line, "Human hearing instataneous
dynamic rage?"

The first word is "human". That can be used as a noun or an adjective.
The second is "hearing". That can be used as a noun or an adjective.
The third is "instataneous". Missing from dictionary but resembles
"instantaneous", an adjective.
The fourth is "dynamic". That is an adjective.
The last is "rage". That's a noun but why use "rage" in an on-topic DSP
post. Body of post refers to "range". Another typo.


Parse on.
In English, four adjectives modifying one noun - unlikely.
Noun Noun would be strange/awkward.
Adjective Noun Adjective Adjective Noun seems a likely construct.

As this is comp.dsp, with many audio types lurking, it's unlikely
that "hearing" is a law reference. The primary topic evidently concerns
how humans hear.

Now to the second phrase. "Dynamic range" is a common term and meaning
seems clear. But it is modified by "instantaneous". "Instantaneous" and
"dynamic" just aren't commonly used together. Red flag raised.

The point is explicitly clarified in the first sentence of the second
paragraph by contrasting
"I'm more interested in comparing a loud and soft sound being
distinguished at the same time."
to "dynamic range of human ear" being comparison of "threshold of pain"
to "weakest detectable sound".


I was offered the key words psychoacoustics, lossy compression methods, and
masking. These proved useful for Google and Wikipedia searches. I was also
given some historical background on the measurement and perception of
distortion in audio systems, which brought to mind things in my general
background. The result is that I now know that what I'm looking for will
be under a heading related to "audio masking". I suspect the number I'm
looking for will be in the vicinity of 20-30 dB.


I was _THEN_ asked the purpose of my question.
It is to devise a scaling procedure for a *3D* representation of
intensity vs frequency vs time. I commented, as an aside, that I had
found *2D* representations (aka spectrograms) unsatisfactory.

So I was then hit with methods of possibly improving 2D methods in which
I have no reason to be interested. Downhill from there.

Ben Bradley wrote:
> [snip OT discussion of displaying in 2D]
>
> You're looking at frequency vs. amplitude vs. time. I suspect one
> reason you're not seeing what you want is the length of the FFT. If
> it's long then it will smear higher frequency transients. Now that I
> think about it, it will smear all transients. The FFT displays things
> as if they were steady-state signals present for the whole duration of
> the window.

NO

My FFTs (NOTA BENE the plural) cover up to tens of seconds.
Currently I'm using 10 ms windows.

Why I don't see features is *STRICTLY* _AND_ *EXPLICITLY* a
representation issue.

There are large items.
There are small items.
I want to see details of each.

Now a foot high object on a mountain may not be significant.
But a foot deep hole in your front walk may be.
I want to see both in a single display.

The typical approach is a log plot.
Not too bad for large features.
Small features can be seen.
*BUT* irrelevantly small features also become *CLUTTER*

My input data may be 16-bit PCM, but my calculations are done in
floating point with at least a 10^16 dynamic range. Obviously I can
discard any points that are a factor of 2^16 smaller than my largest.
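
(Back-of-envelope check on those two numbers:)

import numpy as np

print(20*np.log10(2.0**16))                      # ~96 dB: the 16-bit PCM floor
print(20*np.log10(1/np.finfo(np.float64).eps))   # ~313 dB: double precision, roughly the 10^16 above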

The question then becomes "does the system being investigated raise the
smallest significant value more?"



> [snip]
>
> Try to work with us, both Jim and I are trying to help you, and
> you're being a bit cantankerous.
>
> Over what range of values would you be displaying for amplitude?
> 20? 500? 50,000?

See above ;)

Signed
The not-so-cantankerous OP

Jerry Avins
08-28-2008, 05:08 PM
Richard Owlett wrote:

...

> The fourth is "dynamic". That is an adjective.

Parse "There was a favorable group dynamic."

...

Jerry
--
Engineering is the art of making what you want from things you can get.

jim
08-28-2008, 05:33 PM
Richard Owlett wrote:

>
> So I was then hit with methods of possibly improving 2D methods in which
> I have no reason to be interested. Down hill from there.

Ha ha ha, you are a hoot. I was making a statement about the reasons why 2D
representations are limited. You incorrectly interpreted my statement as
suggesting "methods of possibly improving 2D methods" and then wonder why the
discussion falls apart.


>

> >
> > You're looking at frequency vs. amplitude vs. time. I suspect one
> > reason you're not seeing what you want is the length of the FFT. If
> > it's long then it will smear higher frequency transients. Now that I
> > think about it, it will smear all transients. The FFT displays things
> > as if they were steady-state signals present for the whole duration of
> > the window.
>
> NO
>
> My FFT's (NOTE BENE the plural) cover up to tens of seconds.
> Currently I'm using 10 mSec windows.
>
> Why I don't see features is *STRICTLY* _AND_ *EXPLICITLY* a
> representation issue.

So who appointed you to be GOD? There is absolutely no possibility that he is
right and you are wrong.


>
> There a large items.
> There are small items.
> I want to see details of each.


>
> Now a foot high object on a mountain may not be significant.
> But a foot deep hole in your front walk may be.
> I want to see both in a single display.
>
> The typical approach is a log plot.
> Not too bad for large features.
> Small features can be seen.
> *BUT* irrelevantly small features also become *CLUTTER*
>
> My input data may be 16 bit PCM, but my calculations are done in
> floating point with at least a 10^16 dynamic range. Obviously I can
> discard any points that are 2^16 smaller than my largest.
>
> The question then becomes "does the system being investigated raise the
> smallest significant value more?"

Maybe. But the question is abstract to the point of being essentially worthless.
Is the question you are trying to ask -> Can the speech signal be quantized down
to some number of bits and still be clear and understandable? Or are you asking
something else?

-jim



Richard Owlett
08-29-2008, 04:34 AM
Jerry Avins wrote:

> Richard Owlett wrote:
>
> ...
>
>> The fourth is "dynamic". That is an adjective.
>
>
> Parse "There was a favorable group dynamic."
>
> ...
>
> Jerry


OYEZ OYEZ OYEZ
Oy vey
Hoist by own petard.
'Vat can I say ;/

But I'll ask anyway,
"Do you disagree with the gist of my post?"

Then again, will butt of multi-lingual pun realize same?

Richard Owlett
08-29-2008, 05:27 AM
I don't know what problem you are trying to solve, so I will
suggest how to perfect your solution.


jim wrote:

>
> Richard Owlett wrote:
>
>
>>So I was then hit with methods of possibly improving 2D methods in which
>>I have no reason to be interested. Down hill from there.
>
>
> Ha Ha Ha you are a hoot.

Careful, twerp.
I'll be the one to pun on my name.

> I was making a statement about the reasons why 2d
> representations are limited. You incorrectly interpret my statement as
> suggesting "methods of possibly improving 2D methods" and then wonder why the
> discussion falls apart.

Nae, I had already stated that 2D was unsuitable.
I, *ERRONEOUSLY*, assumed [parseable in interesting ways ;)] that you were
trying to be useful.

You repeatedly said that "I should ..."
LOL ROFL


>
>>> You're looking at frequency vs. amplitude vs. time. I suspect one
>>>reason you're not seeing what you want is the length of the FFT. If
>>>it's long then it will smear higher frequency transients. Now that I
>>>think about it, it will smear all transients. The FFT displays things
>>>as if they were steady-state signals present for the whole duration of
>>>the window.
>>
>>NO
>>
>>My FFT's (NOTE BENE the plural) cover up to tens of seconds.
>>Currently I'm using 10 mSec windows.
>>
>>Why I don't see features is *STRICTLY* _AND_ *EXPLICITLY* a
>>representation issue.
>
>
> So who appointed you to be GOD.

Your reading failure.
The data was there. I reported problems *observing* it.

Re-read the rest of your own post.
THEN
engage *BRAIN* before posting.

Jerry Avins
08-29-2008, 05:49 AM
Richard Owlett wrote:
> Jerry Avins wrote:
>
>> Richard Owlett wrote:
>>
>> ...
>>
>>> The fourth is "dynamic". That is an adjective.
>>
>>
>> Parse "There was a favorable group dynamic."
>>
>> ...
>>
>> Jerry
>
>
> OYEZ OYEZ OYEZ
> Oy vey
> Hoist by own petard.
> 'Vat can I say ;/
>
> But I'll ask anyway,
> "Do you disagree with the gist of my post?"
>
> Then again, will butt of multi-lingual pun realize same?

What was its aim?

Jerry
--
Engineering is the art of making what you want from things you can get.

jim
08-29-2008, 09:55 PM
Richard Owlett wrote:

>
> You repeatedly said that "I should ..."

No, you repeatedly misrepresent what others have written.

-jim

