View Full Version : "Discrete Hartley transform" vs "Discrete Fourier transform"


Richard Owlett
10-10-2008, 02:02 AM
I'm interested in Magnitude vs Frequency.
I have a strictly Real input sequence.
There's Fast Hartley transform code available for my preferred language.
Relative speed is not an issue as I'm working on stored data.

What gotchas should I watch out for?
Is there a good comparison on the web of their practical differences?

TIA

julius
10-10-2008, 05:12 PM
On Oct 9, 9:02 pm, Richard Owlett <[email protected]> wrote:
> I'm interested in Magnitude vs Frequency.
> I have a strictly Real input sequence.
> There's Fast Hartley transform code available for my preferred language.
> Relative speed is not an issue as I'm working on stored data.
>
> What gotchas should I watch out for?
> Is there a good comparison on the web of their practical differences?
>
> TIA

Richard, maybe it's better if you tell us what your intended application
is. Right now it sounds a bit like asking whether you should buy a sedan
or a hatchback :-P. There will be lots of opinions, but their worth
depends on your situation and application.

Richard Owlett
10-10-2008, 05:54 PM
julius wrote:
> Richard, maybe it's better if you tell us what your intended application
> is. Right now it sounds a bit like asking whether you should buy a sedan
> or a hatchback :-P. There will be lots of opinions, but their worth
> depends on your situation and application.

Well, my app is about as simple as it gets.
I have a time sequence f(t) and I want a measure of |F(omega)|.
The signal so far has been audio, but I don't think that bears on the question.
I'm not going to filter/process/... the transformed data, just plot it.
I have been using an FFT so far.
I wish to change hardware/OS/etc.
There is existing FHT code that I can cut and paste.
The Wikipedia article seemed to imply the FHT lost favor to the FFT when
faster FFT algorithms were developed.
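
For reference, here is a minimal sketch of the task as just described: take a
real time sequence x[n] and produce |X[k]| for plotting. The thread never
names the actual language, so plain C is an assumption here, and a direct
O(N^2) DFT is used purely for clarity since speed is explicitly not a concern.

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

/* Magnitude spectrum of a real length-n sequence; only bins 0..n/2
   are unique for real input, so only those are computed. */
static void magnitude_spectrum(const double *x, double *mag, int n)
{
    for (int k = 0; k <= n / 2; k++) {
        double re = 0.0, im = 0.0;
        for (int t = 0; t < n; t++) {
            double w = 2.0 * PI * k * t / n;
            re += x[t] * cos(w);
            im -= x[t] * sin(w);
        }
        mag[k] = sqrt(re * re + im * im);
    }
}

int main(void)
{
    double x[64], mag[33];
    for (int t = 0; t < 64; t++)                /* test tone at bin 5 */
        x[t] = sin(2.0 * PI * 5.0 * t / 64.0);
    magnitude_spectrum(x, mag, 64);
    for (int k = 0; k <= 32; k++)
        printf("%d %g\n", k, mag[k]);
    return 0;
}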

Eric Jacobsen
10-10-2008, 06:15 PM
On Fri, 10 Oct 2008 11:54:52 -0500, Richard Owlett
<[email protected]> wrote:

>Well, my app is about as simple as it gets.
>I have a time sequence f(t) and I want a measure of |F(omega)|.
>The signal so far has been audio, but I don't think that bears on the question.
>I'm not going to filter/process/... the transformed data, just plot it.
>I have been using an FFT so far.
>I wish to change hardware/OS/etc.
>There is existing FHT code that I can cut and paste.
>The Wikipedia article seemed to imply the FHT lost favor to the FFT when
>faster FFT algorithms were developed.

Back when processing power and memory were hard to come by I used to
use FHTs a lot. As you mention, they're very good when the input is
real-valued, and if you only need the magnitude output for a plot
it'll be pretty efficient.

Since processing power and memory are cheap these days everybody just
plugs in FFTs. When computing or memory resources are hard to come
by, though, the FHT can have some real advantages.

There's a book called "The Hartley Transform" by Bracewell that is a
good reference if you can find it or need it. The output symmetries
of the FHT can be non-obvious at first, but once you sort them out it
is very easy to use and functionally equivalent to the FFT.
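
For anyone sorting out those symmetries, here is a rough, unoptimized
illustration in C (not from Bracewell's book or from any of the posts): DFT
bin k can be recovered from Hartley bins k and N-k, so a magnitude-only
spectrum needs just |X[k]|^2 = (H[k]^2 + H[N-k]^2) / 2.

#include <math.h>

#define PI 3.14159265358979323846

/* cas(t) = cos(t) + sin(t), the single Hartley kernel. */
static double cas(double t) { return cos(t) + sin(t); }

/* Direct O(N^2) DHT, for clarity only; a real FHT would replace this. */
static void dht(const double *x, double *h, int n)
{
    for (int k = 0; k < n; k++) {
        double acc = 0.0;
        for (int t = 0; t < n; t++)
            acc += x[t] * cas(2.0 * PI * k * t / n);
        h[k] = acc;
    }
}

/* Magnitude spectrum from the Hartley output: bins k and n-k pair up. */
static void dht_to_magnitude(const double *h, double *mag, int n)
{
    for (int k = 0; k <= n / 2; k++) {
        double hk  = h[k];
        double hnk = h[(n - k) % n];    /* k = 0 pairs with itself */
        mag[k] = sqrt((hk * hk + hnk * hnk) / 2.0);
    }
}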

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Randy Yates
10-10-2008, 06:37 PM
Eric Jacobsen <[email protected]> writes:

> Back when processing power and memory were hard to come by I used to
> use FHTs a lot. As you mention, they're very good when the input is
> real-valued, and if you only need the magnitude output for a plot
> it'll be pretty efficient.
>
> Since processing power and memory are cheap these days everybody just
> plugs in FFTs. When computing or memory resources are hard to come
> by, though, the FHT can have some real advantages.

I thought that, when you use the trick of computing the transforms of two
N-point real sequences with a single N-point complex FFT, the FHT has no
computational advantage whatsoever over the FFT. But to answer Richard's
question, there's nothing wrong or disadvantageous about using it.
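
(For anyone who hasn't seen that trick, here is a rough, self-contained
sketch, not Randy's code: a naive complex DFT stands in for the FFT so the
example compiles on its own, but in practice any complex FFT routine would
be used. Two real sequences are packed into one complex sequence,
transformed once, and the results separated via conjugate symmetry.)

#include <complex.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Naive complex DFT, standing in for a proper complex FFT. */
static void cdft(const double complex *in, double complex *out, int n)
{
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int t = 0; t < n; t++)
            acc += in[t] * cexp(-I * 2.0 * PI * k * t / n);
        out[k] = acc;
    }
}

/* Transform two length-n real sequences x and y with one complex
   transform: pack z[t] = x[t] + i*y[t], transform, then split using
   X[k] = (Z[k] + conj(Z[n-k]))/2 and Y[k] = (Z[k] - conj(Z[n-k]))/(2i). */
static void two_real_dfts(const double *x, const double *y,
                          double complex *X, double complex *Y, int n)
{
    double complex z[n], Z[n];          /* C99 variable-length arrays */
    for (int t = 0; t < n; t++)
        z[t] = x[t] + I * y[t];
    cdft(z, Z, n);
    for (int k = 0; k < n; k++) {
        double complex Znk = conj(Z[(n - k) % n]);
        X[k] = (Z[k] + Znk) / 2.0;
        Y[k] = (Z[k] - Znk) / (2.0 * I);
    }
}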
--
% Randy Yates % "The dreamer, the unwoken fool -
%% Fuquay-Varina, NC % in dreams, no pain will kiss the brow..."
%%% 919-577-9882 %
%%%% <[email protected]> % 'Eldorado Overture', *Eldorado*, ELO
http://www.digitalsignallabs.com

Eric Jacobsen
10-10-2008, 08:42 PM
On Fri, 10 Oct 2008 13:37:42 -0400, Randy Yates <[email protected]>
wrote:

>Eric Jacobsen <[email protected]> writes:
>
>> On Fri, 10 Oct 2008 11:54:52 -0500, Richard Owlett
>> <[email protected]> wrote:
>>
>>>julius wrote:
>>>> On Oct 9, 9:02 pm, Richard Owlett <[email protected]> wrote:
>>>>
>>>>>I'm interested in Magnitude vs Frequency.
>>>>>I have a strictly Real input sequence.
>>>>>There's Fast Hartley transform code available for my preferred language.
>>>>>Relative speed is not an issue as I'm working on stored data.
>>>>>
>>>>>What gotcha's should I watch out for?
>>>>>Is the a good comparison on the web of their practical differences?
>>>>>
>>>>>TIA
>>>>
>>>>
>>>> Richard, maybe it's better if you tell us what your intended
>>>> application
>>>> is. Right now it sounds a bit like asking whether you should buy a
>>>> sedan
>>>> or a hatchback :-P. There will be lots of opinions, but they are
>>>> worthless
>>>> depending on your situation and application.
>>>
>>>Well, my app is about as simple as it gets.
>>>I have a time sequence f(t) and I want a measure of abs(f(omega))
>>>Signal so far has been audio, but don't think that bears on question.
>>>I'm not going to filter/process/... the transformed data, just plot it.
>>>I have been using an FFT so far.
>>>I wish to change hardware/OS/etc.
>>>There is existing FHT code that I can cut and paste.
>>>Wikipedia article seemed to imply FHT lost favor to FFT when faster FFT
>>>algorithms were developed.
>>
>> Back when processing power and memory were hard to come by I used to
>> use FHTs a lot. As you mention, they're very good when the input is
>> real-valued, and if you only need the magnitude output for a plot
>> it'll be pretty efficient.
>>
>> Since processing power and memory are cheap these days everybody just
>> plugs in FFTs. When computing or memory resources are hard to come
>> by, though, the FHT can have some real advantages.
>
>I thought that, when using the trick of an N-point FFT to get the
>transform of 2 N-point real sequences, the FHT has no computational
>advantage whatsoever over the FFT? But to answer Richard's question,
>there's nothing wrong or disadvantageous in using it.

I think it depends on the details of the resource issues. The FHT is
nice from a memory perspective if the input is real-valued since you
don't have to do any buffer manipulation, whereas with an N/2-pt FFT
you may need to fiddle with the appropriate buffering/addressing to
arrange the input properly. Similar issue with the output.

The output ordering of the FHT can be problematic if you don't optimize for
it, but, again, if you just want a magnitude spectrum or something it's
pretty easy to pull it out of the native output vector, especially if a
dual-port memory is used for the output buffer.

In my experience the real advantage of the FHT was the memory
requirement, which can be substantially smaller than an FFT depending
on the constraints and what you're doing. Since memory is seldom a
constraint these days it's pretty moot.

Another potential memory saving is in the twiddle factors, since the
FHT has a single reference function [cas()], instead of sin() and
cos(). Clearly it just depends on how you're trying to implement it.
These sorts of things only come into play if you're really tight on
certain resources, but it can make a big difference in the right
circumstances.

I'm not surprised that the FHT (and other things like it) are largely
unknown since canned FFTs and the like are pretty easy to plug in.


Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Richard Owlett
10-10-2008, 08:48 PM
Eric Jacobsen wrote:
> [...]
>
> I'm not surprised that the FHT (and other things like it) are largely
> unknown since canned FFTs and the like are pretty easy to plug in.

"Canned" is the key feature. It's there for cutting and pasting. There is a
canned FFT, but it attempts to be all things to all people, and the
particular FHT was written when resources were scarcer.

Thanks folks.



VelociChicken
10-13-2008, 03:53 PM
"Richard Owlett" <[email protected]> wrote in message
news:[email protected] m...
> Eric Jacobsen wrote:
>> On Fri, 10 Oct 2008 13:37:42 -0400, Randy Yates <[email protected]>
>> wrote:
>>
>>
>>>Eric Jacobsen <[email protected]> writes:
>>>
>>>
>>>>On Fri, 10 Oct 2008 11:54:52 -0500, Richard Owlett
>>>><[email protected]> wrote:
>>>>
>>>>
>>>>>julius wrote:
>>>>>
>>>>>>On Oct 9, 9:02 pm, Richard Owlett <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>>I'm interested in Magnitude vs Frequency.
>>>>>>>I have a strictly Real input sequence.
>>>>>>>There's Fast Hartley transform code available for my preferred
>>>>>>>language.
>>>>>>>Relative speed is not an issue as I'm working on stored data.
>>>>>>>
>>>>>>>What gotcha's should I watch out for?
>>>>>>>Is the a good comparison on the web of their practical differences?
>>>>>>>
>>>>>>>TIA
>>>>>>
>>>>>>
>>>>>>Richard, maybe it's better if you tell us what your intended
>>>>>>application
>>>>>>is. Right now it sounds a bit like asking whether you should buy a
>>>>>>sedan
>>>>>>or a hatchback :-P. There will be lots of opinions, but they are
>>>>>>worthless
>>>>>>depending on your situation and application.
>>>>>
>>>>>Well, my app is about as simple as it gets.
>>>>>I have a time sequence f(t) and I want a measure of abs(f(omega))
>>>>>Signal so far has been audio, but don't think that bears on question.
>>>>>I'm not going to filter/process/... the transformed data, just plot it.
>>>>>I have been using an FFT so far.
>>>>>I wish to change hardware/OS/etc.
>>>>>There is existing FHT code that I can cut and paste.
>>>>>Wikipedia article seemed to imply FHT lost favor to FFT when faster FFT
>>>>>algorithms were developed.
>>>>
>>>>Back when processing power and memory were hard to come by I used to
>>>>use FHTs a lot. As you mention, they're very good when the input is
>>>>real-valued, and if you only need the magnitude output for a plot
>>>>it'll be pretty efficient.
>>>>
>>>>Since processing power and memory are cheap these days everybody just
>>>>plugs in FFTs. When computing or memory resources are hard to come
>>>>by, though, the FHT can have some real advantages.
>>>
>>>I thought that, when using the trick of an N-point FFT to get the
>>>transform of 2 N-point real sequences, the FHT has no computational
>>>advantage whatsoever over the FFT? But to answer Richard's question,
>>>there's nothing wrong or disadvantageous in using it.
>>
>>
>> I think it depends on the details of the resource issues. The FHT is
>> nice from a memory perspective if the input is real-valued since you
>> don't have to do any buffer manipulation, whereas with an N/2-pt FFT
>> you may need to fiddle with the appropriate buffering/addressing to
>> arrange the input properly. Similar issue with the output.
>>
>> The output of the FHT can be problematic if you don't optimise for it,
>> but, again, if you just want a magnitude spectrum or something it's
>> pretty easy to pull it out of the native output vector, especially if
>> a dual-port is used for the output buffer.
>>
>> In my experience the real advantage of the FHT was the memory
>> requirement, which can be substantially smaller than an FFT depending
>> on the constraints and what you're doing. Since memory is seldom a
>> constraint these days it's pretty moot.
>>
>> Another potential memory saving is in the twidlle factors, since the
>> FHT has a single reference function [cas()], instead of sin() and
>> cos(). Clearly it just depends on how you're trying to get it
>> implemented. These sorts of things only come into play if your'e
>> really tight on certain resources, but it can make a big difference in
>> the right circumstances.
>>
>> I'm not surprised that the FHT (and other things like it) are largely
>> unknown since canned FFTs and the like are pretty easy to plug in.
>>
>
> Canned is key feature. It's there for cut and pasting. There is a canned
> FFT, but it attempts to be all things to all people and the particular FHT
> was written when resources were scarcer.
>
> Thanks folks.
>
>
>>
>> Eric Jacobsen
>> Minister of Algorithms
>> Abineau Communications
>> http://www.ericjacobsen.org

Resources are always scarce, and of course the faster you do something, the
more of it you can do! Give your software the edge, especially in audio DSP.
People should stop thinking like Micro$oft - the "computers are faster, so
we'll make Windoze slower" mentality! ; )

Despite the arguments about theory, I found the FHT to be 50% faster than any
FFT code on the internet, so I use it. It's probably the cos/sin
approximations and the fact that it uses half the memory, which helps the
data cache and prep time.

Cheers,
Dave

Eric Jacobsen
10-13-2008, 11:08 PM
On Mon, 13 Oct 2008 15:53:19 +0100, "VelociChicken" <[email protected]>
wrote:

>Resources are always scarce, and of course the faster you do something, the
>more of it you can do! Give your software the edge, especially in audio DSP.
>People should stop thinking like Micro$oft - the "computers are faster, so
>we'll make Windoze slower" mentality! ; )
>
>Despite the arguments about theory, I found the FHT to be 50% faster than
>any FFT code on the internet, so I use it. It's probably the cos/sin
>approximations and the fact that it uses half the memory, which helps the
>data cache and prep time.
>
>Cheers,
>Dave

Yup, if you're careful about how you do things you can get some real
advantages. Some of my FHT experience was on my Amiga in grad school
when I was processing audio sensor data (i.e., real-valued) and coded
the whole thing in 68k assembly with LUTs for the cas() function. I
also had a hand-coded 68k-assembly FFT with LUT twiddles, and the
buffer handling alone, with extra care paid to handling the input
buffer and forming the output magnitude vector, provided a substantial
advantage for the FHT.
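
For reference, here is the sort of cas() lookup table being described,
sketched in C rather than 68k assembly (this table-building code is an
illustration, not the original): a single table of cas(2*pi*k/N) =
cos(2*pi*k/N) + sin(2*pi*k/N) values serves as the FHT's only twiddle
reference.

#include <math.h>
#include <stdlib.h>

#define PI 3.14159265358979323846

/* Build the length-n cas() twiddle table; caller frees the result. */
static double *make_cas_table(int n)
{
    double *tab = malloc(n * sizeof *tab);
    if (!tab)
        return NULL;
    for (int k = 0; k < n; k++)
        tab[k] = cos(2.0 * PI * k / n) + sin(2.0 * PI * k / n);
    return tab;
}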

Operation count alone doesn't tell the whole story, but it's often the
only metric considered.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Steven G. Johnson
10-16-2008, 06:07 PM
On Oct 10, 1:15 pm, Eric Jacobsen <[email protected]> wrote:
> Since processing power and memory are cheap these days everybody just
> plugs in FFTs. When computing or memory resources are hard to come
> by, though, the FHT can have some real advantages.

That's an old myth, but it's not true.

From a theoretical perspective, the best known FFT algorithms
specialized for real inputs have slightly fewer operations than the
best known FHT algorithms. (Claims that FHTs save a factor of two
over FFTs for real data are bogus, and have been widely debunked in
the literature.)

From a practical perspective, the structure of FHT algorithms is
roughly similar to that of real-data FFT algorithms, so there don't
seem to be any significant implementation advantages one way or the
other. So, it comes down to who has optimized their code more, and
there are far more heavily optimized FFT libraries readily available
than FHT libraries.

Regards,
Steven G. Johnson

Steven G. Johnson
10-16-2008, 06:12 PM
On Oct 10, 3:42 pm, Eric Jacobsen <[email protected]> wrote:
> I think it depends on the details of the resource issues. The FHT is
> nice from a memory perspective if the input is real-valued since you
> don't have to do any buffer manipulation, whereas with an N/2-pt FFT
> you may need to fiddle with the appropriate buffering/addressing to
> arrange the input properly. Similar issue with the output.

Nope, there is no intrinsic implementation advantage of an FHT that
I'm aware of; we looked carefully at this issue when working on FFTW.
(Sorensen in 1987 showed that there are real-valued FFT algorithms
that have a very similar structure to FHT algorithms. Of course, all
of this is a bit moot because highly optimized FFT codes these days
look a lot different from the 1960s-1980s radix-2 and split-radix
algorithms.)

> In my experience the real advantage of the FHT was the memory
> requirement, which can be substantially smaller than an FFT depending
> on the constraints and what you're doing.

Not true. FHTs and real-input FFTs have essentially the same memory
requirements (obviously depending upon the implementation, but there
is no intrinsic advantage one way or the other).

> Another potential memory saving is in the twiddle factors, since the
> FHT has a single reference function [cas()], instead of sin() and
> cos().

Nope, there's no savings here either, because if you count carefully
you'll discover that you need half as many (sin,cos) pairs as cas
functions, so the number of constants balances out.

Steven

Steven G. Johnson
10-16-2008, 06:17 PM
On Oct 13, 10:53*am, "VelociChicken" <[email protected]> wrote:
> Despite the arguments on theory, I found FHT to be 50% faster than any FFT
> code on the internet, so I use it. It's probably the cos/sin approximations
> and that it uses half the memory, which helps the data cache and prep time.

If so, you're using the wrong FFT implementation (or you have an
amazing FHT implementation that you have not yet revealed to the
world). I've benchmarked a lot of FHT implementations, and I'm not
aware of any that comes close to the fastest optimized real-data
FFTs. (Especially since most FHT implementations are of the textbook
radix-2 variety.)

As I mentioned above, there is negligible theoretical advantage one
way or the other between FHTs and real-data FFTs so far as anyone
knows (the best known real-data FFTs are actually slightly better from
an arithmetic-count perspective), so it comes down to
implementations. And people have spent far more effort optimizing FFT
code than they have in optimizing FHT code. (e.g. every CPU vendor
has an optimized FFT for their chip, usually including a real-data
FFT; I can't think of a single recent general-purpose CPU vendor that
has released an optimized FHT.)

Steven

Eric Jacobsen
10-16-2008, 09:19 PM
On Thu, 16 Oct 2008 10:12:18 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 10, 3:42 pm, Eric Jacobsen <[email protected]> wrote:
>> I think it depends on the details of the resource issues. The FHT is
>> nice from a memory perspective if the input is real-valued since you
>> don't have to do any buffer manipulation, whereas with an N/2-pt FFT
>> you may need to fiddle with the appropriate buffering/addressing to
>> arrange the input properly. Similar issue with the output.
>
>Nope, there is no intrinsic implementation advantage of an FHT that
>I'm aware of; we looked carefully at this issue when working on FFTW.
>(Sorensen in 1987 showed that there are real-valued FFT algorithms
>that have a very similar structure to FHT algorithms. Of course, all
>of this is a bit moot because highly optimized FFT codes these days
>look a lot different from the 1960s-1980s radix-2 and split-radix
>algorithms.)
>
>> In my experience the real advantage of the FHT was the memory
>> requirement, which can be substantially smaller than an FFT depending
>> on the constraints and what you're doing.
>
>Not true. FHTs and real-input FFTs have essentially the same memory
>requirements (obviously depending upon the implementation, but there
>is no intrinsic advantage one way or the other).
>
>> Another potential memory saving is in the twiddle factors, since the
>> FHT has a single reference function [cas()], instead of sin() and
>> cos().
>
>Nope, there's no savings here either, because if you count carefully
>you'll discover that you need half as many (sin,cos) pairs as cas
>functions, so the number of constants balances out.
>
>Steven

Nice. Misses the point, though.

Again, it's a bit more complicated than just counting how much memory
is needed. How it's used can make a big difference, too. Since
the FHT can directly take the real-valued input buffer in the time
order in which it is presented, there can be an advantage depending
(again) on the implementation and what you're trying to do. The same
is true on the output, and in the way the twiddle tables are used.

Incrementing a pointer into a memory is easier (and typically faster)
than computing an address, and that can make a significant difference
that isn't reflected by just analyzing memory sizes.

It's the usual thing of trying to optimize the tradeoff spaces for
anything... the details may matter. It's no surprise that in some
cases one way or the other (FFT or FHT) may come out ahead compared to
other cases. It's already been acknowledged multiple times in this
thread that the differences are likely to be small, but there are
differences and (surprise) depending on the application one may turn
out to be better than the other.

And optimizing transforms for computation on a CPU is only relevant to
comparisons on that CPU. I'm looking at it far more generically, as
the sorts of applications where the difference may matter may not be
run on a CPU or DSP at all.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

10-16-2008, 10:07 PM
On Oct 16, 4:19 pm, Eric Jacobsen <[email protected]> wrote:
> Incrementing a pointer into a memory is easier (and typically faster)
> than computing an address, and that can make a significant difference
> that isn't reflected by just analyzing memory sizes.

Hey Eric,

I recall Motorola with the 56k series actually implemented a bit-reversed
addressing mode for incrementing pointers, so the data can be read into and
out of the FFT's space directly. Pretty cool!! I used that feature and
it was quite nice.
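
For anyone who hasn't run into it, here is a rough software illustration in
C (not the 56k hardware mode itself) of the bit-reversed ordering that such
an addressing mode gives you for free: a radix-2 FFT naturally produces or
consumes data in bit-reversed index order, so either the buffer gets
permuted like this or the address generator does the reversal on the fly.

/* Reverse the low 'bits' bits of index i. */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}

/* In-place bit-reversal permutation of an array of length n = 2^bits. */
static void bit_reverse_permute(double *x, unsigned bits)
{
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {                    /* swap each pair exactly once */
            double tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}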

Clay

Steven G. Johnson
10-17-2008, 12:48 AM
On Oct 16, 4:19 pm, Eric Jacobsen <[email protected]> wrote:
> Again, it's a bit more complicated than just counting how much memory
> is needed. How it's used can make a big difference, too. Since
> the FHT can directly take the real-valued input buffer in the time
> order in which it is presented, there can be an advantage depending
> (again) on the implementation and what you're trying to do. The same
> is true on the output, and in the way the twiddle tables are used.

If you are implying that the FHT operates on in-order input and
output, while a real-data FFT cannot, then you're wrong.

> Incrementing a pointer into a memory is easier (and typically faster)
> than computing an address, and that can make a significant difference
> that isn't reflected by just analyzing memory sizes.

Of course, but there's no significant difference between FHTs and real-
data FFTs in this regard. (This has been pointed out again and again
in the literature on the subject, dating back to the late 1980s.)

However, since you feel so strongly to the contrary, I look forward to
your pointing out an FHT implementation that beats the best known real-
data FFT implementation by a significant margin for any modern CPU.
I'm sure that such an FHT implementation will see widespread use once
its performance has been demonstrated, and for my own part I try to
keep abreast of the latest developments in the field.

> And optimizing transforms for computation on a CPU is only relevant to
> comparisons on that CPU. I'm looking at it far more generically, as
> the sorts of applications where the difference may matter may not be
> run on a CPU or DSP at all.

If you're going to distance yourself from readily available computers
where your claims can be tested, there's not much to talk about.

Steven

Eric Jacobsen
10-17-2008, 07:41 PM
On Thu, 16 Oct 2008 16:48:44 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 16, 4:19 pm, Eric Jacobsen <[email protected]> wrote:
>> Again, it's a bit more complicated than just counting how much memory
>> is needed. How it's used can make a big difference, too. Since
>> the FHT can directly take the real-valued input buffer in the time
>> order in which it is presented, there can be an advantage depending
>> (again) on the implementation and what you're trying to do. The same
>> is true on the output, and in the way the twiddle tables are used.
>
>If you are implying that the FHT operates on in-order input and
>output, while a real-data FFT cannot, then you're wrong.
>
>> Incrementing a pointer into a memory is easier (and typically faster)
>> than computing an address, and that can make a significant difference
>> that isn't reflected by just analyzing memory sizes.
>
>Of course, but there's no significant difference between FHTs and real-
>data FFTs in this regard. (This has been pointed out again and again
>in the literature on the subject, dating back to the late 1980s.)
>
>However, since you feel so strongly to the contrary, I look forward to
>your pointing out an FHT implementation that beats the best known real-
>data FFT implementation by a significant margin for any modern CPU.
>I'm sure that such an FHT implementation will see widespread use once
>its performance has been demonstrated, and for my own part I try to
>keep abreast of the latest developments in the field.


Dude, chill. I've no dog in this hunt, I'm merely pointing out that,
just like with most things, there are some subtleties where one may
have an advantage over the other given particular constraints. Is
that so surprising? They're not identical, you know.


>> And optimizing transforms for computation on a CPU is only relevant to
>> comparisons on that CPU. I'm looking at it far more generically, as
>> the sorts of applications where the difference may matter may not be
>> run on a CPU or DSP at all.
>
>If you're going to distance yourself from readily available computers
>where your claims can be tested, there's not much to talk about.
>
>Steven

And, as has been recognized here multiple times, if one wants to run
on "available computers" and use such canned solutions, then that's
the environment for comparison in that case. In the real world,
however, things often get implemented on various platforms (processors
or in hardware) that fit the application or just what's available at
the time. In those conditions, one plays the tradeoffs against the
resources that are available, and that may not include canned
optimized solutions or even various ingredients (e.g., FIFOs) that
could make the job easier.

Unless you want to claim that an FFT always provides a superior
implementation opportunity compared to an FHT under all possible
circumstances, I don't think there's an argument here. Even if you do
want to claim that, my response will just be, "More power to ya." The
differences are small and subtle and I don't think worth the effort in
the majority of the cases. As I stated very early in the thread, for
the vast majority of cases these days memory and processing power are
inexpensive and readily available to the point that just plugging in a
canned FFT (super efficient or not) is clearly the way to go. If one
has to split hairs it's usually more productive to improve
efficiencies elsewhere rather than spend the time to redevelop a
canned block like an FFT, but, nevertheless, sometimes it does get
down to that.

There are times when an obscure tool or technique can provide an
advantage. Being aware of the obscure tools and the differences
allows that to be part of the tradeoff space.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Steven G. Johnson
10-18-2008, 12:18 AM
On Oct 17, 2:41 pm, Eric Jacobsen <[email protected]> wrote:
> Dude, chill. I've no dog in this hunt, I'm merely pointing out that,
> just like with most things, there are some subtleties where one may
> have an advantage over the other given particular constraints. Is
> that so surprising? They're not identical, you know.

You initially claimed "some real advantages" for the FHT in terms of
"computing or memory resources," including "substantially smaller"
memory requirements. Whenever you were pressed for specifics about
these "subtleties," however, your examples turned out to be false.
False statements bother me, especially when you are repeating myths
that were debunked 20 years ago.

> Unless you want to claim that an FFT always provides a superior
> implementation opportunity compared to an FHT under all possible
> circumstances I don't think there's an argument here.

I made no such claim. I stated repeatedly that they are (as far as is
currently known) almost identical with respect to memory usage
(including access patterns), arithmetic, and organization/structure.
Old claims that one has a substantial performance advantage over the
other have been repeatedly disproved in the literature.

> There are times when an obscure tool or technique can provide an
> advantage. Being aware of the obscure tools and the differences
> allows that to be part of the tradeoff space.

Repeating long-debunked myths about obscure tools does not help people
to understand them better.

The DHT certainly has some interesting properties, and is a nice
transform to think about from time to time. (For example, in one of
our own papers we showed that DHTs give a simple way to apply Rader's
algorithm for prime-length transforms to purely real data, although
Burrus in 1982 showed that there is a way to do this directly for real-
data FFTs too.) But early claims of significant performance or memory
advantages from FHT algorithms for power-of-two sizes have, sadly, not
withstood the test of time.

Regards,
Steven G. Johnson

VelociChicken
10-18-2008, 12:46 AM
>The DHT certainly has some interesting properties, and is a nice
>transform to think about from time to time. (For example, in one of
>our own papers we showed that DHTs give a simple way to apply Rader's
>algorithm for prime-length transforms to purely real data, although
>Burrus in 1982 showed that there is a way to do this directly for real-
>data FFTs too.) But early claims of significant performance or memory
>advantages from FHT algorithms for power-of-two sizes have, sadly, not
>withstood the test of time.

Use Ron Mayer's algorithm from the early 1990s, the one with selectable
trig tables, and test it yourself. I've found that large FHTs are even more
of an advantage; compare, say, the best FFT and his FHT with 16384 data
points. Perhaps the saving comes out through thrashing the memory in the
convolutions, and maybe the trig tables are not accurate enough for your
needs.
I know that doesn't sound very scientific, but trying everything out is the
only way you're going to know what to use....

Eric Jacobsen
10-18-2008, 01:21 AM
On Fri, 17 Oct 2008 16:18:20 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 17, 2:41 pm, Eric Jacobsen <[email protected]> wrote:
>> Dude, chill. I've no dog in this hunt, I'm merely pointing out that,
>> just like with most things, there are some subtleties where one may
>> have an advantage over the other given particular constraints. Is
>> that so surprising? They're not identical, you know.
>
>You initially claimed "some real advantages" for the FHT in terms of
>"computing or memory resources," including "substantially smaller"
>memory requirements. Whenever you were pressed for specifics about
>these "subtleties," however, your examples turned out to be false.
>False statements bother me, especially when you are repeating myths
>that were debunked 20 years ago.

Actually, the issue is that I understand that such comparisons are
extremely context specific. So unless someone has done the
comparisons for a specific context at hand, it's all pretty much just
handwaving. You keep citing studies, but unless a study compared the
issues in the context of the specific resource limitations in
computing and memory of interest in a particular application, it's
just academic.

I understand this. You seem to be arguing that because a bunch of
papers came to some conclusion that that conclusion must be true for
everyone, everywhere. I disagree with that standpoint, and I've
tried to explain why as well as I can. I don't know why you think
it is so important to promote FFTs as being better than or equal
to FHTs. I'm just saying there are tradeoffs, as is typically the
case with just about everything, anyway.

>> Unless you want to claim that an FFT always provides a superior
>> implementation opportunity compared to an FHT under all possible
>> circumstances I don't think there's an argument here.
>
>I made no such claim.

I know that. That's why I started the statement with "Unless you
want to claim that...". If you're not making that claim then I
really don't understand the cause for concern.

> I stated repeatedly that they are (as far as is
>currently known) almost identical with respect to memory usage
>(including access patterns), arithmetic, and organization/structure.
>Old claims that one has a substantial performance advantage over the
>other have been repeatedly disproved in the literature.

Repeatedly disproved in the literature for what context? Do you
understand that someone implementing a transform in an FPGA or a
silicon target is constrained by the memory architectures and
processing elements available in the libraries? Sometimes those
elements lend themselves well to a particular implementation and not
to others. Sometimes, especially when power and real estate are a
concern, minimizing certain elements can be particularly advantageous.
Do you claim that all of these studies that have "repeatedly disproved
in the literature" that the FHT can sometimes make sense have considered
ALL of the various permutations of practical resource limitations?

That's some set of magical studies if that's the case.

>> There are times when an obscure tool or technique can provide an
>> advantage. Being aware of the obscure tools and the differences
>> allows that to be part of the tradeoff space.
>
>Repeating long-debunked myths about obscure tools does not help people
>to understand them better.

Repeatedly citing "the literature" as the final authority for every
case, everywhere is a poor substitute for actually being able to
assess tradeoffs for oneself.

>The DHT certainly has some interesting properties, and is a nice
>transform to think about from time to time. (For example, in one of
>our own papers we showed that DHTs give a simple way to apply Rader's
>algorithm for prime-length transforms to purely real data, although
>Burrus in 1982 showed that there is a way to do this directly for real-
>data FFTs too.) But early claims of significant performance or memory
>advantages from FHT algorithms for power-of-two sizes have, sadly, not
>withstood the test of time.
>
>Regards,
>Steven G. Johnson

I recall some of the claims in the early 80s of the vast superiority
of the FHT for real-valued input transforms. It was clear to me from
the first time I looked into it (around that time) that the
differences would not be huge and would be application and
implementation dependent. That's the story I've been telling here,
and I really don't understand why you think that's a problematic
position (and I think it's still accurate).

Given that the two transforms are different, it would seem pretty
logical that the differences would lead to advantages one way or the other
depending on different circumstances.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Steven G. Johnson
10-18-2008, 08:03 PM
On Oct 17, 7:46*pm, "VelociChicken" <[email protected]> wrote:
> Use Ron Mayer's algorithm from the early 1990's, the one with selectable
> trig tables and test it yourself. I've found that large FHT's are even more
> of an advantage, compare maybe the best FFT and his FHT with 16384 data
> points. *

I have tested it (e.g. http://www.fftw.org/speed/Pentium4-3.60GHz-icc/).
It's about 4-5 times slower than the fastest real-data FFT these days
(including half-a-dozen open-source real-data FFTs that beat it).

Steven

Steven G. Johnson
10-18-2008, 08:24 PM
On Oct 17, 8:21 pm, Eric Jacobsen <[email protected]> wrote:
> I understand this. You seem to be arguing that because a bunch of
> papers came to some conclusion that that conclusion must be true for
> everyone, everywhere.

No, I'm arguing that your claims of "substantial" advantages have no
support, and your examples of supposed advantages (lower memory for
trig tables, more sequential access, reduced memory reordering) were
demonstrably false. It is impossible to prove a negative, but since
you are the one claiming substantial advantages, the burden of proof
is on you, especially since people have tried and failed to
convincingly demonstrate such advantages for 20 years.

> Given that the two transforms are different, it would seem pretty
> logical that the differences would lead to advantages one way or the other
> depending on different circumstances.

If you actually look in detail at the algorithms, they are in fact
extremely similar. This is why people who have studied the problem in
detail have concluded that there don't seem to be significant
advantages one way or the other. You have yet to correctly identify
one significant difference.

I would be very happy to learn of a way to use the DHT to gain a
substantial advantage over real-data FFTs. FFTs of real data are
annoying asymmetric transforms and the diversity of data formats for
their redundant complex outputs are a PITA. But unsupported claims of
substantial computational or memory savings, which have proved false
again and again for two decades, are not helpful.

Regards,
Steven G. Johnson

VelociChicken
10-18-2008, 10:49 PM
>"Steven G. Johnson" <[email protected]> wrote in message
>news:[email protected]...
>On Oct 17, 7:46 pm, "VelociChicken" <[email protected]> wrote:
>> Use Ron Mayer's algorithm from the early 1990's, the one with selectable
>> trig tables and test it yourself. I've found that large FHT's are even
>> more
>> of an advantage, compare maybe the best FFT and his FHT with 16384 data
>> points.
>
>I have tested it (e.g. http://www.fftw.org/speed/Pentium4-3.60GHz-icc/).
>It's about 4-5 times slower than the fastest real-data FFT these days
>(including half-a-dozen open-source real-data FFTs that beat it).
>
>Steven

Thanks.
I take it that's the Mayer FHT, not the FFT, tested there. Anyway, I can't
refute that, and I'll look through them again at some point. It may well
have been the combination with my convolution code or some trig settings on
the other FFTs. Kinda weird though...

Steven G. Johnson
10-19-2008, 12:15 AM
On Oct 18, 5:49*pm, "VelociChicken" <[email protected]> wrote:
> I take it that's the Mayer FHT not the FFT tested there. Anyway, can't
> refute that, and I'll look through them again at some point. It may well
> have been the combination with my convolution code or some trig settings on
> the other FFT's. Kinda weird though...

The "rmayer" code in the graphs is the real-data FFT computed via the
FHT (Mayer's reallfft routine). That involves one additional pass
over the array in Mayer's code compared to his FHT by itself, but the
difference is tiny. If I benchmark his FHT by itself, it's slightly
faster (e.g. 3% faster for n=16384 and 7% faster for n=512 on my
current machine) due to omitting that one pass, but that's not nearly
enough to make up for the factor of 4 difference in speed compared to
the other codes in the benchmark.
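
For reference, the kind of extra pass being described amounts to something
like the following rough sketch in C (not Mayer's actual code): converting a
length-n DHT output into the half-spectrum of the corresponding real-input
DFT via Re X[k] = (h[k] + h[n-k])/2 and Im X[k] = (h[n-k] - h[k])/2.

static void dht_to_real_dft(const double *h, double *re, double *im, int n)
{
    re[0] = h[0];                       /* DC bin: purely real */
    im[0] = 0.0;
    for (int k = 1; k <= n / 2; k++) {  /* k = n/2 lands on h[n/2] twice */
        re[k] = 0.5 * (h[k] + h[n - k]);
        im[k] = 0.5 * (h[n - k] - h[k]);
    }
}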

There's nothing wrong with Ron Mayer's code; it's a perfectly
reasonable, decently optimized, clean implementation of a power-of-two
FHT, and is competitive with a number of available real-input FFT
routines; it's comparable to a radix-4 or radix-8 FFT. It's certainly
better than your run-of-the-mill Numerical-Recipes radix-2 FFT code.
But it doesn't exhibit any clear advantage over similarly optimized
real-input FFTs, and can't compete with much more heavily optimized
implementations of the latter.

Steven

Eric Jacobsen
10-19-2008, 07:07 AM
On Sat, 18 Oct 2008 12:24:22 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 17, 8:21 pm, Eric Jacobsen <[email protected]> wrote:
>> I understand this. You seem to be arguing that because a bunch of
>> papers came to some conclusion that that conclusion must be true for
>> everyone, everywhere.
>
>No, I'm arguing that your claims of "substantial" advantages have no
>support, and your examples of supposed advantages (lower memory for
>trig tables, more sequential access, reduced memory reordering) were
>demonstrably false. It is impossible to prove a negative, but since
>you are the one claiming substantial advantages, the burden of proof
>is on you, especially since people have tried and failed to
>convincingly demonstrate such advantages for 20 years.
>
>> Given that the two transforms are different, it would seem pretty
> >> logical that the differences would lead to advantages one way or the other
>> depending on different circumstances.
>
>If you actually look in detail at the algorithms, they are in fact
>extremely similar. This is why people who have studied the problem in
>detail have concluded that there don't seem to be significant
>advantages one way or the other. You have yet to correctly identify
>one significant difference.
>
>I would be very happy to learn of a way to use the DHT to gain a
>substantial advantage over real-data FFTs. FFTs of real data are
>annoying asymmetric transforms and the diversity of data formats for
>their redundant complex outputs are a PITA. But unsupported claims of
>substantial computational or memory savings, which have proved false
>again and again for two decades, are not helpful.
>
>Regards,
>Steven G. Johnson

'Substantial' is in the eye of the beholder, and the context matters,
as I've tried to repeatedly explain. I'm not interested in going
down a road that seems to me to be clearly headed to the usual
semantic nit-picking.

I don't feel up to holding your hand or convincing you, since you seem
already convinced of the infallibility of your point of view. As I
said, I don't really have a dog in this hunt and I've not felt the
need to convince anyone of anything. I was merely offering a point of
view based on my personal experience. As usual, this being teh
intartubes and all, all opinions are offered and taken FWTW. Clearly
you're already comfortable with being an expert on this topic. I
welcome you to your own contentedness in that realm.

I will also say that I've given quite a few hints as to where one can
find certain advantages and under what conditions. If you don't find
yourself in those conditions, you may continue to not find an
advantage. As usual, YMMV.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

VelociChicken
10-19-2008, 03:41 PM
"Steven G. Johnson" <[email protected]> wrote in message
news:[email protected]...
On Oct 10, 1:15 pm, Eric Jacobsen <[email protected]> wrote:
>> Since processing power and memory are cheap these days everybody just
>> plugs in FFTs. When computing or memory resources are hard to come
>> by, though, the FHT can have some real advantages.
>That's an old myth, but it's not true.
>From a theoretical perspective, the best known FFT algorithms
>specialized for real inputs have slightly fewer operations than the
>best known FHT algorithms. (Claims that FHTs save a factor of two
>over FFTs for real data are bogus, and have been widely debunked in
>the literature.)

Your FFTW vested interests are starting to show through.

>From a practical perspective, the structure of FHT algorithms is
>roughly similar to that of real-data FFT algorithms, so there don't
>seem to be any significant implementation advantages one way or the
>other. So, it comes down to who has optimized their code more, and
>there are far more heavily optimized FFT libraries readily available
>than FHT libraries.

Well, seeing as the FHT uses half the memory of FFT, then I guess it's about
time someone did a heavily optimised FHT don't you think? Although I suspect
people have, and it's not likely to get simply posted on the interweb is it.

Steven G. Johnson
10-19-2008, 04:02 PM
On Oct 19, 10:41 am, "VelociChicken" <[email protected]> wrote:
> Your FFTW vested interests are starting to show through.

I'm not sure why you think FFTW gives me a vested interest one way or
the other on this. We would be perfectly happy to use an FHT in FFTW
if it had any clear advantage. The DHT is *already* one of the
transforms supported by the FFTW API. In fact, we do use a DHT to do
real-data transforms of prime sizes, although I later found out that
there is also a way to do this directly using a real-data FFT
(published by Burrus in the 1980s).

The problem is that proponents of FHTs on this thread have been
repeatedly making claims that are either false on their face, or
completely unsupported by any evidence and contradicted by 20 years of
attempts.

> Well, seeing as the FHT uses half the memory of FFT, then I guess it's about
> time someone did a heavily optimised FHT don't you think?

This is utterly false, although it is a persistent myth. An FFT
algorithm specialized for real inputs has the same memory requirements
as an FHT.
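
As a minimal numpy sketch of what "same memory" means here (the DHT is
computed through the full FFT purely for convenience), both transforms
carry exactly n independent real numbers for an n-point real input:

import numpy as np

n = 16
x = np.random.randn(n)

# Real-input FFT: n/2 + 1 complex bins, but the DC and Nyquist bins are
# purely real, so the output carries exactly n independent real numbers.
X = np.fft.rfft(x)
assert X.size == n // 2 + 1
assert abs(X[0].imag) < 1e-12 and abs(X[-1].imag) < 1e-12

# DHT of the same signal: H(k) = sum_n x(n) * cas(2*pi*n*k/N), computed
# through the full FFT only for brevity. Also exactly n real numbers.
F = np.fft.fft(x)
H = F.real - F.imag
assert H.size == n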

Regards,
Steven G. Johnson

Steven G. Johnson
10-19-2008, 04:14 PM
On Oct 19, 2:07 am, Eric Jacobsen <[email protected]> wrote:
> 'Substantial' is in the eye of the beholder, and the context matters,
> as I've tried to repeatedly explain. I'm not interested in going
> down a road that seems to me to be clearly headed to the usual
> semantic nit-picking.

In other words, you're not up to giving any actual evidence for your
claim; it sounds like this thread is at an end, since you've resorted
to innuendo and ad hominem attacks in place of rational arguments.

> I will also say that I've given quite a few hints as to where one can
> find certain advantages and under what conditions.

The problem is that all your hints were demonstrably incorrect. You
claimed that FHTs require half the size of the trig tables, which I
explained is false. You claimed that FHTs have simpler or more
regular memory access patterns, or that real-data FFTs require
additional reorderings of the array, which was shown to be false 20
years ago by Sorensen (he presented real-data FFT algorithms that have
the same sort of memory access pattern as a comparable FHT, and require
the same number of passes/reorderings).

Steven

Andrew Reilly
10-19-2008, 11:05 PM
On Sun, 19 Oct 2008 15:41:35 +0100, VelociChicken wrote:
> Well, seeing as the FHT uses half the memory of FFT, then I guess it's
> about time someone did a heavily optimised FHT don't you think?

I'm pretty sure that it's statements like this one that are keeping
Steven on the case.

I don't believe that it has *ever* been true, and short of doing
something completely lame like using a redundant, full-length complex FFT
to perform the transform, I can't imagine a circumstance where it would
hold. Can you elaborate a little on this mysterious factor of two memory
use?

--
Andrew

Jerry Avins
10-19-2008, 11:19 PM
Andrew Reilly wrote:

...

> Can you elaborate a little on this mysterious factor of two memory use?

Legend? Myth? The Goebbels technique?

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

VelociChicken
10-20-2008, 02:34 PM
"Jerry Avins" <[email protected]> wrote in message
news:[email protected]...
> Andrew Reilly wrote:
>
> ...
>
>> Can you elaborate a little on this mysterious factor of two memory use?
>
> Legend? Myth? The Goebels technique?

There's no 'complex' array to be set up.

Steven G. Johnson
10-20-2008, 07:46 PM
On Oct 20, 9:34 am, "VelociChicken" <[email protected]> wrote:
> >> Can you elaborate a little on this mysterious factor of two memory use?
>
> There's no 'complex' array to be set up.

The same savings can be obtained for an FFT algorithm specialized for
real inputs.

Yes, if you have an FFT subroutine that takes n complex inputs and
returns n complex outputs, then you would need to take your real
inputs and convert them into a complex array with zero imaginary
parts, requiring twice the storage. However, it's been observed since
(I think?) the late 1960's that this factor of 2 is not necessary.

Instead, there are a number of FFT algorithms specialized for real
data that take a real array as input, and return half of a complex
array as output (requiring the same storage as the real array, and
possibly operating in-place). The reason you can do this is that the
discrete Fourier transform of real inputs is half redundant: the
second half is the complex conjugate of the first half, in reverse
order.

(Probably the most well-known such technique, although not generally
the most efficient, is a trick to express a real-input DFT of length n
in terms of a complex-input FFT of length n/2 plus some post-
processing. An alternative technique is to start with a complex-data
FFT algorithm, and go through the algorithm and prune out the
redundant operations...again, you can save a factor of 2 in memory and
slightly more than a factor of 2 in computation compared to the
complex-input algorithm.)
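
A minimal numpy sketch of that half-length trick, under the assumption
of an even length (the function name is invented for the example, and
np.fft.rfft is used only as a reference to check the result):

import numpy as np

def rdft_via_half_length_cfft(x):
    # Real-input DFT of even length n using one complex FFT of length n/2
    # plus O(n) post-processing; returns the n/2 + 1 non-redundant bins.
    n = x.size
    z = x[0::2] + 1j * x[1::2]               # pack even/odd samples into n/2 complex values
    Z = np.fft.fft(z)                        # one complex FFT of length n/2
    k = np.arange(n // 2 + 1)
    Zk = np.concatenate((Z, Z[:1]))                  # Z[k] for k = 0 .. n/2
    Zc = np.conj(np.concatenate((Z[:1], Z[::-1])))   # conj(Z[n/2 - k])
    Feven = 0.5 * (Zk + Zc)                  # DFT of the even-indexed samples
    Fodd = -0.5j * (Zk - Zc)                 # DFT of the odd-indexed samples
    return Feven + np.exp(-2j * np.pi * k / n) * Fodd

x = np.random.randn(32)
assert np.allclose(rdft_via_half_length_cfft(x), np.fft.rfft(x))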

The existence of these real-input FFT algorithms is why all claims of
factor-of-two savings from FHT algorithms were (at best!) very
misleading, as was pointed out shortly after the FHT was introduced.
Inexplicably, this mythical factor of two has persisted for an
amazingly long time in the imaginations of engineers (and some
unfortunate authors).

Regards,
Steven G. Johnson

VelociChicken
10-20-2008, 08:47 PM
>The same savings can be obtained for an FFT algorithm specialized for
>real inputs.
>
>Yes, if you have an FFT subroutine that takes n complex inputs and
>returns n complex outputs, then you would need to take your real
>inputs and convert them into a complex array with zero imaginary
>parts, requiring twice the storage. However, it's been observed since
>(I think?) the late 1960's that this factor of 2 is not necessary.
>
>Instead, there are a number of FFT algorithms specialized for real
>data that take a real array as input, and return half of a complex
>array as output (requiring the same storage as the real array, and
>possibly operating in-place). The reason you can do this is that the
>discrete Fourier transform of real inputs is half redundant: the
>second half is the complex conjugate of the first half, in reverse
>order.
>
>(Probably the most well-known such technique, although not generally
>the most efficient, is a trick to express a real-input DFT of length n
>in terms of a complex-input FFT of length n/2 plus some post-
>processing. An alternative technique is to start with a complex-data
>FFT algorithm, and go through the algorithm and prune out the
>redundant operations...again, you can save a factor of 2 in memory and
>slightly more than a factor of 2 in computation compared to the
>complex-input algorithm.)
>
>The existence of these real-input FFT algorithms is why all claims of
>factor-of-two savings from FHT algorithms were (at best!) very
>misleading, as was pointed out shortly after FHTs was introduced.
>Inexplicably, this mythical factor of two has persisted for an
>amazingly long time in the imaginations of engineers (and some
>unfortunate authors).
>
>Regards,
>Steven G. Johnson

Thank-you for your verbose and interesting reply. Is it possible to tell me
which would actually be the best (fastest, and free) real-FFT that I can
test myself? I'm using Intel CPUs.
Thanks again,
Dave

Eric Jacobsen
10-20-2008, 10:21 PM
On Sun, 19 Oct 2008 08:14:36 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 19, 2:07*am, Eric Jacobsen <[email protected]> wrote:
>> 'Substantial' is in the eye of the beholder, and the context matters,
>> as I've tried to repeatedly explain. * I'm not interested in going
>> down a road that seems to me to be clearly headed to the usual
>> semantic nit-picking.
>
>In other words, you're not up to giving any actual evidence for your
>claim; it sounds like this thread is at an end, since you've resorted
>to innuendo and ad hominem attacks in place of rational arguments.

W T F?

What innuendo and ad hominem?

You seem to be taking this pretty personally. Why is it so important
to you to make FHTs look inferior to FFTs? As I mentioned a couple
of times now, but which you have not addressed, unless you want to
claim that the FFT is always, in all circumstances, superior to the
FHT (and I still don't know whether that's your position or not),
there really isn't any argument here.

I am absolutely not claiming that the FHT has any sort of general
"superiority" over the FFT and I've tried to be clear about that.
Since the transforms are different, and I hope you understand that
they are, those differences can be exploited in certain circumstances
to provide a benefit. Again, as I've mentioned previously, that's to
be expected. Where there are differences one can exploit them to
one's advantage. That's what engineers do.

Assuming, of course, they're aware of the various tools and their
differences.

>> I will also say that I've given quite a few hints as to where one can
>> find certain advantages and under what conditions. *
>
>The problem is that all your hints were demonstrably incorrect.

I disagree.

>You
>claimed that FHTs require half the size of the trig tables, which I
>explained is false.

Where did I claim that? I don't believe that to be true, so I
wouldn't have claimed it to be so. I did say,

"Another potential memory saving is in the twidlle factors, since the
FHT has a single reference function [cas()], instead of sin() and
cos(). Clearly it just depends on how you're trying to get it
implemented. These sorts of things only come into play if you're
really tight on certain resources, but it can make a big difference in
the right circumstances."

That does NOT imply that using cas() as a basis function takes half
the memory of cos() and sin(). I think even the least experienced
among us knows that if you have either of a sin() or cos() LUT, you
have the other as well. And either sin() or cos() can be folded so
that much less than a full wave needs to be stored, and you can do
that with cas() as well. How those memories are accessed in a
transform is different between the FFT and FHT, though, and that can
matter.

The last quoted sentence above should have made it clear that I'm not
claiming there's a large difference (certainly not a factor of two!),
and that it takes the "right circumstances" to get a benefit. Do you
really think that there's never a difference? I'll try to illuminate
a bit more below, but the gist of what I was trying to say was that
there are certain architectures where the single real-valued basis
function of the HT can be exploited to reduce complexity.

FWIW, I've found several (recent) references where people are
currently building generalized transform hardware based on FHT engines
(kind of like Mayer did with his software). Since a generalized
compression engine may need to do FFTs, FCTs, and FSTs, there are
evidently advantages to just building a single FHT to do all of those
tasks. Here's one reference:

http://ieeexplore.ieee.org/Xplore/defdeny.jsp?url=/iel5/4061132/4061133/04061261.pdf?isnumber=4061133&prod=CNF&tp=x&arnumber=4061261&arSt=794&ared=795&arAuthor=P.+John+Paul%3B+P.N.+Girija&htry=16&code=21

For the constraints and optimizations in this case (and several are
actually shown), the author found using an FHT as the basis
advantageous.

That's really just a single example of the only small point I've had,
to which you seem to take extreme exception, that there do, in fact,
exist cases where the FHT can have an advantage. I've readily stated
that the advantage is usually small and only under certain
constraints, but there can be an advantage. Why you find that so
objectionable seems very curious.

> You claimed that FHTs have simpler or more
>regular memory access patterns, or that real-data FFTs require
>additional reorderings of the array, which was shown to be false 20
>years ago by Sorensen (he presented real-data FFT algorithms that have
>the same sort of memory access pattern as comparable FHT, and require
>the same number of passes/reorderings).

I didn't claim that in the generic sense that you seem to be implying.
I think the problem here is that you don't understand my arguments. I
suspect your perspective is from the case of just performing
transforms in software via an HLL on a generalized computing platform,
and from my perspective that's the LEAST interesting case. So your
anxiety is perhaps from interpreting my statements from that
perspective, which could lead to some misconceptions since I'm looking
more broadly.

Since this seems to be a Big Thing to you, I'll try to help illuminate
it a bit better. Perhaps your horizons will be broadened, perhaps
not, so take it FWIW. But please do understand that sometimes people
try to do transforms via means other than a C or other HLL program
running on a processor somewhere. I tried previously to explain
that, but since it didn't seem to make it from here all the way to
there I'll try to be a bit more explicit.

There are a few scenarios that I think are relevant and exemplary:

1) An all-hardware implementation, i.e., just hooking gates and things
together to build a transformer. There'll likely be input and output
buffers, perhaps additional processing buffers, computation logic,
control logic, address generation logic, etc., etc. To generalize a
little bit this case can include silicon targets (i.e., to be fabbed
on a chip), or FPGA targets, or similar connections of primitive
building blocks.

2) A hybrid platform where various processing, memory and logic
elements are combined. This could be packaged memory with something
like an ALU and control logic. The main difference to case one could
be considered to be the granularity of the primitives, e.g., a
packaged memory rather than a more flexible block, an ALU rather than
parameterized adders and multipliers.

3) A small embedded processor, perhaps with some limited, but
inflexible, memory and limited computing ability.

4) An embedded DSP.

5) A generalized computing platform (e.g., a PC).

Can you begin to see that each of these cases is quite different and
optimized solutions for each might look quite different from each
other? If your claim is that what works in case 5) should work in
all of the above cases then I think you're the one who needs to do
some pretty significant substantiating of claims. I don't even
necessarily buy the idea that the FFT is always superior in all
situations for case 5), especially when processing and/or interface
memory constraints may be involved, but I think the point there is way
too fine to argue either way.

I don't think it's practical to fully analyze the tradeoffs applicable to
all of the cases described, though (and that's been part of my point)
since the details of the constraints even in just case 1) alone may
lead to a very different approach just based on those details. The
availability of memory primitives can make a big difference; if, as
is often the case, one can get a dual port for the same resource and
cost utilization as a single port, that changes the game. If one
wants to optimize power consumption rather than area, that can change
the game. If one does not have multipliers available (as is still
the case in many FPGAs) then that can change the game.

Since the FHT does, in fact, have differences between it and the FFT,
such details may be significant to an implementer given certain
constraints that can exist. If memory is tight and only chunks of
certain sizes are available (and one is limited in how much area can
be used for memory), then damn straight tootin' an FHT can have an
advantage. Moving data in and out of limited memory efficiently can
be very different between the two, and how many instances can be
populated, how many ports, and how much hardware can be dedicated to
address generation can matter. If pipelining is required for repeated
transforms and multipliers are at a premium, then an FHT may have an
advantage since the first two stages don't require any multiplication.
At a former employer back in ancient times when our FFT engines were
12"x12" circuit boards built with through-hole parts, the transform
algorithms that use only FIFO memories were attractive in order to
reduce the control and address generation hardware requirements. Those
sorts of tradeoffs change substantially every time the part library
changes.

I can go on and on, but I hope you're starting to get the drift. For
people who actually have to build things, being constrained by the
availability and details of the building blocks makes one look much
further than just academic papers and software library comparisons.
All that being said, I don't think I've personally plugged in an FHT
anywhere serious in at least fifteen years. As I said in my very
first post in this thread, memory and processing is cheap enough these
days that saving development time by plugging in a canned FFT is
almost always the way to go. Evidently the difference between you
and me is that I recognize that there are still those odd corner cases
once in a while where the tradeoffs can go the other way.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Andrew Reilly
10-21-2008, 01:04 AM
On Mon, 20 Oct 2008 14:21:09 -0700, Eric Jacobsen wrote:

> I can go on and on, but I hope you're starting to get the drift. For
> people who actually have to build things, being constrained by the
> availability and details of the building blocks makes one look much
> further than just academic papers and software library comparisons. All
> that being said, I don't think I've personally plugged in an FHT
> anywhere serious in at least fifteen years. As I said in my very
> first post in this thread, memory and processing is cheap enough these
> days that saving development time by plugging in a canned FFT is almost
> always the way to go. Evidently the difference between you and me is
> that I recognize that there are still those odd corner cases once in a
> while where the tradeoffs can go the other way.

I'm afraid that you still haven't made the case, at least to me. In
fact, a couple of your comments, specifically about wasted space,
irregularity of access pattern (i.e., address generator complexity) and
the bit about no multiplications in the first two passes makes me think
that you haven't looked at an FFT algorithm in about as long as your FHT
experience. I'm not in Steven's league, but none of the FFT code that
I've written in the last several years has had much other than linear
memory addressing, and hasn't used multiplies in the first two passes.
The first FFT machine that I encountered was indeed a VLSI device (the
Austek A41102). It achieved its good performance for the day by
streaming data in and out of DRAM chips with sequences of column-access
cycles. That was pretty quickly overtaken by Motorola DSP chips, though.

I suspect that a paraphrase of Steven's point (and, indeed my own
interest) is not to say that there's anything *necessarily* worse about
the FHT than the FFT (apart from the point that I make below), but that
in any given instance, it would seem likely that a technique that makes
the FHT faster could *also* be used to make the FFT faster. That's why
he's keen to find particular examples and insights of such corner cases.

Why are we computing a transform in the first place? Do we want to
analyse the power spectrum, or do we want to perform convolution
filtering, or some other purpose? It's been a while since I looked, but
my memory of the FHT has it requiring a bunch of additional sum/
difference operations on X_k and X_{N-k}, in order to perform either of
those operations. That's not a particularly attractive addressing
sequence, and a bunch of additional memory operations, to my eye.
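
For reference, here is a minimal numpy sketch of that sum/difference
step as it shows up in circular convolution via the DHT (the helper
names are invented for the example, and the DHT is computed through the
FFT just for brevity):

import numpy as np

def dht(x):
    # H(k) = sum_n x(n) * cas(2*pi*n*k/N), via the full FFT for brevity
    F = np.fft.fft(x)
    return F.real - F.imag

def circular_convolution_via_dht(x1, x2):
    N = x1.size
    H1, H2 = dht(x1), dht(x2)
    H1r = H1[(-np.arange(N)) % N]            # H1(N-k), the reversed-index reads
    H2r = H2[(-np.arange(N)) % N]            # H2(N-k)
    # DHT convolution theorem: sum/difference combinations of k and N-k terms
    Hy = 0.5 * (H1 * (H2 + H2r) + H1r * (H2 - H2r))
    return dht(Hy) / N                       # the DHT is its own inverse up to 1/N

x1, x2 = np.random.randn(64), np.random.randn(64)
direct = np.fft.ifft(np.fft.fft(x1) * np.fft.fft(x2)).real
assert np.allclose(circular_convolution_via_dht(x1, x2), direct)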

Cheers,

--
Andrew

Steven G. Johnson
10-21-2008, 02:10 AM
On Oct 20, 5:21 pm, Eric Jacobsen <[email protected]> wrote:
> Why is it so important to you to make FHTs look inferior to FFTs?

Straw man.

I never said FHTs were inferior. I said that the two are nearly
identical in every measurable respect, so far as is currently known---
no one has demonstrated any significant intrinsic advantage one way or
the other in terms of performance or memory utilization.

You, on the other hand, claimed "substantial" benefits in
"computational resources" for the FHT in some circumstances for
unexplained reasons, including "substantial" memory savings.

> have the other as well. * And either sin() or cos() can be folded so
> that much less than a full wave needs to be stored, and you can do
> that with cas() as well. *

No you can't do it with cas as well. That was my point: if you look
at the trig table of sin and cos values, they have twice as much
redundancy as a table of cas values (which has less symmetry), and
because of that the number of trigonometric constants is essentially
the same for an FHT or FFT.

> Do you really think that there's never a difference?

I think that no one has demonstrated any substantial difference; the
only clear differences that have been demonstrated slightly favor FFTs
(but only slightly....a few flops saved is irrelevant in practice).

You were claiming "substantial" benefits in performance and memory, and
I was pointing out that, over the last 20 years, despite strenuous
efforts, there is zero evidence to support your claim in any known
circumstances, and plenty of evidence that the intrinsic differences
between the algorithms are negligible (and swamped by differences in
skill of implementors).

> FWIW, I've found several (recent) references where people are
> currently building generalized transform hardware based on FHT engines
> (kind of like Mayer did with his software). * Since a generalized
> compression engine may need to do FFTs, FCTs, and FSTs, there are
> evidently advantages to just building a single FHT to do all of those
> tasks. * Here's one reference:
>
> http://ieeexplore.ieee.org/Xplore/defdeny.jsp?url=/iel5/4061132/40611....

(You can do all of those functions using a real-data FFT, too.)

That paper doesn't actually try doing a real-data FFT and comparing it
to an FHT. In fact, the papers it cites as explanation for why they
chose to do an FHT are actually papers explaining just the opposite,
that real-data FFTs are just as efficient as FHTs and have similar
structure.

DHTs certainly have conceptual attractions. And the fact that the
same routine is its own inverse is sometimes convenient (the one
concrete reason cited by the paper authors for their choice).

If you had said, "I find the DHT conceptually nicer, and the fact that
it is its own inverse leads to some savings of code/gates in some
applications," I wouldn't have any objection. What bothered me is
your claiming "substantial" benefits in performance and that the
"memory requirement is substantially smaller", both of which claims
are contradicted by 20 years of work.

> For the constraints and optimizations in this case (and several are
> actually shown), the author found using an FHT as the basis
> advantageous.

None are shown; are we reading the same 2-page conference paper? The
timing numbers they report for FFT etc are worse than those for FHT,
but that's because they implemented the other transforms using an FHT
(hence with additional overhead). They didn't actually try comparing
to any intrinsic real-data FFT algorithm, so that paper really isn't
evidence one way or the other.

> 1) An all-hardware implementation, i.e., just hooking gates and things
> together to build a transformer. *

And what, precisely, do you see as the substantial "performance" and
"memory" advantages of an FHT in this circumstance?

> 2) A hybrid platform where various processing, memory and logic
> elements are combined. *

And what, precisely, do you see as the substantial "performance" and
"memory" advantages of an FHT in this circumstance?

> 3) A small embedded processor, perhaps with some limited, but
> inflexible, memory and limited computing ability.

And what, precisely, do you see as the substantial "performance" and
"memory" advantages of an FHT in this circumstance?

> 4) An embedded DSP.

And what, precisely, do you see as the substantial "performance" and
"memory" advantages of an FHT in this circumstance?

> 5) A generalized computing platform (e.g., a PC).

And what, precisely, do you see as the substantial "performance" and
"memory" advantages of an FHT in this circumstance?

> Can you begin to see that each of these cases is quite different and
> optimized solutions for each might look quite different from each
> other? *

Of course they are different, but you still have yet to (correctly)
explain a clear concrete advantage for FHTs in performance or memory
in *any* circumstance.

> If memory is tight and only chunks of
> certain sizes are available (and one is limited in how much area can
> be used for memory), then damn straight tootin' an FHT can have an
> advantage.

Why do you think that an FHT requires smaller chunks of data than a
real-data FFT?

> address generation can matter. *If pipelining is required for repeated
> transforms and multipliers are at a premium, then an FHT may have an
> advantage since the first two stages don't require any multiplication.

Clearly false. Real-data FFTs have been proven to require no more
multiplications than FHTs, and fewer additions (at least, for all
currently known algorithms).

> I can go on and on, but I hope you're starting to get the drift.

Nope, sorry, I'd like to see a concrete advantage explained in terms
of the differences between the transforms that you claim are so
significant. Not vague handwaving that "well, the two transforms
aren't EXACTLY identical, so there must be SOME circumstance where
there is a substantial advantage for one or the other."

> almost always the way to go. * Evidently the difference between you
> and me is that I recognize that there are still those odd corner cases
> once in a while where the tradeoffs can go the other way.

I merely think one should have clear evidence before blithely
contradicting 20 years of study in a field. An awareness of past work
is also good to have.

Regards,
Steven G. Johnson

Eric Jacobsen
10-21-2008, 02:48 AM
On 21 Oct 2008 00:04:49 GMT, Andrew Reilly
<[email protected]> wrote:

>On Mon, 20 Oct 2008 14:21:09 -0700, Eric Jacobsen wrote:
>
>> I can go on and on, but I hope you're starting to get the drift. For
>> people who actually have to build things, being constrained by the
>> availability and details of the building blocks makes one look much
>> further than just academic papers and software library comparisons. All
>> that being said, I don't think I've personally plugged in an FHT
>> anywhere serious in at least fifteen years. As I said in my very
>> first post in this thread, memory and processing is cheap enough these
>> days that saving development time by plugging in a canned FFT is almost
>> always the way to go. Evidently the difference between you and me is
>> that I recognize that there are still those odd corner cases once in a
>> while where the tradeoffs can go the other way.
>
>I'm afraid that you still haven't made the case, at least to me.

I'm not really trying to make a case, believe it or not. I don't
really care what people use, and as I mentioned *I* haven't seen a
reason to personally use an FHT in a long time. I do admit to not
feeling a need to keep up with whatever details people are changing in
either FHTs or FFTs to try to make that last gasp of difference,
because:

a) unless the assumptions apply to the cases I'll encounter (and who
knows what those'll be) it may not matter

b) because any difference is likely to be small it'll probably be
easier to make it up somewhere else rather than replace a block that's
already developed and debugged (i.e., a canned transform) with new
development.

I've not yet encountered a situation where b) would kick in, although
I can certainly imagine cases where it might were I to be called on to
solve that particular problem.

> In
>fact, a couple of your comments, specifically about wasted space,
>irregularity of access pattern (i.e., address generator complexity) and
>the bit about no multiplications in the first two passes makes me think
>that you haven't looked at an FFT algorithm in about as long as your FHT
>experience. I'm not in Steven's league, but none of the FFT code that
>I've written in the last several years has had much other than linear
>memory addressing, and hasn't used multiplies in the first two passes.
>The first FFT machine that I encountered was indeed a VLSI device (the
>Austek A41102.) It achieved it's good performance for the day by
>streaming data in and out of DRAM chips with sequences of column-access
>cycles. That was pretty quickly overtaken by Motorola DSP chips, though.
>
>I suspect that a paraphrase of Steven's point (and, indeed my own
>interest) is not to say that there's anything *necessarily* worse about
>the FHT than the FFT (apart from the point that I make below), but that
>in any given instance, it would seem likely that a technique that makes
>the FHT faster could *also* be used to make the FFT faster. That's why
>he's keen to find particular examples and insights of such corner cases.

One only needs to look at the differences between the two transforms
to see where advantages could lie. Steven even pointed out that with
real-input FFTs many have complications that are a PITA. The
reference I cited mentioned that this was a key avenue in getting the
simplifications that led to their use of the FHT. There wasn't
enough detail in that paper to know exactly how those differences were
exploited, and it's been too long since I've built anything at that
level (because it's made more sense to use the canned stuff!) that I
just can't recall the tricks.

I'm not inclined to dig out old stuff and sort out what the heck I did
well over a decade ago, but it was clear to me that the differences in
the details were real enough to play a lot of tradeoffs in both
twiddle and data memory management depending on how one played the
rest of the tradeoffs (e.g., algorithm selection, memory architecture,
pipelining, etc., etc.). I know my lack of detailed recollection
doesn't count for much, but I don't know why people think that the two
transforms are so identical that there are no differences to exploit
when conditions allow. That's just not the case.

>Why are we computing a transform in the first place? Do we want to
>analyse the power spectrum, or do we want to perform convolution
>filtering, or some other purpose? It's been a while since I looked, but
>my memory of the FHT has it requiring a bunch of additional sum/
>difference operations on X_k and X_{N-k}, in order to perform either of
>those operations. That's not a particularly attractive addressing
>sequence, and a bunch of additional memory operations, to my eye.
>
>Cheers,

Actually, that's a reasonable case in point. If the transform is
done in-place in a linear memory (say, a length Nx1 memory for an N-pt
transform), and one only wants the magnitude of the output vector, if
the memory is dual-port the magnitude-squared vector can be formed in
N cycles. The funky output symmetry that you mention, where the
equivalent FT real and imaginary parts are computed from the output
as:

re(FT(f)) = HT(f) + HT(N-f)

im(FT(f)) = HT(f) - HT(N-f)

actually provides a simple way to get mag-squared output (which is
close to what the OP was looking for, and is, actually what made me
think of this whole thing). If you want the mag-squared, it boils
down to:

re(FT(f))^2 + im(FT(f))^2 = 2 [ HT(f)^2 + HT(N-f)^2 ]

You can get the factor of two for free (in hardware, anyway), so
that's not a complication. With a dual-port memory you can
simultaneously read HT(f) and HT(N-f) and progress through the
computations with two counters (or a counter and an adder or
subtractor). That's fairly simple and certainly not more complex
than the FT case.
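
A minimal numpy check of that relation, written against the true DFT
magnitude (there the constant works out to 1/2 rather than 2, but
that's just the overall power-of-two scale noted above):

import numpy as np

def dht(x):
    # H(k) = sum_n x(n) * cas(2*pi*n*k/N), via the full FFT for brevity
    F = np.fft.fft(x)
    return F.real - F.imag

N = 64
x = np.random.randn(N)
H = dht(x)
Hrev = H[(-np.arange(N)) % N]                # H(N-f), index taken mod N
mag2_from_dht = 0.5 * (H**2 + Hrev**2)       # one pass over H(f), H(N-f) pairs
assert np.allclose(mag2_from_dht, np.abs(np.fft.fft(x))**2)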

Reading re(FT(f)) and im(FT(f)) is pretty simple, too, depending on
how the output data is arranged in the buffer. Splitting it into two
N/2 buffers is often workable and then only requires one counter for
the address generation, but one of the subtleties is evaluating how
the memories have to be architected. From a length-N real input to
either a length N/2 complex output, a length-N output with re() and
im() parts arranged somehow, or whatever. If one wants only one
buffer and the transform done in-place the tradeoffs may change from
having separate buffers for input and output and processing. A
multi-port memory makes some of it easier (different pieces for either
case), but the differences also start to become more apparent.

The correlation and convolution operations are nearly the same as for
the FT case, except that (just like with the magnitude) one address is
incrementing while the other decrements. The operations are
otherwise essentially the same. Whether a particular implementation
finds it "better" to use a single, length-N memory, a single length-N
memory with the address space split, or two length-N/2 memories for
the buffering could move the tradeoff one way or the other. The
differences are very small otherwise.

Where one actually finds an advantage either way will likely depend on
the details of how one wants to manage the memory, how much memory is
available and how it is arranged, and how much address generation and
control hardware one can afford. How many lines/bits are twiddling
during the operation affects power consumption, and there are likely
differences there as well (how could there not be, they're not
perfectly identical).

The article I cited seemed to indicate that that author found a lot of
simplification in those sorts of tradeoffs with the FHT. YMMV if you
don't have the same libraries and assumptions as he did, but I do not
find that author's conclusions incredible. In that article for the
stand-alone solutions investigated (i.e., comparing a particular FHT
to a particular FFT) while the FFT had smaller area its latency and
power consumption were much higher than the FHT. So, again,
depending on what's important to you the tradeoffs can go different
directions. There is no pat answer, and claiming that an FFT is
always better under all conditions and circumstance (if anyone is
actually claiming that) would seem to me to be a stretch, especially
if that point of view is based on analyses that didn't take metrics
into account that are important to an implementer at the time. Having
the sort of prescience and omniscience to cover all such cases just
seems incredibly implausible to me given that there are differences in
the two approaches and the sorts of things that can become important
in an implementation are often pretty peculiar.

Again, if nobody is claiming that an FFT is always better than an FHT
under all circumstances, then there's not really anything to argue
about. It's just an interesting discussion as far as I'm concerned.



Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Eric Jacobsen
10-21-2008, 08:32 AM
On Mon, 20 Oct 2008 18:10:49 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 20, 5:21 pm, Eric Jacobsen <[email protected]> wrote:
>> Why is it so important to you to make FHTs look inferior to FFTs?
>
>Straw man.

I thought it was a legitimate question, as was the one regarding
whether you think there's never any possibility of an advantage with
an FHT. If you don't think so then I think there's no disagreement.
If you do think so, that's fine, too, we just disagree.

>I never said FHTs were inferior. I said that the two are nearly
>identical in every measurable respect, so far as is currently known---
>no one has demonstrated any significant intrinsic advantage one way or
>the other in terms of performance or memory utilization.
>
>You, on the other hand, claimed "substantial" benefits in
>"computational resources" for the FHT in some circumstances for
>unexplained reasons, including "substantial" memory savings.
>
>> have the other as well. * And either sin() or cos() can be folded so
>> that much less than a full wave needs to be stored, and you can do
>> that with cas() as well. *
>
>No you can't do it with cas as well. That was my point: if you look
>at the trig table of sin and cos values, they have twice as much
>redundancy as a table of cas values (which has less symmetry), and
>because of that the number of trigonometric constants is essentially
>the same for an FHT or FFT.

cas(x) = sqrt(2)*cos(x-pi/4)

The cas() function has identical symmetry to cos() with an offset of
pi/4. For a LUT that has a power-of-two length that offset is fairly
trivial to work around as far as folding the LUT is concerned.
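
As a minimal numpy sketch of that offset point (a table length of
n = 1024 is assumed purely for the example), a cas() table is just the
cos() table read with a fixed n/8 index shift and a sqrt(2) scale:

import numpy as np

n = 1024                                     # power-of-two LUT length (assumed for the example)
k = np.arange(n)
cos_lut = np.cos(2 * np.pi * k / n)

# cas(x) = cos(x) + sin(x) = sqrt(2) * cos(x - pi/4); pi/4 is n/8 table steps
cas_from_cos = np.sqrt(2) * cos_lut[(k - n // 8) % n]
cas_direct = np.cos(2 * np.pi * k / n) + np.sin(2 * np.pi * k / n)
assert np.allclose(cas_from_cos, cas_direct)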

According to your argument (which I don't quite follow because it
doesn't work out), that suggests (I think) that less storage is
required for cas() compared to cos() and sin()? I don't agree,
despite the fact that cas() can be folded the same as sin() and cos().
I think the amount of storage is the same, but since FHT twiddles are
always real-valued the accesses to that memory can be managed
differently.

If one needs simultaneous access to cos() and sin() from a
non-dual-port LUT for one's complex FFT twiddle pipeline, then it
seems to me that an FHT would have a memory advantage in that only a
single LUT would be required to feed a similar but real-valued
butterfly. Clearly there are ways to work around that from almost any
perspective, but it's what I had in mind when I said I thought there
could be a twiddle memory advantage with an FHT in certain
circumstances.

>> Do you really think that there's never a difference?
>
>I think that no one has demonstrated any substantial difference; the
>only clear differences that have been demonstrated slightly favor FFTs
>(but only slightly....a few flops saved is irrelevant in practice).
>
>You were claiming "substantial" benefits in performance and memory,

What I said was, "In my experience the real advantage of the FHT was
the memory requirement, which can be substantially smaller than an FFT
depending on the constraints and what you're doing. Since memory is
seldom a constraint these days it's pretty moot."

This was just before I explained in my next post, "Some of my FHT
experience was on my Amiga in grad school when I was processing audio
sensor data (i.e., real-valued), and coded the whole thing in 68k
assembly with LUTs for the cas() function. I also had a hand-coded
68k-assembly FFT with LUT twiddles, and just the buffer handling
alone, with extra care paid to handling the input buffer and forming
the output magnitude vector provided a substantial advantage with the
FHT."

Note the operative use of past-tense, since the experience I was
describing was in about 1988, which was pretty contemporary with the
early "Look how great the FHT is!" papers, as well as Sorensen's RFFT
work. I was reading pretty much everything that came out on the
topic at the time, and, as I also stated previously, some of the
claims then of the nearly 2x superiority of the FHT were clearly
incorrect as far as I could tell (and, as you've repeatedly pointed
out, I was right about that). So you don't need to tell me about that
as I've known it from first-hand experience for twenty years.

This afternoon I dug out my June, 1987 Trans. on ASSP and discovered
that I'd left a fair number of hand-written notes around Sorensen's
RFFT FORTRAN code from when I translated (and debugged) it to C and
68k assembly for my thesis.

In my experience, at that time, in my application, on my Amiga, with
Sorensen's translated RFFT and my hand-coded FHT, I was able to get my
algorithms working for my thesis using less memory (and about the same
computation load) with the FHT than with the RFFT. I don't have
access to that code now without digging the Amiga out of storage and
cranking it up again and seeing if everything still works well enough
to read the files. I wouldn't hold my breath that I could recover
any of it, but I'm also not motivated enough to go do it in the first
place.

So, that was my basis for my claim of "substantial" improvement with
the FHT. I used the past-tense because, in fact, it was a hell of a
long time ago. It's very possible that improvements have been made on
the RFFT that could have changed that outcome, but I don't know for
certain so I didn't try to claim it was still true. Sundararajan's
1997 paper (which, also, oddly has a post-it bookmark in my Trans on
SP and a bunch of notes, so I must have read it when it came out)
might have helped, but that was nearly ten years later so I didn't
compare. By that time we were already in the realm where memory and
processing power was cheap enough that the small differences were no
longer enough to matter, so I was just plugging in canned IP and
routines at that point.

>and
>I was pointing out that, over the last 20 years, despite strenuous
>efforts, there is zero evidence to support your claim in any known
>circumstances,

It worked for me twenty years ago. That was evidence enough for me
to talk about it in the past tense.

> and plenty of evidence that the intrinsic differences
>between the algorithms are negligible (and swamped by differences in
>skill of implementors).

The implementations have been what I've been talking about all along.

>
>> FWIW, I've found several (recent) references where people are
>> currently building generalized transform hardware based on FHT engines
>> (kind of like Mayer did with his software). * Since a generalized
>> compression engine may need to do FFTs, FCTs, and FSTs, there are
>> evidently advantages to just building a single FHT to do all of those
>> tasks. * Here's one reference:
>>
>> http://ieeexplore.ieee.org/Xplore/defdeny.jsp?url=/iel5/4061132/40611...
>
>(You can do all of those functions using a real-data FFT, too.)
>
>That paper doesn't actually try doing a real-data FFT and comparing it
>to an FHT. In fact, the papers it cites as explanation for why they
>chose to do an FHT are actually papers explaining just the opposite,
>that real-data FFTs are just as efficient as FHTs and have similar
>structure.

Looking at it now it's not clear what he compared. In section two he
talks very clearly about how the structure of the RFFT is more
complicated than the FHT (for his purposes)* and so that influenced
his direction. Since the context of his comparisons are for
compression of real-valued sequences I took all of that to mean that
his comparisons were against RFFTs, but looking at it now it's not
clear. He's explicit about comparing against a 1024-pt FFT, but
doesn't say how long any of the other transforms were. Hmmm...I
agree, on second look there's not much concrete there, but I do still
offer it as recent reference of someone who, for their particular
problem, claims to have found an advantage with the FHT. There's
just not much info to go with it.

* "Structure" for developing a hardware architecture is often not
equal to signal flow "structure".

>DHTs certainly have conceptual attractions. And the fact that the
>same routine is its own inverse is sometimes convenient (the one
>concrete reason cited by the paper authors for their choice).

That's the most blatant and obvious case where an advantage could be
expected with an FHT. If you need to do a forward and inverse
transform in the same hardware, there's reason to believe some
substantial advantages could be realized using an FHT over an RFFT.
The OP's case wouldn't benefit from that, and the compression cases
I've been seeing where people are claiming that an FHT helps generally
don't have a compressor and decompressor co-located. So in those
applications there may not be such a distinct expectation for an
advantage. Given all that I hadn't mentioned it since it didn't seem
to fit the applications at hand, despite the cited conference paper's
author's note of it.

>If you had said, "I find the DHT conceptually nicer, and the fact that
>it is its own inverse leads to some savings of code/gates in some
>applications," I wouldn't have any objection.

Oh, so it's the details of how I framed the argument that mattered. I
see. ;)

>What bothered me is
>your claiming "substantial" benefits in performance and that the
>"memory requirement is substantially smaller", both of which claims
>are contradicted by 20 years of work.

Which I didn't have access to twenty years ago, unfortunately. So my
past-tense claims of what happened in the past when I couldn't have
taken advantage of what were then future developments will, sadly,
have to remain unaltered. This means you will have to remain
bothered, I'm afraid.

>> For the constraints and optimizations in this case (and several are
>> actually shown), the author found using an FHT as the basis
>> advantageous.
>
>None are shown; are we reading the same 2-page conference paper? The
>timing numbers they report for FFT etc are worse than those for FHT,
>but that's because they implemented the other transforms using an FHT
>(hence with additional overhead). They didn't actually try comparing
>to any intrinsic real-data FFT algorithm, so that paper really isn't
>evidence one way or the other.

I don't think that's the case. Look more closely at the results
table. If the FFT under comparison were based on the FHT under
comparison, it would be at least the same size as the FHT. It is
not. It is significantly smaller. This suggests that it was not an
FHT-based FFT under comparison. Whether or not it was an RFFT is not
at all clear.

There is also a combined FHT+FFT case, which *may* have been built
around an FHT core, but, again, it's not clear.

In any case, this particular author, for this particular solution,
using whatever his methodology might have been, found an advantage
with the FHT for metrics that mattered to him (compression ratio in
particular, which, again, isn't well defined). It can be seen quite
readily, though, that the latency and power consumption were
significantly less for the FHT shown than the FFT shown, FWTW.

I merely offered it as an existence proof of a recent published case
where somebody found an advantage of an FHT over (what I thought was a
real-input) FFT. It's not clear now whether it was a real-input FFT
or not so who knows.

>And what, precisely, do you see as the substantial "performance" and
>"memory" advantages of an FHT in this circumstance?

The advantage is having two transforms with some differences in
internal operation that can be traded off in the presence of
particular constraints in implementation details. How that tradeoff
comes out depends entirely on the fine details. If it were fifteen
years ago when I was a lot more on top of this stuff I could probably
cite some combinations of memory architecture, algorithm (DIT, DFT,
whatever), arithmetic availability or whatever that made a difference.
Today it's been too long ago and those brain cells are reluctant to
put down their well-deserved beer and give it a second look. I don't
blame them. Sue me.


>> Can you begin to see that each of these cases is quite different and
>> optimized solutions for each might look quite different from each
>> other? *
>
>Of course they are different, but you still have yet to (correctly)
>explain a clear concrete advantage for FHTs in performance or memory
>in *any* circumstance.

My Amiga twenty years ago doesn't count? Self-inverse doesn't count?
Phooey.

>> If memory is tight and only chunks of
>> certain sizes are available (and one is limited in how much area can
>> be used for memory), then damn straight tootin' an FHT can have an
>> advantage.
>
>Why do you think that an FHT requires smaller chunks of data than a
>real-data FFT?

Not necessarily smaller chunks, differently organized chunks. If you
want to re-use the same memory block for the input buffer, in-place
processing, and output buffer, for an FHT that's not that big of a
deal. For an RFFT (as I understand them), the addressing will
require more effort at some point or other (perhaps only at the
output) and perhaps some additional latency due to having to make
certain accesses wait. Whether or not the memory is multi-port may
affect that. If one has different buffers for input, processing, and
output, then they can be organized differently (e.g., Nx1 input,
(N/2)x2 output) to help alleviate those complications. Whether or not
memory is available at N/2 length, or whether or not multi-port memory
is available, would probably matter. If one only has a single
length-N memory, especially single-port, that may well favor an FHT
over an RFFT. I couldn't say for certain without going through the
exercise, and as much as I'm enjoying the dialogue I'm just not that
motivated to do any actual work about it.

>> address generation can matter. *If pipelining is required for repeated
>> transforms and multipliers are at a premium, then an FHT may have an
>> advantage since the first two stages don't require any multiplication.
>
>Clearly false. Real-data FFTs have been proven to require no more
>multiplications than FHTs, and fewer additions (at least, for all
>currently known algorithms).

It's not the quantity of multiplication operations I was getting at,
it's the availability for pipelining, i.e., how many discrete
multipliers would be needed in the engine to keep all stages of an
n-stage transformer working for repeated transforms. Since the FHT
does not require multipliers in the first two stages, none would be
needed to be instantiated for those pipelines. That could be an
advantage if only a certain number of multipliers were available (as
is the case with many FPGAs). Andrew has since indicated that he's
used FFTs that did not require multiplies in the first two stages (he
didn't clarify whether that included RFFTs, so I don't know), so there
may not be an advantage there, anyway.

>> I can go on and on, but I hope you're starting to get the drift.
>
>Nope, sorry, I'd like to see a concrete advantage explained in terms
>of the differences between the transforms that you claim are so
>significant. Not vague handwaving that "well, the two transforms
>aren't EXACTLY identical, so there must be SOME circumstance where
>there is a substantial advantage for one or the other."

I can only offer what I've mentioned consistent with my experience. I
don't really care whether that satisfies you or not, I merely offer it
for consideration as a view from someone not entirely ignorant of the
issues. Like all things on teh intertubes, all are free to take it as
they will (or not, whatever).

>> almost always the way to go. * Evidently the difference between you
>> and me is that I recognize that there are still those odd corner cases
>> once in a while where the tradeoffs can go the other way.
>
>I merely think one should have clear evidence before blithely
>contradicting 20 years of study in a field. An awareness of past work
>is also good to have.
>
>Regards,
>Steven G. Johnson

I merely think one should look beyond academic papers and software
library comparisons before concluding that all implementers everywhere
under all constraints will be bound to a certain outcome. I have
difficulty imagining why one would think that was how things work.

And I have a good awareness of the past work as much of it was
contemporary to when I was actively doing this exact stuff. I don't
have much awareness of more recent work (say, much past Sundararajan's
1997 paper, although I seem to recall having read a few tidbits here
and there since then). I've already said why many times.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

VelociChicken
10-21-2008, 01:46 PM
"VelociChicken" <[email protected]> wrote in message
news:[email protected]...
> >The same savings can be obtained for an FFT algorithm specialized for
>>real inputs.
>>
>>Yes, if you have an FFT subroutine that takes n complex inputs and
>>returns n complex outputs, then you would need to take your real
>>inputs and convert them into a complex array with zero imaginary
>>parts, requiring twice the storage. However, it's been observed since
>>(I think?) the late 1960's that this factor of 2 is not necessary.
>>
>>Instead, there are a number of FFT algorithms specialized for real
>>data that take a real array as input, and return half of a complex
>>array as output (requiring the same storage as the real array, and
>>possibly operating in-place). The reason you can do this is that the
>>discrete Fourier transform of real inputs is half redundant: the
>>second half is the complex conjugate of the first half, in reverse
>>order.
>>
>>(Probably the most well-known such technique, although not generally
>>the most efficient, is a trick to express a real-input DFT of length n
>>in terms of a complex-input FFT of length n/2 plus some post-
>>processing. An alternative technique is to start with a complex-data
>>FFT algorithm, and go through the algorithm and prune out the
>>redundant operations...again, you can save a factor of 2 in memory and
>>slightly more than a factor of 2 in computation compared to the
>>complex-input algorithm.)
>>
>>The existence of these real-input FFT algorithms is why all claims of
>>factor-of-two savings from FHT algorithms were (at best!) very
>>misleading, as was pointed out shortly after FHTs were introduced.
>>Inexplicably, this mythical factor of two has persisted for an
>>amazingly long time in the imaginations of engineers (and some
>>unfortunate authors).
>>
>>Regards,
>>Steven G. Johnson
>
> Thank-you for your verbose and interesting reply. Is it possible to tell
> me which would actually be the best (fastest, and free) real-FFT that I
> can test myself? I'm using Intel CPUs.
> Thanks again,
> Dave
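
(As a quick numerical aside on the half-redundancy point quoted above:
the following little numpy sketch -- purely illustrative, not any of
the libraries discussed here -- shows that for real input the second
half of the DFT is the reversed conjugate of the first half, which is
why a real-input FFT needs to keep only n/2+1 outputs.)

import numpy as np

n = 16
x = np.random.randn(n)              # strictly real input

X = np.fft.fft(x)                   # full complex DFT, n outputs
Xr = np.fft.rfft(x)                 # real-input FFT, n//2 + 1 outputs

# Conjugate symmetry: X[n-k] == conj(X[k]) for k = 1 .. n-1
print(np.allclose(X[1:][::-1], np.conj(X[1:])))     # True

# The rfft output is just the non-redundant first half
print(np.allclose(Xr, X[:n//2 + 1]))                # True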

These graphs appear to show FHT in good favour:

http://www.geocities.com/ResearchTriangle/8869/fft_summary.html

Does anybody agree? I'm guessing the trig table and accuracy tradeoffs have
a big part to play in the results.

Steven G. Johnson
10-21-2008, 07:30 PM
On Oct 21, 3:32 am, Eric Jacobsen <[email protected]> wrote:
> The cas() function has identical symmetry to cos() with an offset of
> pi/4. For a LUT that has a power-of-two length that offset is fairly
> trivial to work around as far as folding the LUT is concerned.

Except that between the cos and sin tables, for power-of-two sizes,
there is a lot of additional redundancy because cos(x) = sin(pi/2 -
x).

So, for example, for n=16, if you look at the cos and sin tables,
there are three distinct constants (not counting trivial constants
like 1 and 0, and sign changes, all of which can be optimized out).
For the cas table, there are four distinct nontrivial constants.

In general, you can show that the number of nontrivial constants is
the same, differing by at most one in favor of FFTs, just as I
claimed.

(For odd lengths, the pi/4 shift breaks the symmetry once you
discretize, and the cas table has no reflection symmetry at all,
unlike cos and sin, although in that case the cos and sin tables must
be distinct.)

>> required for cas() compared to cos() and sin()? I don't agree,
> despite the fact that cas() can be folded the same as sin() and cos().
> I think the amount of storage is the same, but since FHT twiddles are
> always real-valued the accesses to that memory can be managed
> differently.

I'm glad you agree that the storage is the same; you had earlier
claimed that the "memory saving is in the twiddle factors",
which on its face sounded like you were claiming lower storage.

Regarding the twiddles being "real valued" and hence yielding some
important difference in the access pattern, you are mistaken. If you
actually look at the structure of FHT algorithms, they still use pairs
of constants at a time. In fact, if you look at Bracewell's original
FHT algorithm and most subsequent FHT algorithms, they don't use the
cas function at all -- the twiddle factors are actually pairs of (cos,
sin) values, exactly as for an FFT.

> Note the operative use of past-tense, since the experience I was
> describing was in about 1988, which was pretty contemporary with the
> early "Look how great the FHT is!" papers, as well as Sorensen's RFFT

First, you have used present and future tense many times in this
thread when describing the FHT's supposed substantial advantages in
performance and memory requirements.

Second, the fact that you cite your own 20-year-old results in support
of your claim, from your admittedly incomplete understanding at the
time of the differences (or lack thereof) between FHTs and real-input
FFTs, does not make your claims any more correct. It just means that
you have a better excuse for being wrong.

> In my experience, at that time, in my application, on my Amiga, with
> Sorensen's translated RFFT and my hand-coded FHT, I was able to get my
> algorithms working for my thesis using less memory (and about the same
> computation load) with the FHT than with the RFFT.

Then you implemented the algorithms imperfectly. Both can operate in-
place, and both require the same size twiddle tables. Unless you are
talking about instruction memory, not data memory, which is negligible
here unless your dataset is very small, but if code size is the major
issue then there are other things you can do; see below.

> I don't think that's the case. Look more closely at the results
> table. If the FFT under comparison were based on the FHT under
> comparison, it would be at least the same size as the FHT. It is
> not. It is significantly smaller. This suggests that it was not an
> FHT-based FFT under comparison. Whether or not it was an RFFT is not
> at all clear.

The paper explicitly states, "FHT is taken as the basic VLSI
architecture. It is used as architecture for obtaining all other
transforms." Maybe they were quoting the additional gates to get an
FFT from an FHT?

In any case, I agree that the paper is very unclear -- it's not
something you can really draw any useful conclusions from regarding
FHT vs. RFFT.

> I merely offered it as an existence proof of a recent published case
> where somebody found an advantage of an FHT over (what I thought was a
> real-input) FFT. It's not clear now whether it was a real-input FFT
> or not so who knows.

Certainly, there are people still publishing the occasional paper on
FHT algorithms, and even making vague claims of performance benefits
(although nowadays they just cite conceptual preferences). I have no
disagreement on that. But for 20 years they haven't been able to back
it up with clear comparisons, and the (rather sketchy) paper you cite
is no different in this regard.

> >Of course they are different, but you still have yet to (correctly)
> >explain a clear concrete advantage for FHTs in performance or memory
> >in *any* circumstance.
>
> My Amiga twenty years ago doesn't count? Self-inverse doesn't count?

Self-inverse is not a performance advantage, nor is it a memory
advantage except for extremely small transforms -- i.e. it's only O(1)
savings at best. And in the latter case there are actually ways to
get an inverse RFFT from a forward RFFT. In practice, when one is
talking about memory advantages one is normally talking about the data
[O(n) memory]. You yourself kept talking about twiddle factors,
addressing, pipelining, data permutations, etcetera, which is what I
was disagreeing with -- for you to suddenly turn around and pretend
that you were only talking about code size from the self-inverse
property (which I was the one to point out, BTW) is disingenuous.

Your vague reference to unpublished results on your Amiga doesn't
count because there's no indication that your real-input FFT was
especially good -- it says more about you than about the algorithms.

It's trivial to find *some* real-input FFT code that is slower than
*some* FHT code, but that's not particularly meaningful. What is
meaningful is that people have been trying to implement FHTs for 20
years, and the best real-input FFTs have consistently been at least as
good. At a more fundamental level, no one has identified a difference
between the two algorithms that would seem to give a substantial
advantage one way or the other in performance---arithmetic, memory
access patterns, memory utilization, etcetera are all essentially
identical.

> Not necessarily smaller chunks, differently organized chunks. If you
> want to re-use the same memory block for the input buffer, in-place
> processing, and output buffer, for an FHT that's not that big of a
> deal. For an RFFT (as I understand them), the addressing will
> require more effort at some point or other (perhaps only at the
> output)

Your understanding of real-input FFT algorithms is incorrect.

To quote Sorensen in 1987, "The resulting programs are nearly the
same, due to the fundamental similarity between the Hartley and RFFT
algorithms, so the indexing and other overhead costs are essentially
the same".

In the classic bit-reversal approach to in-place transforms, the FHT
and RFFT algorithms require the same permutation on the input or
output. As Sorensen wrote in an earlier paper, from 1985, "The permutation of
the time indices will be exactly the same as for a Cooley-Tukey
radix-2 FFT, because the same decomposition sequence is applied to
both."

I should also mention that there have been a few developments in FFT
algorithms since 1965, and in particular it is perfectly possible to
perform an in-place transform without any separate bit-reversal stage
at all. Basically, you mix some transpositions in with the other
computation stages in order to do both permutations and computations
at the same time, which reduces the memory pressure compared to a
separate bit-reversal pass. And one doesn't generally want to make
log2(n) separate passes over the whole array either; you want to
reorder the computations to improve locality. But since FHTs and
RFFTs are essentially the same from the perspective of the
computational graph and memory access patterns, the same optimizations
apply to both (although in practice more work has been done on the
latter).
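
(To make the permutation point concrete, here is a minimal sketch --
plain numpy, purely illustrative, not taken from Sorensen's code or
anyone else's -- of the classic radix-2 bit-reversal reordering. Note
that it is its own inverse, so the identical permutation serves for
reordering either the input or the output, whether the butterflies
underneath are Hartley or real-FFT butterflies.)

import numpy as np

def bit_reverse_permutation(n):
    # index permutation for the classic radix-2 reordering
    # (n must be a power of two)
    bits = n.bit_length() - 1
    perm = np.zeros(n, dtype=int)
    for i in range(n):
        r = 0
        for b in range(bits):
            r = (r << 1) | ((i >> b) & 1)
        perm[i] = r
    return perm

perm = bit_reverse_permutation(16)
print(perm)
# The permutation is an involution: applying it twice is the identity,
# so the same reordering can be used on either the input or the output.
print(np.array_equal(perm[perm], np.arange(16)))    # True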

> is the case with many FPGAs). Andrew has since indicated that he's
> used FFTs that did not require multiplies in the first two stages (he
> didn't clarify whether that included RFFTs, so I don't know), so there
> may not be an advantage there, anyway.

The same is true for an RFFT: both a size-4 FFT and a size-4 RFFT
require no multiplications, for the same reason (the twiddle factors
are all +/-1 or +/-i).

(The first "two stages" of a radix-2 algorithm just compute a bunch of
size-4 transforms. In practice, the 1965 approach of dividing things
into a sequence of radix-2 passes over the whole array is actually a
poor idea if you have any kind of memory cache, or a significant
number of registers, so you are arguing about algorithms that in many
cases are obsolete. But your arguments are still wrong.)
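
(To spell out the no-multiply point with a sketch -- illustrative
numpy, not anyone's production code: every twiddle factor of a
length-4 DFT is +/-1 or +/-i, so the transform reduces to additions
and subtractions. A length-4 RFFT would simply keep only X0, X1, X2
below.)

import numpy as np

x0, x1, x2, x3 = 1.0, 2.0, -0.5, 3.0    # arbitrary real samples

# Length-4 DFT written out: only adds and subtracts, no real multiplies
X0 = (x0 + x2) + (x1 + x3)
X2 = (x0 + x2) - (x1 + x3)
X1 = (x0 - x2) - 1j * (x1 - x3)
X3 = (x0 - x2) + 1j * (x1 - x3)

print(np.allclose([X0, X1, X2, X3], np.fft.fft([x0, x1, x2, x3])))   # True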

> I merely think one should look beyond academic papers and software
> library comparisons before concluding that all implementers everywhere
> under all constraints will be bound to a certain outcome. I have
> difficulty imagining why one would think that was how things work.

The burden of proof is on the person claiming a substantial
performance/memory advantage one way or another, not on the person
pointing out that in 20 years no one has demonstrated such an
advantage. And whenever you have been pinned down to specifics on
naming a difference that could lead to a substantial performance
difference, you have been wrong.

Regards,
Steven G. Johnson

Steven G. Johnson
10-21-2008, 08:05 PM
On Oct 21, 8:46 am, "VelociChicken" <[email protected]> wrote:
> These graphs appear to show FHT in good favour:
>
> http://www.geocities.com/ResearchTriangle/8869/fft_summary.html

The lowest time on Mayer's graph is for FFTW, which does not use an
FHT. (He doesn't say what version of FFTW he's benchmarking there,
but I'm guessing it's FFTW 2.x from the late 1990s. See also graphs
from more recent machines and a wider variety of software on
fftw.org/speed.)

Mayer's code is faster than the other code he benchmarked, but
arguably that is not because he uses an FHT; it is because he uses a
larger radix than all of the programs except FFTW; the other old
radix-2 and split-radix routines are no longer competitive on modern
general-purpose CPUs because of poor cache and register/pipeline
utilization.

As I said before, Mayer's code is very good, and will probably beat a
lot of the real-input FFT code floating around on the web. But
there's no evidence that this comes from any intrinsic property of
FHTs rather than from Mayer's skill as a programmer; certainly,
Mayer's code can be beaten by the fastest real-input FFT
implementations. But if you want a nice, compact, clear program with
reasonable performance, I wouldn't hesitate to recommend Mayer's
routines.

Steven

Eric Jacobsen
10-21-2008, 09:46 PM
On Tue, 21 Oct 2008 11:30:46 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 21, 3:32 am, Eric Jacobsen <[email protected]> wrote:
>> The cas() function has identical symmetry to cos() with an offset of
>> pi/4. For a LUT that has a power-of-two length that offset is fairly
>> trivial to work around as far as folding the LUT is concerned.
>
>Except that between the cos and sin tables, for power-of-two sizes,
>there is a lot of additional redundancy because cos(x) = sin(pi/2 -
>x).
>
>So, for example, for n=16, if you look at the cos and sin tables,
>there are three distinct constants (not counting trivial constants
>like 1 and 0, and sign changes, all of which can be optimized out).
>For the cas table, there are four distinct nontrivial constants.
>
>In general, you can show that the number of nontrivial constants is
>the same, differing by at most one in favor of FFTs, just as I
>claimed.
>
>(For odd lengths, the pi/4 shift breaks the symmetry once you
>discretize, and the cas table has no reflection symmetry at all,
>unlike cos and sin, although in that case the cos and sin tables must
>be distinct.)

My point was just that there are simple equivalences between cas() and
sin() and cos() with trivial phase offsets and a scale constant. So
whatever you do for one can be done with the others. Any differences
come down to accesses and whether one needs two LUTs or can get by
with one, which might depend on implementation constraints and perhaps
favor one over the other. That's all I was getting at and I think
it's still true.

You disputed my claim that cas() could be folded and were incorrect to
do so. It can be folded the same as sin() or cos(); one can even use
a reduced sin() or cos() LUT to compute cas() with a trivial phase
offset and an output scale factor. One can even move the scale factor
out further in the process if desired.
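
(For what it's worth, the folding identity is easy to verify
numerically. Here's a quick check -- numpy, purely illustrative -- of
cas(x) = cos(x) + sin(x) = sqrt(2)*cos(x - pi/4) = sqrt(2)*sin(x + pi/4),
which is what lets a single folded sin or cos LUT plus a fixed scale
factor serve for cas() as well.)

import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 9)    # a few sample angles

cas = np.cos(x) + np.sin(x)

# cas() is just a shifted, scaled cosine (or sine)
print(np.allclose(cas, np.sqrt(2.0) * np.cos(x - np.pi / 4)))   # True
print(np.allclose(cas, np.sqrt(2.0) * np.sin(x + np.pi / 4)))   # True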


>> required for cas() compared to cos() and sin()? I don't agree,
>> despite the fact that cas() can be folded the same as sin() and cos().
>> I think the amount of storage is the same, but since FHT twiddles are
>> always real-valued the accesses to that memory can be managed
>> differently.
>
>I'm glad you agree that the storage is the same; you had earlier
>claimed that the "memory saving is in the twiddle factors",
>which on its face sounded like you were claiming lower storage.
>
>Regarding the twiddles being "real valued" and hence yielding some
>important difference in the access pattern, you are mistaken. If you
>actually look at the structure of FHT algorithms, they still use pairs
>of constants at a time. In fact, if you look at Bracewell's original
>FHT algorithm and most subsequent FHT algorithms, they don't use the
>cas function at all -- the twiddle factors are actually pairs of (cos,
>sin) values, exactly as for an FFT.

I've done it both ways.

>> Note the operative use of past-tense, since the experience I was
>> describing was in about 1988, which was pretty contemporary with the
>> early "Look how great the FHT is!" papers, as well as Sorensen's RFFT
>
>First, you have used present and future tense many times in this
>thread when describing the FHT's supposed substantial advantages in
>performance and memory requirements.
>
>Second, the fact that you cite your own 20-year-old results in support
>of your claim, from your admittedly incomplete understanding at the
>time of the differences (or lack thereof) between FHTs and real-input
>FFTs, does not make your claims any more correct. It just means that
>you have a better excuse for being wrong.

No, it means I've addressed your continually pointing to a mistaken
interpretation of what I wrote as ammunition for your position. I
even quoted what I originally wrote so that I could clarify what I
meant, and you still insist my position is something it is not. There
is no logical defense against that, so there's not much I can do there.

I've cited some personal experience with memory savings in a system
(not necessarily the transform alone), and have been happy to clarify
that context. I have not claimed and do not believe that there are
any big differences between an FHT and an RFFT in either fundamental
computation load or memory requirement. I've known this for a couple
of decades. You seem to be in violent agreement on that point.

I do, however, from a few decades' experience, know that little
differences can become "substantial" in making something work or not
under certain sets of constraints. It's too bad that you've chosen
to hang so much meaning on that word, but so be it. You seem to have
used that to mistakenly interpret what I wrote as a factor of two in
one case, and it took a fair amount of effort to get back to the point
where you don't really disagree with me there at all. I think. It's
hard to tell for certain.

>> In my experience, at that time, in my application, on my Amiga, with
>> Sorensen's translated RFFT and my hand-coded FHT, I was able to get my
>> algorithms working for my thesis using less memory (and about the same
>> computation load) with the FHT than with the RFFT.
>
>Then you implemented the algorithms imperfectly.

Personally, I think all implementations are imperfect and have
compromises, but that's just me.

>Both can operate in-
>place, and both require the same size twiddle tables. Unless you are
>talking about instruction memory, not data memory, which is negligible
>here unless your dataset is very small, but if code size is the major
>issue then there are other things you can do; see below.

I'm talking about the amount of memory it took to get the particular
job done that I was doing, which wasn't just comparing transforms. I
had real-valued input, did a lot of cross-correlation in the frequency
domain, magnitude detection, and some inverse transforms for parameter
extraction. Doing all that, including the FD cross-correlations and
magnitude detection, required "substantially" less memory with the FHT
than the RFFT solution I had at hand (which was Sorensen's) needed to
do the same process. I don't recall the details and, as I mentioned,
I'm not inclined to revive a 20+-year-old computer to get the files. I
do recall that the savings came from careful management of the output
buffers and the cross-correlation operations and then feeding the
inverse transform -- something that exploited the data arrangement at
the output of the FHT.

Doing the same function with the RFFT that I had available to me at
the time required an extra chunk of memory (IIRC something like N/2; I
don't recall exactly, it was a long time ago) somewhere in the process
that I didn't need with the FHT. That turned out to be the only
advantage, and when it was all done I thought it wasn't even worth it.
It's stuck in my mind because, while the difference was small (but
"substantial" by my judgement), it was a clear advantage that I got
with the FHT that I couldn't get with that particular RFFT for that
particular application and the particular process I was using.
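
(For anyone curious what that kind of frequency-domain work looks like
directly on Hartley outputs, here is a small numpy sketch of the
textbook Hartley-domain identities -- just the standard formulas, not
a reconstruction of the thesis code. Circular convolution is formed
from the two DHT outputs and their index-reversed copies, the power
spectrum falls out of the same pair, and because the DHT is
self-inverse the identical routine brings the result back to the time
domain.)

import numpy as np

def dht(x):
    # DHT via the FFT: H[k] = Re(F[k]) - Im(F[k]) for real input x
    F = np.fft.fft(x)
    return F.real - F.imag

def flip(X):
    # X[(N - k) mod N], i.e. the index-reversed copy
    return np.roll(X[::-1], 1)

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
y = rng.standard_normal(N)

X, Y = dht(x), dht(y)

# Hartley-domain circular convolution (Bracewell's convolution theorem);
# using flip(X) in place of X here gives cross-correlation instead.
Z = 0.5 * ((X + flip(X)) * Y + (X - flip(X)) * flip(Y))
z_hartley = dht(Z) / N              # the DHT is self-inverse up to 1/N

# Reference result: circular convolution via the complex FFT
z_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real
print(np.allclose(z_hartley, z_fft))                          # True

# Magnitude spectrum from DHT outputs: |F[k]|^2 = (H[k]^2 + H[N-k]^2)/2
print(np.allclose(np.abs(np.fft.fft(x))**2,
                  0.5 * (X**2 + flip(X)**2)))                 # True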

>> I don't think that's the case. Look more closely at the results
>> table. If the FFT under comparison were based on the FHT under
>> comparison, it would be at least the same size as the FHT. It is
>> not. It is significantly smaller. This suggests that it was not an
>> FHT-based FFT under comparison. Whether or not it was an RFFT is not
>> at all clear.
>
>The paper explicitly states, "FHT is taken as the basic VLSI
>architecture. It is used as architecture for obtaining all other
>transforms." Maybe they were quoting the additional gates to get an
>FFT from an FHT?
>
>In any case, I agree that the paper is very unclear -- it's not
>something you can really draw any useful conclusions from regarding
>FHT vs. RFFT.
>
>> I merely offered it as an existence proof of a recent published case
>> where somebody found an advantage of an FHT over (what I thought was a
>> real-input) FFT. It's not clear now whether it was a real-input FFT
>> or not so who knows.
>
>Certainly, there are people still publishing the occasional paper on
>FHT algorithms, and even making vague claims of performance benefits
>(although nowadays they just cite conceptual preferences). I have no
>disagreement on that. But for 20 years they haven't been able to back
>it up with clear comparisons, and the (rather sketchy) paper you cite
>is no different in this regard.

Do a Google search on "VLSI Hartley FHT". There are actually a fair
number of papers doing something quite similar to what this guy did,
i.e., applying the FHT as a transform engine for compressors that need
to do multiple types of transforms. I don't know whether people
still do that task that way, but I have heard rumblings in that sense
here and there. The compressors in question do operate on real-valued
inputs. I offer the aggregate of those papers as existence proof that
there are people out there who find, for their particular applications
under their particular set of constraints, that an FHT provides an
advantage. That's all I've been claiming: that those cases can
exist.

>> >Of course they are different, but you still have yet to (correctly)
>> >explain a clear concrete advantage for FHTs in performance or memory
>> >in *any* circumstance.
>>
>> My Amiga twenty years ago doesn't count? Self-inverse doesn't count?
>
>Self-inverse is not a performance advantage, nor is it a memory
>advantage except for extremely small transforms -- i.e. it's only O(1)
>savings at best. And in the latter case there are actually ways to
>get an inverse RFFT from a forward RFFT. In practice, when one is
>talking about memory advantages one is normally talking about the data
>[O(n) memory]. You yourself kept talking about twiddle factors,
>addressing, pipelining, data permutations, etcetera, which is what I
>was disagreeing with -- for you to suddenly turn around and pretend
>that you were only talking about code size from the self-inverse
>property (which I was the one to point out, BTW) is disingenuous.

I am, and have been, talking about implementation advantages. An
advantage in implementation may not involve changing the overall
memory size requirement; it may not involve (and likely won't)
changing the number of multiplies or additions performed. It may,
however, involve reducing the number of multipliers that need to be
instantiated, or the precision carried at a certain point, or the type
of memory used (which, even if the size is the same, can matter), or
the cycle count to get the job done, or the power consumed in doing
it.

As I pointed out, while the conference paper's FFT was substantially
smaller in die real-estate than the FHT, it had longer latency and
consumed substantially more power than the FHT implementation. I
think that the FFT could not have been based on an FHT or it would
have occupied more die real-estate than the FHT it was compared
against. That shows that there was an implementation advantage to
the FFT in this particular comparison for die real-estate, but an
advantage to the FHT for latency and power consumption. Depending
on which of those factors were important to a user, one would choose
one or the other accordingly, but if latency and/or power consumption
were the deciding factors, then clearly the FHT would have an
advantage in that case.

It could be, and I wouldn't be surprised at all, that they both had the
same amount of memory and did the same number of operations, but the
implementation differences in control, layout, or wherever the
differences were, got the results shown. That's a very typical thing
to happen in these sorts of tradeoffs, so "implementation advantage"
can take many forms with real value, tied to metrics
OTHER THAN total memory or operation count or what the
trellis/butterfly diagram looks like. That's what I've been trying
to get at even from my earliest posts in this thread.

But I do not exclude the possibility that even memory or computational
advantages might be gained in the basic transform one way or other
depending on the circumstances, and the fact that the outputs of the
two transforms are organized differently suggests that that could be
exploited. I've thrown out some possible thoughts there, and I've had
some personal (albeit long ago) experiences, but I've not claimed
anything concrete and I've not offered examples because I don't have
any. A couple decades of experience with such things tells me that
it would be foolish to conclude that it cannot be done, and my
knowledge of the facts suggests that it likely could be under the
right circumstances. That's all I've offered.

You seem to have misinterpreted a lot of what I've said, and since
that can easily be from me not being clear, I've tried to clarify
where I thought those misinterpretations could be. If you want to
ignore the clarifications and insist I meant something other than I
did there's not much I can do about that. It seems to me you've just
latched onto interpreting certain bits of what I've said in the least
favorable light to whatever it is that's important to you about this.
I can continue to try to clarify if it's helpful.

>Your vague reference to unpublished results on your Amiga doesn't
>count because there's no indication that your real-input FFT was
>especially good -- it says more about you than about the algorithms.

I told you where the real-input FFT came from. I'm a bit surprised
that you seem to be using it for a character attack.

If you think comp.dsp should restrict itself to only discussion of the
results of published papers and exclude unpublished personal
experience I'd disagree strongly with that notion. Meanwhile I'll
continue to post my thoughts based on my experiences and people can
take them for what they're worth...that's how the internet works
pretty much.


>It's trivial to find *some* real-input FFT code that is slower than
>*some* FHT code, but that's not particularly meaningful. What is
>meaningful is that people have been trying to implement FHTs for 20
>years, and the best real-input FFTs have consistently been at least as
>good. At a more fundamental level, no one has identified a difference
>between the two algorithms that would seem to give a substantial
>advantage one way or the other in performance---arithmetic, memory
>access patterns, memory utilization, etcetera are all essentially
>identical.

The VLSI-related papers, including the one I already cited, are
existence proofs to the contrary (with the mentioned caveats). As
I've tried to point out for a while, unless the comparisons on which
you're relying included all the possible implementation constraints
that typically present themselves (and I know they haven't), then they
have to be interpreted in the scope for which they apply. As I said
early on in the thread, operation counts don't tell the whole story.
Nor do memory sizes. Even some of the oft-cited comparison papers
show tradeoffs based on things like what butterfly architecture is
used. Wherever there's a difference there's a possible exploitation
in implementation.

And I have to say you wouldn't necessarily know whether anybody has
really gotten a big advantage with an FHT because LOTS of people who
work on these things don't publish (for a lot of good reasons). And
sometimes even when somebody does want to publish on a topic that got
hashed out ten years ago, it won't make it onto the editor's radar
because it's not the topic du jour. The VLSI papers (including the
one cited) seem to be getting published in places other than where you
might be looking.

So I put only so much weight on the literature's conclusions, because
it should be interpreted within the scope in which it exists, and I don't
think that scope includes all possible constraints. The number of
permutations of possible success metrics and constraints in
implementations is so huge that it couldn't possibly be covered in the
literature. Nobody is going to publish a paper because they saved a
few multipliers or a small amount of memory or a bit of power
consumption in a specific implementation with a specific library
because no editor would find it interesting. And that's appropriate
because the scope is so small. I understand this, and that's why I
find it interesting that there are these smatterings of hints that
guys doing compression in VLSI using FHTs might be onto something for
that particular application. I
think it would be foolish to say that since academic literature hasn't
identified a general top-level algorithmic advantage one way or other
that no one has or can find one in implementation.

It's a bit different when you're building things in industry than it
is in academia.

>> Not necessarily smaller chunks, differently organized chunks. If you
>> want to re-use the same memory block for the input buffer, in-place
>> processing, and output buffer, for an FHT that's not that big of a
>> deal. For an RFFT (as I understand them), the addressing will
>> require more effort at some point or other (perhaps only at the
>> output)
>
>Your understanding of real-input FFT algorithms is incorrect.
>
>To quote Sorensen in 1987, "The resulting programs are nearly the
>same, due to the fundamental similarity between the Hartley and RFFT
>algorithms, so the indexing and other overhead costs are essentially
>the same".
>
>In the classic bit-reversal approach to in-place transforms, the FHT
>and RFFT algorithms require the same permutation on the input or
>output. As Sorensen wrote in another 1985 paper, "The permutation of
>the time indices will be exactly the same as for a Cooley-Tukey
>radix-2 FFT, because the same decomposition sequence is applied to
>both."
>
>I should also mention that there have been a few developments in FFT
>algorithms since 1965, and in particular it is perfectly possible to
>perform an in-place transform without any separate bit-reversal stage
>at all. Basically, you mix some transpositions in with the other
>computation stages in order to do both permutations and computations
>at the same time, which reduces the memory pressure compared to a
>separate bit-reversal pass. And one doesn't generally want to make
>log2(n) separate passes over the whole array either; you want to
>reorder the computations to improve locality. But since FHTs and
>RFFTs are essentially the same from the perspective of the
>computational graph and memory access patterns, the same optimizations
>apply to both (although in practice more work has been done on the
>latter).

FWIW, I wasn't talking about bit reversal. I'm aware of what you seem
to be trying to point out.

>> I merely think one should look beyond academic papers and software
>> library comparisons before concluding that all implementers everywhere
> under all constraints will be bound to a certain outcome. I have
>> difficulty imagining why one would think that was how things work.
>
>The burden of proof is on the person claiming a substantial
>performance/memory advantage one way or another, not on the person
>pointing out that in 20 years no one has demonstrated such an
>advantage. And whenever you have been pinned down to specifics on
>naming a difference that could lead to a substantial performance
>difference, you have been wrong.
>
>Regards,
>Steven G. Johnson

Again, I'm not making any big claims; I don't have a dog in this hunt,
and I don't care what people use or don't use. I'm just sharing my
perspective and experience, which evidently differ from yours and,
more notably, come from a pretty different vantage point.

You, however, do seem to be claiming something or other; I'm not quite
sure what it is other than that I'm wrong. I don't agree that because
the academic literature has come to some conclusion it must be true
in all cases, because papers generally, almost universally, can't even
begin to cover detailed implementation issues. Papers have to be
taken within their context, and I've completely agreed with the
general conclusion from the beginning (for the last twenty years!)
that the differences in the basic transforms are small and subtle, and
I've repeatedly said so myself in this very thread. So I'm not
arguing with your general conclusion or that the papers are wrong, but
those small subtle differences can become significant, even huge,
under some implementation constraints. That won't be a surprise to
anyone who's worked under such constraints and there's some evidence
that others working in VLSI have actually seen advantages with the
FHT.

The big issue seems to be that I don't have a specific set of
constraints to cite where a measurable "advantage" can be
demonstrated. That's correct, I don't. I'm not inclined to go cook
one up, either, because it's a lot of work. Usually the constraints
come to you and you find what you find to solve the problem at hand
with the tools and building blocks you have. That's when you find the
real differences for that situation, and, again, I've already said
that in my experience just plugging FFT cores in has been the way to
go for ages. Until I hit a set of constraints that requires me to do
differently I am unlikely to do so.



Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php

Steven G. Johnson
10-22-2008, 12:11 AM
On Oct 21, 4:46 pm, Eric Jacobsen <[email protected]> wrote:
> The big issue seems to be that I don't have a specific set of
> constraints to cite where a measurable "advantage" can be
> demonstrated. That's correct, I don't.

Yup, that is the big issue - you have no evidence for your claims, or
even a concrete description of where a performance/memory advantage
*might* come from. Since you concede that, and just want to say that
the transforms aren't *exactly* the same, so there might somehow be
some performance advantage somewhere (perhaps on hardware powered by
unicorns), then I'm happy to conclude the discussion.

Steven

Eric Jacobsen
10-22-2008, 12:54 AM
On Tue, 21 Oct 2008 16:11:32 -0700 (PDT), "Steven G. Johnson"
<[email protected]> wrote:

>On Oct 21, 4:46 pm, Eric Jacobsen <[email protected]> wrote:
>> The big issue seems to be that I don't have a specific set of
>> constraints to cite where a measurable "advantage" can be
>> demonstrated. That's correct, I don't.
>
>Yup, that is the big issue - you have no evidence for your claims, or
>even a concrete description of where a performance/memory advantage
>*might* come from. Since you concede that, and just want to say that
>the transforms aren't *exactly* the same, so there might somehow be
>some performance advantage somewhere (perhaps on hardware powered by
>unicorns), then I'm happy to conclude the discussion.
>
>Steven

Did you think I was ever saying something different than that? Well,
not the unicorn part.

If you didn't disagree with that from the beginning you could have
easily cleared it up the several times I asked you to clarify your
position.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Blog: http://www.dsprelated.com/blogs-1/hf/Eric_Jacobsen.php