I am doing a project in matlab to compare the computational complexities o
two systems the problem I have is that I need to work out how many multipl
and adds a 150 point ifft uses in matlab from what I can work out MATLA
uses the Prime Factor algorithm to do this but I am having trouble workin
out how many MACS are done. Does matlab have the ability to feed back thi
information?
>I am doing a project in matlab to compare the computational complexitie
of
>two systems the problem I have is that I need to work out how man
multiply
>and adds a 150 point ifft uses in matlab from what I can work out MATLAB
>uses the Prime Factor algorithm to do this but I am having troubl
working
>out how many MACS are done. Does matlab have the ability to feed bac
this
>information?
>
>Thanks
>
>Pikey
>
Well, MATLAB used to allow you to calculate the number of flops (floatin
point operations), but as the help on flops indicates, since the inclusio
of LAPACK it is no longer practical. You can time the two functions usin
tic (before the function call) and toc (after the function call) an
compare times, which should give you an approximation of relativ
complexity assuming the arrays have the same memory structure.
>>I am doing a project in matlab to compare the computational complexities
>of
>>two systems the problem I have is that I need to work out how many
>multiply
>>and adds a 150 point ifft uses in matlab from what I can work ou
MATLAB
>>uses the Prime Factor algorithm to do this but I am having trouble
>working
>>out how many MACS are done. Does matlab have the ability to feed back
>this
>>information?
>>
>>Thanks
>>
>>Pikey
>>
>
>Well, MATLAB used to allow you to calculate the number of flop
(floating
>point operations), but as the help on flops indicates, since th
inclusion
>of LAPACK it is no longer practical. You can time the two function
using
>tic (before the function call) and toc (after the function call) and
>compare times, which should give you an approximation of relative
>complexity assuming the arrays have the same memory structure.
>
>Mark
Agree the tic toc is useful but I am doing one method by hand and th
other practically on matlab so I can't use it.
>Agree the tic toc is useful but I am doing one method by hand and the
>other practically on matlab so I can't use it.
Well, you can time your functions then use an approximation to calculat
the equivalent FLOP count. FFTW uses MFLOPS = 5*N*log2(N)/time (in us) fo
complex data, divide by 2 for real data. You can force MATLAB to use FFTW
though I do not know if there are any timing differences between FFT
within MATLAB, or whether the command-line executable for FFTW is faster o
slower.
>>Agree the tic toc is useful but I am doing one method by hand and the
>>other practically on matlab so I can't use it.
>
>Well, you can time your functions then use an approximation to calculate
>the equivalent FLOP count. FFTW uses MFLOPS = 5*N*log2(N)/time (in us
for
>complex data, divide by 2 for real data. You can force MATLAB to us
FFTW,
>though I do not know if there are any timing differences between FFTs
>within MATLAB, or whether the command-line executable for FFTW is faste
or
>slower.
>
>Mark
>
thats ok for an approximation I think I might ignore the matlab part an
strive to find out what the least number of MACS for a 150 point fft usin
complex data which will solve my problem(s) If any one know the answer
would appreciate it cheers
Pikey
>thats ok for an approximation I think I might ignore the matlab part and
>strive to find out what the least number of MACS for a 150 point ff
using
>complex data which will solve my problem(s) If any one know the answer I
>would appreciate it cheers
>Pikey
5*N*log2(N) is the asymptotic number of FLOPS for a radix-2 Cooley-Tuke
algorithm (approaches this value for large N). Different algorithms wil
perform (speed) differently, sometimes using more FLOPS but at faste
speeds. This is usually due to memory architecture and the number o
reads/writes that your algorithm requires.
There are many ways to do a 150 point FFT, so there's no single answer.
= 150 decomposes into 5*5*3*2, and you can configure your FFT in just abou
any combination of these primes, each probably has a different number o
FLOPS.
On Apr 29, 1:53 pm, "pikey" <pi...@amor888.fsnet.co.uk> wrote:
> I am doing a project in matlab to compare the computational complexities of
> two systems the problem I have is that I need to work out how many multiply
> and adds a 150 point ifft uses in matlab from what I can work out MATLAB
> uses the Prime Factor algorithm to do this but I am having trouble working
> out how many MACS are done. Does matlab have the ability to feed back this
> information?
You don't need to try to reverse-engineer what algorithm Matlab's FFT
is using; you can just download the source code from fftw.org (or just
read the papers on the web site, which describe the algorithms used).
Actually, "doc fft" within Matlab has some information on the
algorithms too.
Short answer: FFTW uses the prime-factor algorithm for small sizes (<=
16) within its codelets at the leaves of its recursion. Larger
composite sizes are broken down into smaller sizes using mixed-radix
Cooley-Tukey. It is O(N log N) for all N.
If you call FFTW from C, there is a function fftw_flops to get an
count of arithmetic operations if you really need that for some
reason. (Some other posters have suggested timing the FFT, but that's
pretty useless except for very crude estimates of flop counts.)
Running it on my machine, FFTW 3.1.2 reports that it requires 3100
additions and 1480 multiplications for an N=150 complex-data FFT.
If you really care about N=150, you can generate a hard-coded routine
using FFTW's code generator. For complex data it generates a routine
with 3012 additions and 1304 multiplications. (This is not optimal,
although it should be reasonably close. If I remember correctly, it
turned out that the algorithm with the lowest-known arithmetic count
for size 5 actually performed worse in practice than an algorithm with
more arithmetic, as the latter pipelined better last we checked.)
On May 2, 1:27 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> If you really care about N=150, you can generate a hard-coded routine
> using FFTW's code generator. For complex data it generates a routine
> with 3012 additions and 1304 multiplications. (This is not optimal,
> although it should be reasonably close. If I remember correctly, it
> turned out that the algorithm with the lowest-known arithmetic count
> for size 5 actually performed worse in practice than an algorithm with
> more arithmetic, as the latter pipelined better last we checked.)
Actually, I take that back; the difference was for sizes 7 and 11.
For sizes 3 and 5 our kernels match the lowest known flop count, as
far as I can tell (although there is no proof that any of these
algorithms are optimal).
>(although there is no proof that any of these
>algorithms are optimal).
Yeah, as you note, even "optimal" in terms of the least amount of FLOPS i
not necessarily "optimal" in terms of speed. I had a professor once tha
used to get quite ticked every time he heard somebody mention "optimal
without reference to some criteria. Set your minimum FLOP count algorith
up in such a way that your next instruction is always waiting fo
completion of the current instruction (a pipeline stall) and you'll find i
may not work very well compared to another algorithm with more require
FLOPS, but a better overall flow.
> Short answer: FFTW uses the prime-factor algorithm for small sizes (<=
> 16) within its codelets at the leaves of its recursion. Larger
> composite sizes are broken down into smaller sizes using mixed-radix
> Cooley-Tukey. It is O(N log N) for all N.
An interesting statement. O() notation gives the order as
N goes to infinity, and it may or may not be of any use
for small N. Now, is infinity prime or composite?
Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
>An interesting statement. O() notation gives the order as
>N goes to infinity, and it may or may not be of any use
>for small N. Now, is infinity prime or composite?
I was reading this thinking "yeah, it's probably not very good for smal
N" then I got to the last bit and had to laugh.
On May 6, 3:12 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > Cooley-Tukey. It is O(N log N) for all N.
>
> An interesting statement. O() notation gives the order as
> N goes to infinity, and it may or may not be of any use
> for small N. Now, is infinity prime or composite?
If you look at the definition of asymptotic (O) notation, you'll see
that there's no need to say whether "infinity" is prime or composite.
In practice, the O(N log N) prime-size algorithms already have clear
advantage for N > 100. In principle, you usually have enough freedom
to choose your transform to be a composite size, but many people
apparently find it useful to not have to worry about resizing their
data, without being afraid that if they are unlucky the FFT will be
thousands of times slower (as opposed to just a factor of 10).
On May 5, 12:02 pm, "markt" <tak...@pericle.com> wrote:
> Yeah, as you note, even "optimal" in terms of the least amount of FLOPS is
> not necessarily "optimal" in terms of speed.
And even optimality in terms of flop counts is not proven; no one
knows tight lower bounds. There isn't even any proof that you can't
do better than O(N log N) for an FFT (although no counterexamples are
known).
> without reference to some criteria. Set your minimum FLOP count algorithm
> up in such a way that your next instruction is always waiting for
> completion of the current instruction (a pipeline stall) and you'll find it
> may not work very well compared to another algorithm with more required
> FLOPS, but a better overall flow.
Yes, especially for large N I'm not aware of any implementation of the
minimal-known-flop FFT algorithm(s) that is competitive in practice
with highly optimized implementations. You just have too many other
things to worry about that are more important than arithmetic these
days. On the other hand, to be fair, I doubt that the highly
optimized implementations sacrifice more than 10-15% in the flop count
compared to the lowest known flop counts.
On the other hand, for small hardcoded transforms of sizes < 100 or
so, we have usually found that we do best with the algorithms that
come close to the minimum known flop counts. (This relies on the
ability to find a good schedule for the code, of course, but we have a
generic algorithm for this that seems to work well.)
(Sometimes people talk about a "crossover point" at which the O(N log
N) algorithms beat the naive O(N^2) algorithm, but in my experience
this is mostly about overheads of implementations. For small N you
really want to hard-code everything to avoid the overhead, and in that
case, we've usually found that a hard-coded O(N log N) algorithm wins
over the O(N^2) algorithm for all composite N starting with N=4.)
"glen herrmannsfeldt" <[email protected]> wrote in message
news:[email protected] ..
> Steven G. Johnson wrote:
> (snip)
>
>> Short answer: FFTW uses the prime-factor algorithm for small sizes (<=
>> 16) within its codelets at the leaves of its recursion. Larger
>> composite sizes are broken down into smaller sizes using mixed-radix
>> Cooley-Tukey. It is O(N log N) for all N.
>
> An interesting statement. O() notation gives the order as
> N goes to infinity, and it may or may not be of any use
> for small N. Now, is infinity prime or composite?
glen herrmannsfeldt wrote:
> Steven G. Johnson wrote:
> (snip)
>
>> Short answer: FFTW uses the prime-factor algorithm for small sizes (<=
>> 16) within its codelets at the leaves of its recursion. Larger
>> composite sizes are broken down into smaller sizes using mixed-radix
>> Cooley-Tukey. It is O(N log N) for all N.
>
> An interesting statement. O() notation gives the order as
> N goes to infinity, and it may or may not be of any use
> for small N. Now, is infinity prime or composite?
Composite, of course. Primes get scarcer as numbers increase. By the
time infinity is reached, they are vanishingly scarce. Aint logic wunnerful?
Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
>If you look at the definition of asymptotic (O) notation, you'll see
>that there's no need to say whether "infinity" is prime or composite.
I think he was intentionally being funny.
>In practice, the O(N log N) prime-size algorithms already have clear
>advantage for N > 100. In principle, you usually have enough freedom
>to choose your transform to be a composite size, but many people
>apparently find it useful to not have to worry about resizing their
>data, without being afraid that if they are unlucky the FFT will be
>thousands of times slower (as opposed to just a factor of 10).
I played around with FFTW quite a bit a few years ago (you and I trade
emails about my efforts, and my company actually bought a license at th
time) and I can say that this is at the very least, empirically true.
>And even optimality in terms of flop counts is not proven; no one
>knows tight lower bounds. There isn't even any proof that you can't
>do better than O(N log N) for an FFT (although no counterexamples are
>known).
Though I suppose the fact that O(N log N) is called an estimation implie
this statement, I did not know this explicitly. Particularly the secon
statement. Learn something new every day. Do you suppose it is a
unprovable problem?
>Yes, especially for large N I'm not aware of any implementation of the
>minimal-known-flop FFT algorithm(s) that is competitive in practice
>with highly optimized implementations. You just have too many other
>things to worry about that are more important than arithmetic these
>days. On the other hand, to be fair, I doubt that the highly
>optimized implementations sacrifice more than 10-15% in the flop count
>compared to the lowest known flop counts.
Too bad. It's not difficult for other math functions, such as a comple
multiply or magnitude function (which is what FFTW's SIMD code does), whic
are then ordered in memory for fast access and a consistently ful
pipeline. Once you have to stride or do a corner turn, the effort jus
ain't worth it, not for the 10% at least.
On May 7, 12:05 pm, "markt" <tak...@pericle.com> wrote:
> >And even optimality in terms of flop counts is not proven; no one
> >knows tight lower bounds. There isn't even any proof that you can't
> >do better than O(N log N) for an FFT (although no counterexamples are
> >known).
>
> Though I suppose the fact that O(N log N) is called an estimation implies
> this statement, I did not know this explicitly. Particularly the second
> statement. Learn something new every day. Do you suppose it is an
> unprovable problem?
I wouldn't say that the O(N log N) asymptotic analysis implies this
statement. Having an estimate or an asymptotic bound doesn't mean
that you don't know more precise things as well. Exact arithmetic
counts are certainly known for the existing algorithms; it's just not
known whether better counts are possible.
I don't see any reason to assume it's an unprovable axiom-of-choice
type of thing, although the proof seems to be fairly difficult and has
resisted repeated attempts so far. There are already proofs that
Omega(N log N) operations are required under certain restrictions on
the type of algorithms [e.g. there is a nice proof by Morgenstern in
1978 that Omega(N log N) additions are required if you restrict
yourself to algorithms with bounded multiplicative constants ...
unfortunately, there are many interesting algorithms that do not have
the property of bounded constants, including the ones with the
currently lowest known flop counts (see fftw.org/newsplit.pdf)].
On May 9, 12:23 am, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> the type of algorithms [e.g. there is a nice proof by Morgenstern in
> 1978 that Omega(N log N) additions are required if you restrict
>I don't see any reason to assume it's an unprovable axiom-of-choice
>type of thing, although the proof seems to be fairly difficult and has
>resisted repeated attempts so far. There are already proofs that
>Omega(N log N) operations are required under certain restrictions on
>the type of algorithms [e.g. there is a nice proof by Morgenstern in
>1978 that Omega(N log N) additions are required if you restrict
>yourself to algorithms with bounded multiplicative constants ...
>unfortunately, there are many interesting algorithms that do not have
>the property of bounded constants, including the ones with the
>currently lowest known flop counts (see fftw.org/newsplit.pdf)].
Interesting. I wasn't particularly thinking it was unprovable, jus
curious whether or not it can be proven unprovable. Rumor has it m
advanced calculus professor (oh so long ago) ended up with a know
unprovable problem for his dissertation (though he did not know this at th
time), which always makes me think about these things. He was, btw, a bi
odd, and the rumor claimed it was a result of his dissertation stress.