FFT for an arbitrary number of points (not power of 2)

Hi,

I'd like to ask experts here about ideas to perform a FFT on an
arbitrary number of points (for real data).

The cores usually found for an FPGA implementation only permit FFTs on
a number of points that is a power of 2. In our particular case
however, we need to be able to do an FFT on a vector of say, 1025
points. Our current algorithm is to zero-pad this vector to 2048
points.

The problem with this approach is that we nearly double the number of
points for downstream processing. This is very problematic at high
datarates.

We have a few ideas but since this seems like a common problem, I'd
like to ask people here for tips.

Re: FFT for an arbitrary number of points (not power of 2)

Patrick Dubois wrote:
> Hi,
>
> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).
>
> The cores usually found for an FPGA implementation only permit FFTs on
> a number of points that is a power of 2. In our particular case
> however, we need to be able to do an FFT on a vector of say, 1025
> points. Our current algorithm is to zero-pad this vector to 2048
> points.

A fast fourier transmformation has to be done in samples of 2**n. This
significantly simplifies the amount of mathematical processing to be done.

A discrete fourier transmform can be of any size that you want. The
mathematical processing is a lot more complicated. This will take a lot
of gates.

\why do you need to sample and process different amounts of data?
>
> The problem with this approach is that we nearly double the number of
> points for downstream processing. This is very problematic at high
> datarates.
>
> We have a few ideas but since this seems like a common problem, I'd
> like to ask people here for tips.
>
> Thanks!
>
> Patrick
>

Re: FFT for an arbitrary number of points (not power of 2)

If you only need the results at a few frequency points use the "Goertze
algorithm".

There are versions of FFT defined for non-power-of-2. These are generall
known as "prime factor" algorithms. The Winograd transform is related t
these, but all are complicated in control terms, and so are difficult t
programm on DSP microprocessors, let alone FPGAs.

The power-of-2 FFTs are the most widely used for very good reasons...

Re: FFT for an arbitrary number of points (not power of 2)

On Oct 30, 7:02 am, Andy Botterill <[email protected]> wrote:
>
> A fast fourier transmformation has to be done in samples of 2**n. This
> significantly simplifies the amount of mathematical processing to be done.
>
> A discrete fourier transmform can be of any size that you want. The
> mathematical processing is a lot more complicated. This will take a lot
> of gates.
>
> \why do you need to sample and process different amounts of data?

Our instrument is a FTS (Fourier Transform Spectrometer). The number
of time-domain samples is directly related to the resolution of the
instrument. Since we want to give flexibility to the customer, we need
to be able to permit a wide range of samples, say 64 to 32k. Our FFT
module therefore needs to be flexible.

We could limit the number of samples to power of 2s, but this is not
very practical. There is quite a large gap between 16k and 32k points.

Basically I'm looking for a solution that will let me use the Xilinx
FFT core that has a power of 2s limitation, to do FFTs on an arbitrary
number of points.

One idea is the chirp z-transform algorithm, but I'm not sure yet if
it would be suitable for a FPGA implementation.

Re: FFT for an arbitrary number of points (not power of 2)

Patrick Dubois wrote:
> Hi,
>
> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).
>
> The cores usually found for an FPGA implementation only permit FFTs on
> a number of points that is a power of 2. In our particular case
> however, we need to be able to do an FFT on a vector of say, 1025
> points. Our current algorithm is to zero-pad this vector to 2048
> points.
>
> The problem with this approach is that we nearly double the number of
> points for downstream processing. This is very problematic at high
> datarates.
>
> We have a few ideas but since this seems like a common problem, I'd
> like to ask people here for tips.
>
> Thanks!
>
> Patrick

I'd suggest that an arbitrary number - selected at runtime rather than
by design - isn't practical but designing for 1025 points might be.

I forget if it was LeCroy or Tektronix, but about a decade ago I read
the whitepaper on their scope's FFT capability. Scopes also have the
"limitation" of non-2^n size samples. Rather than decompose their FFT
into 2-element butterflies, they decomposed their 2/5/10 multiple
waveform samples into 2-element and 5-element FFT nodules. If the idea
behind the FFT is to reuse the sin/cos values for a given phase delta,
decomposing into a non-2^n system can work well. What obviously won't
work well for this approach is any size with large prime numbers.

Your 1025 point example calls for 5x5x41 which wouldn't pring so much in
acceleration.

Sorry I don't have an url for the whitepaper - it's been a decade.

Re: FFT for an arbitrary number of points (not power of 2)

On 30 oct, 09:23, John_H <[email protected]> wrote:
> I'd suggest that an arbitrary number - selected at runtime rather than
> by design - isn't practical but designing for 1025 points might be.
>
> I forget if it was LeCroy or Tektronix, but about a decade ago I read
> the whitepaper on their scope's FFT capability. Scopes also have the
> "limitation" of non-2^n size samples. Rather than decompose their FFT
> into 2-element butterflies, they decomposed their 2/5/10 multiple
> waveform samples into 2-element and 5-element FFT nodules. If the idea
> behind the FFT is to reuse the sin/cos values for a given phase delta,
> decomposing into a non-2^n system can work well. What obviously won't
> work well for this approach is any size with large prime numbers.
>
> Your 1025 point example calls for 5x5x41 which wouldn't pring so much in
> acceleration.
>
> Sorry I don't have an url for the whitepaper - it's been a decade.
>
> - John_H

Thanks for the idea. But ideally I'd like to limit myself to power of
2 FFTs so that I can use the Xilinx core. The number of points needs
to be flexible from 64 to 32k. Decomposition into smaller chunks seems
like a promising avenue. Although if one number is decomposible into
two power of 2s (e.g. 8192, which is divisable by 64 and 128), then
that number itself will be a power of 2.

I'd be quite happy with an algorithm that increases the number of
points possible (compared to only power of 2s). Being able to do
_every_ possible number of points is probably too strict a
requirement.

Re: FFT for an arbitrary number of points (not power of 2)

On Oct 30, 12:23 am, Patrick Dubois <[email protected]> wrote:
> Hi,
>
> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).
>
> The cores usually found for an FPGA implementation only permit FFTs on
> a number of points that is a power of 2. In our particular case
> however, we need to be able to do an FFT on a vector of say, 1025
> points. Our current algorithm is to zero-pad this vector to 2048
> points.
>
> The problem with this approach is that we nearly double the number of
> points for downstream processing. This is very problematic at high
> datarates.
>
> We have a few ideas but since this seems like a common problem, I'd
> like to ask people here for tips.
>
> Thanks!
>
> Patrick

Hello Patrick,

The FFT algorithm is efficient because it uses a divide and conquer
approach to process the data in subsets. The data is processed by
multiple "butterfly" processing kernels. Power of two data sets are
easy to deal with, because you can just use power of two butterflies
and the addressing is regular.

You can still use the FFT algorithm on non power of two data sets, but
if you do not want to pad the data up to the next power of two, you
will need non power of two butterflies.

For your case, 1025 factors to 5*5*41 so you would need to have
radix-5 and radix-41 butterflies if you really want to have a 1025
point FFT. If you just want to not have to pad all the way to 2048 but
don't care about padding a bit you could find a nicer size. For
example 1152 would be 2^7 * 3^2 and would only need radix-2 and
radix-3 butterflies.

If you want an FFT that is not a power of two in size, you will need a
non power of two butterfly in the mix.

Re: FFT for an arbitrary number of points (not power of 2)

On Oct 30, 6:38 am, Patrick Dubois <[email protected]> wrote:
> On 30 oct, 09:23, John_H <[email protected]> wrote:
>
>
>
> > I'd suggest that an arbitrary number - selected at runtime rather than
> > by design - isn't practical but designing for 1025 points might be.
>
> > I forget if it was LeCroy or Tektronix, but about a decade ago I read
> > the whitepaper on their scope's FFT capability. Scopes also have the
> > "limitation" of non-2^n size samples. Rather than decompose their FFT
> > into 2-element butterflies, they decomposed their 2/5/10 multiple
> > waveform samples into 2-element and 5-element FFT nodules. If the idea
> > behind the FFT is to reuse the sin/cos values for a given phase delta,
> > decomposing into a non-2^n system can work well. What obviously won't
> > work well for this approach is any size with large prime numbers.
>
> > Your 1025 point example calls for 5x5x41 which wouldn't pring so much in
> > acceleration.
>
> > Sorry I don't have an url for the whitepaper - it's been a decade.
>
> > - John_H
>
> Thanks for the idea. But ideally I'd like to limit myself to power of
> 2 FFTs so that I can use the Xilinx core. The number of points needs
> to be flexible from 64 to 32k. Decomposition into smaller chunks seems
> like a promising avenue. Although if one number is decomposible into
> two power of 2s (e.g. 8192, which is divisable by 64 and 128), then
> that number itself will be a power of 2.
>
> I'd be quite happy with an algorithm that increases the number of
> points possible (compared to only power of 2s). Being able to do
> _every_ possible number of points is probably too strict a
> requirement.
>
> Patrick

Since you have used zero-filling, you don't seem to need a transform
the exact size of the data. There is another technique complementary
to zero-filling, sometimes called data folding. Since the DFT is a
circular convolution, when your data set is larger than the transform
size you can continue to 'wrap' the data in a circular manner and sum
the overlapped samples. You then perform the smaller sized FFT. You
might consider data folding up to sqrt 2 times the smaller power of
two FFT size and zero-filling up to the larger transform size above
that.

Chapter 8 (by fred harris) of 'Handbook of Digital Signal Processing
Engineering Applications' edited by Douglas Eliot discusses this
approach.

With either zero-fill or data folding it is useful to remember that
the window size that determines bin response shapes is the data size
not the transform size.

> Since you have used zero-filling, you don't seem to need a transform
> the exact size of the data.

Correct. Although I want to minimize data growth as much as possible
(which is the drawback of zero-filling).

> There is another technique complementary
> to zero-filling, sometimes called data folding. Since the DFT is a
> circular convolution, when your data set is larger than the transform
> size you can continue to 'wrap' the data in a circular manner and sum
> the overlapped samples. You then perform the smaller sized FFT. You
> might consider data folding up to sqrt 2 times the smaller power of
> two FFT size and zero-filling up to the larger transform size above
> that.
>
> Chapter 8 (by fred harris) of 'Handbook of Digital Signal Processing
> Engineering Applications' edited by Douglas Eliot discusses this
> approach.

Interesting. Thanks for the reference, I'll try to find a copy of the
book.

Re: FFT for an arbitrary number of points (not power of 2)

> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).

Hi Patrick,
A popular misconception seems to be that in order to use cooley-turkey
FFT the number of points must be a power of 2. Digital Signal
Processing with Field Programmable Gate Arrays by Uwe Meyer-Baese, for
example, has an example we he shows the Cooley-Tukey FFT for N = 12.
He claims 674 real additions and 432 real multiplications for a 12
point DFT, and for the 12 point FFT 108 real additions and 28 real
multiplications. The signal flow graph he shows, shows in the 1st
stage, 3x four point DFTs, followed by twiddle factors, and then a
final stage of 4x three point DFT's.

He also goes on to present the Good-Thomas FFT algorithm and Winograd
FFT. Again he demonstrates for N=12.

Re: FFT for an arbitrary number of points (not power of 2)

Patrick Dubois wrote:

> Hi,
>
> I'd like to ask experts here about ideas to perform a FFT on an
> arbitrary number of points (for real data).
>
> The cores usually found for an FPGA implementation only permit FFTs on
> a number of points that is a power of 2. In our particular case
> however, we need to be able to do an FFT on a vector of say, 1025
> points. Our current algorithm is to zero-pad this vector to 2048
> points.
>
> The problem with this approach is that we nearly double the number of
> points for downstream processing. This is very problematic at high
> datarates.
>
> We have a few ideas but since this seems like a common problem, I'd
> like to ask people here for tips.
>
> Thanks!
>
> Patrick
>

As others have mentioned, you need non-power of two FFT kernels to get
non-power of two transform sizes unless you use zero padding. A
possibility would be to have a set of non-power of two kernels you could
combine using the mixed radix algorithm to get larger FFTs. If your FFT
kernel sizes are relatively prime, you get a significant simplification
because it doesn't need phase rotations between the FFT kernels. For
example, a 1540 point FFT can be had by combining 4, 5, 7 and 11 point
kernels with no intervening rotations. If you use rotations between the
FFT kernels, you can mix the kernels without regard to whether or not
they are relatively prime. There are Winograd kernels for all these
sizes, and others (Rader or Singleton as I recall) for 3 and 5 point.
These are actually quite a bit easier to do in hardware than they are
with a microprocessor, as the reordering is a bit convoluted. You might
look at the Smith & Smith book on FFT's
(http://www.amazon.com/exec/obidos/AS...310918/andraka), as they
have a very good coverage of non-Cooley-Tukey FFTs. Unfortunately,
arbitrary sizes are a little harder to do if it is to be flexible
because the addressing and phase rotations are not as well ordered as
they are for power of 2 sizes.

Re: FFT for an arbitrary number of points (not power of 2)

Thank you for the last three responses from John, Andrew and Ray (and
everyone else who responded). You pretty much convinced me of the need
to have different size kernels to be able to handle a larger range of
points possibility. Unfortunately that means that I won't be able to
use Xilinx Coregen FFT. I guess we'll stick with zero-padding for the
time being but at least now I know my options if I really wanna go
that route.

I'm still intriged by the chirp z-transform (Bluestein's FFT
algorithm) however, because it seems to use power of 2 FFTs (at least
the Matlab implementation of czt does). I don't quite yet grasp the
algorithm though. I see that it is discussed in chapter 9.5 of the
book Ray Andraka suggested. I'll try to get a copy.