PDA

View Full Version : Naive FIR filter question


Rick Lyons
12-12-2003, 02:58 PM
Hi Guys,

I have a simple (I think) filtering question.
I've programmed (years ago) microcontrollers,
but I've never programmed any of the modern DSP
chips with their "zero-overhead looping" and
single-cycle multiply-and-accumulate features.
(I know, I know. I should get a DSP starter kit!)
So the DSP chips are efficient for implementing
FIR filtering.

Am I correct in assuming that programmable DSP
chips *cannot* take advantage of the "folded FIR
structure" that's possible when a FIR filter's
coefficients are symmetrical. (You remember that
"folded FIR" idea, ... where an M-tap FIR filter
can be implemented with roughly M/2 multiplies
per output sample.)

I've searched around on the Internet, but haven't
found a solid answer to my question.

Thanks,
[-Rick-]

Jim Thomas
12-12-2003, 04:18 PM
Rick Lyons wrote:
> Hi Guys,
[deletia]
> Am I correct in assuming that programmable DSP
> chips *cannot* take advantage of the "folded FIR
> structure" that's possible when a FIR filter's
> coefficients are symmetrical. (You remember that
> "folded FIR" idea, ... where an M-tap FIR filter
> can be implemented with roughly M/2 multiplies
> per output sample.)

Hi Rick,

The folded structure still requires M-1 adds, so there are still M-1
cycles requiring arithmetic. Since DSPs can do the multiplies in
parallel with the adds (and in a single cycle), there's no advantage to
eliminating half the multiplies.

I don't think execution would be any slower with the folded structure,
but it's more difficult to write, and more difficult to read. Well -
maybe it wouldn't be as fast if you count the more complicated loop
setup. Perhaps the setup penalty could be eliminated too, but it would
almost certainly cause brain damage to anyone embarking on such an
adventure.

--
Jim Thomas Principal Applications Engineer Bittware, Inc
[email protected] http://www.bittware.com (703) 779-7770
The secret to enjoying your job is to have a hobby that's even worse
- Calvin's Dad

Jerry Avins
12-12-2003, 04:19 PM
Rick Lyons wrote:

> Hi Guys,
>
> I have a simple (I think) filtering question.
> I've programmed (years ago) microcontrollers,
> but I've never programmed any of the modern DSP
> chips with their "zero-overhead looping" and
> single-cycle multiply-and-accumulate features.
> (I know, I know. I should get a DSP starter kit!)
> So the DSP chips are efficient for implementing
> FIR filtering.
>
> Am I correct in assuming that programmable DSP
> chips *cannot* take advantage of the "folded FIR
> structure" that's possible when a FIR filter's
> coefficients are symmetrical. (You remember that
> "folded FIR" idea, ... where an M-tap FIR filter
> can be implemented with roughly M/2 multiplies
> per output sample.)
>
> I've searched around on the Internet, but haven't
> found a solid answer to my question.
>
> Thanks,
> [-Rick-]

You are correct, as usual. Folding amounts to adding pairs of
coefficients before multiplying by a filter coefficient. It entails a
little extra bookkeeping, but that's often trivial compared to replacing
a multiply with an add. When multiplication and addition take the same
time, the bookkeeping is a dead loss.

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ

Jon Harris
12-12-2003, 05:55 PM
"Jerry Avins" <[email protected]> wrote in message
news:[email protected]...
> Rick Lyons wrote:
>
> > Hi Guys,
> >
> > I have a simple (I think) filtering question.
> > I've programmed (years ago) microcontrollers,
> > but I've never programmed any of the modern DSP
> > chips with their "zero-overhead looping" and
> > single-cycle multiply-and-accumulate features.
> > (I know, I know. I should get a DSP starter kit!)
> > So the DSP chips are efficient for implementing
> > FIR filtering.
> >
> > Am I correct in assuming that programmable DSP
> > chips *cannot* take advantage of the "folded FIR
> > structure" that's possible when a FIR filter's
> > coefficients are symmetrical. (You remember that
> > "folded FIR" idea, ... where an M-tap FIR filter
> > can be implemented with roughly M/2 multiplies
> > per output sample.)
> >
> > I've searched around on the Internet, but haven't
> > found a solid answer to my question.
> >
> > Thanks,
> > [-Rick-]
>
> You are correct, as usual. Folding amounts to adding pairs of
> coefficients before multiplying by a filter coefficient. It entails a
> little extra bookkeeping, but that's often trivial compared to replacing
> a multiply with an add. When multiplication and addition take the same
> time, the bookkeeping is a dead loss.

"When multiplication and addition take the same time" which is almost always
the case in modern DSP chips with their dedicated hardware multiplier.

Jerry Avins
12-12-2003, 06:36 PM
Jon Harris wrote:

> "Jerry Avins" <[email protected]> wrote in message
> news:[email protected]...
>
>>Rick Lyons wrote:

...

>>>Am I correct in assuming that programmable DSP
>>>chips *cannot* take advantage of the "folded FIR
>>>structure" that's possible when a FIR filter's
>>>coefficients are symmetrical. ...

>>You are correct, as usual. ...

>> ... When multiplication and addition take the same
>>time, the bookkeeping is a dead loss.
>
>
> "When multiplication and addition take the same time" which is almost always
> the case in modern DSP chips with their dedicated hardware multiplier.

That's why I wrote that he was correct. :-)

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ

Keith Larson
12-12-2003, 07:52 PM
Hi all

You might want to look at the C54xx FIRS instruction. It has the extra
adder needed to impliment the following in 1 cycle

Acc = Acc + (*coef++ * (*mema++ + *memb++));

The data and coefficients do however need to be correctly placed in
memory such that 3 reads can occur per cycle :-)

Best regards,
Keith Larson
=============================
Jim Thomas wrote:
> Rick Lyons wrote:
>
>> Hi Guys,
>
> [deletia]
>
>> Am I correct in assuming that programmable DSP chips *cannot* take
>> advantage of the "folded FIR structure" that's possible when a FIR
>> filter's coefficients are symmetrical. (You remember that "folded
>> FIR" idea, ... where an M-tap FIR filter can be implemented with
>> roughly M/2 multiplies per output sample.)
>
>
> Hi Rick,
>
> The folded structure still requires M-1 adds, so there are still M-1
> cycles requiring arithmetic. Since DSPs can do the multiplies in
> parallel with the adds (and in a single cycle), there's no advantage to
> eliminating half the multiplies.
>
> I don't think execution would be any slower with the folded structure,
> but it's more difficult to write, and more difficult to read. Well -
> maybe it wouldn't be as fast if you count the more complicated loop
> setup. Perhaps the setup penalty could be eliminated too, but it would
> almost certainly cause brain damage to anyone embarking on such an
> adventure.
+------------------------------------------+
|Keith Larson |
|Member Group Technical Staff |
|Texas Instruments Incorporated |
| |
| 281-274-3288 |
| [email protected] |
|------------------------------------------+
| TMS320C3x/C4x/VC33 Applications |
| |
| $150 TMS320VC33 DSK's ARE AVAILABLE NOW |
| |
| TMS320VC33 |
| The lowest cost and lowest power |
| floating point DSP on the planet! |
| 500uw/Mflop |
+------------------------------------------+

Jerry Avins
12-12-2003, 09:52 PM
Keith Larson wrote:

> Hi all
>
> You might want to look at the C54xx FIRS instruction. It has the extra
> adder needed to impliment the following in 1 cycle
>
> Acc = Acc + (*coef++ * (*mema++ + *memb++));
>
> The data and coefficients do however need to be correctly placed in
> memory such that 3 reads can occur per cycle :-)
>
> Best regards,
> Keith Larson
> =============================

Oh, Wow! Now if we can get the second addend to be a subtrahend instead,
We can fold Hilbert Transformers, too!

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ

Al Clark
12-13-2003, 02:40 PM
Jim Thomas <[email protected]> wrote in news:vtjqv631438883
@corp.supernews.com:

> Rick Lyons wrote:
>> Hi Guys,
> [deletia]
>> Am I correct in assuming that programmable DSP
>> chips *cannot* take advantage of the "folded FIR
>> structure" that's possible when a FIR filter's
>> coefficients are symmetrical. (You remember that
>> "folded FIR" idea, ... where an M-tap FIR filter
>> can be implemented with roughly M/2 multiplies
>> per output sample.)
>
> Hi Rick,
>
> The folded structure still requires M-1 adds, so there are still M-1
> cycles requiring arithmetic. Since DSPs can do the multiplies in
> parallel with the adds (and in a single cycle), there's no advantage to
> eliminating half the multiplies.
>
> I don't think execution would be any slower with the folded structure,
> but it's more difficult to write, and more difficult to read. Well -
> maybe it wouldn't be as fast if you count the more complicated loop
> setup. Perhaps the setup penalty could be eliminated too, but it would
> almost certainly cause brain damage to anyone embarking on such an
> adventure.
>

I use a bit of a hybrid to this approach. Since FIRs are often
symmetrical, I have a function that uses a coefficient file of length M/2
+1 (odd length M). I still do MACs on each element of the delay line. It
takes just about the same number of cycles to execute but in cases where
memory is in short supply and there are lots of filters, it saves memory.

--
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com