FPGA Central - World's 1st FPGA / CPLD Portal

FPGA Central

World's 1st FPGA Portal

 

Go Back   FPGA Groups > NewsGroup > DSP

DSP comp.dsp newsgroup, mailing list

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 03-03-2005, 04:45 PM
Daniel
Guest
 
Posts: n/a
Default Porting LMS from floating-point to fixed-point processor

Hello Everybody,

for my diploma thesis, I have to implement a Least-Mean-Square
Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
floating-point processor(TI 6713) earlier, so I just to the code and
copied it. Of course, there are a lot of float variables in the code.
When I ran the program, it workes for small FIR orders (6), but the
larger the order of the filter, the worse the result.
Is this because the 6416 cannot work with floating-point numbers
accuretly?
What can I do? When I convert all the float variables to integers I
get overflow problems.

Thanks a lot
Daniel
Reply With Quote
  #2 (permalink)  
Old 03-03-2005, 05:39 PM
Tim Wescott
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor(TI 6713) earlier, so I just to the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it workes for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accuretly?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>

I've never had to think about implementing a LMS algorithm in
fixed-point, so here's some general comments:

The way to work with fixed-point arithmetic is to determine the ranges
of your data and your coefficients, then jigger things around to keep
the results (final _and_ intermediate) in range. Everything else flows
from that.

Generally if you stick to pure C you are stuck with integer math. DSP's
are designed to do fixed-radix math pretty quickly, and TI has a library
to deal with that (can't remember the name -- someone jump in here and
name it!). At it's most basic a DSP will do a vector dot product of a
bunch of integers with a fixed shift -- this is usually enough to get
you anything you need, if you're clever enough.

Look at how you can arrange things to use a fractional vector multiply.
This is what DSP's are built for (look for the MAC instruction), so it
should at least be possible. I've been consistently disappointed by the
libraries that come with DSP tools, but perhaps the 6416 is different --
and if not there's always assembly. IIRC the LMS algorithms that I've
seen reduce to finding a vector dot product or three and finding one
scalar reciprocal. All these things are possible to do _fast_ with a
fixed-point DSP.

Don't be afraid to use 32-bit data paths -- even a 16-bit DSP should be
made so that you can do 4 16-bit vector dot products and combine them
into one 32-bit result.

Keep in mind that you can use block floating point. The ADSP-2101
documentation has a fine discussion of this format. That processor has
a vector normalization instruction -- your processor should as well.
Block floating point is a very good compromise between "real" floating
point and all-fixed-point math.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply With Quote
  #3 (permalink)  
Old 03-03-2005, 06:22 PM
Shawn Steenhagen
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Daniel,

I would guess the reason you are not getting good results for larger orders
is that the processor is "running out of cylces". In other words, the
floating point operations are taking too many MIPS when run on a native
fixed point machine. (This does not surprise me). What is your sample rate?

You need to get a grasp on the concepts of Q-types and Q-arithmetic rules
and get rid of all "floats" from your code.

The basic part of the update equation for LMS tap n, at time k is

Atap(n) = Atap(n) + mu*X(n)*e(k);

To do the above in fixed point, Q15 arithmetic (using pseudo code):
q15_mu_e = HIWORD( (q15_mu*q15_e) <<1);
for(n=0:N-1)
Atap[n] = Atap[n] + HIWORD( (q15_mu_e*q15_x[n]) <<1 );

If you want to stay in the C domain (and not learn 64x Assembly), you may
need to learn the compiler intrinsics for getting proper saturation/overflow
results, proper shifts, maintaining 32 bits of data, etc.

// For example, on the C55: (this is real C code)
int16 q15_mu_e;
int16 q15_mu;
int16 q15_e;
int16 q15_atap[NTAPS];
int16 q15_x[NTAPS];
..
..
// do the LMS update
q15_mu_e = _smpy(q15_mu,q15_e); // multiply, shift left by one, take upper
16 bits
for (n=0;n<NTAPS;n++)
q15_atap[n] += _smpy(q15_mu_e, q15_x[n]);

I haven't used the 64x family, but I imagine it has simliar intrinsics or a
section on how to properly use type casting in a C statement to get the
result you need.

The above implementation does not do any rounding and does not maintain any
intermediate value as 32bits, and so it is by no means going to give you the
best results. Also, it doesn't cover the circular buffer updates or the LMS
filter output calculations. So its not going to work with adding that
functionality too.

Check the TI web sites for App notes on the LMS algorithm for the C54x or
C55 family, it should provide some more insight into doing LMS on a fixed
point (it may also provide material relating to Q-types and the like.

Good Luck. You have a fair amount of work ahead of you.

-Shawn



"Daniel" <[email protected]> wrote in message
news:[email protected] om...
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor(TI 6713) earlier, so I just to the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it workes for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accuretly?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel



Reply With Quote
  #4 (permalink)  
Old 03-03-2005, 06:51 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Tim Wescott <[email protected]> writes:
> [...]
> Generally if you stick to pure C you are stuck with integer math.
> DSP's are designed to do fixed-radix math pretty quickly, ...


Tim,

I think most of your points are helpful, but this one is off-the-mark
in my judgement. The typical fixed-point DSP operates much the same as
the C integer operations, performing integer math. Whether the
integers are reinterpreted to be fractional, fixed-point, or integer
is all in the interpretation and has little or nothing to do with the
implementation of the basic arithmetic operations (add, subtract,
multiply).

Of course there are differences between fixed-point DSP ALUs and the
"ALU" of a C compiler, the biggest of which are probably the wide
accumulators and the saturation options when performing various
operations. There is also the typical "left shift by 1" that a
fractional DSP does after a multiply to make the result fractional,
but that is certainly doable in C as well, albeit manually.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
[email protected], 919-472-1124
Reply With Quote
  #5 (permalink)  
Old 03-03-2005, 08:18 PM
Jerry Avins
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:
> Tim Wescott <[email protected]> writes:
>
>>[...]
>>Generally if you stick to pure C you are stuck with integer math.
>>DSP's are designed to do fixed-radix math pretty quickly, ...

>
>
> Tim,
>
> I think most of your points are helpful, but this one is off-the-mark
> in my judgement. The typical fixed-point DSP operates much the same as
> the C integer operations, performing integer math. Whether the
> integers are reinterpreted to be fractional, fixed-point, or integer
> is all in the interpretation and has little or nothing to do with the
> implementation of the basic arithmetic operations (add, subtract,
> multiply).
>
> Of course there are differences between fixed-point DSP ALUs and the
> "ALU" of a C compiler, the biggest of which are probably the wide
> accumulators and the saturation options when performing various
> operations. There is also the typical "left shift by 1" that a
> fractional DSP does after a multiply to make the result fractional,
> but that is certainly doable in C as well, albeit manually.


Doesn't the usual fixed-point hardware do a shift after multiplying?
"Redundant sign bit" and all that.

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
Reply With Quote
  #6 (permalink)  
Old 03-03-2005, 09:23 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Jerry Avins <[email protected]> writes:

> Doesn't the usual fixed-point hardware do a shift after multiplying?
> "Redundant sign bit" and all that.


Depends on the processor. The TI TMS3205xx series does not by default
(there is a register for setting this behavior). The Motorola does.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
[email protected], 919-472-1124
Reply With Quote
  #7 (permalink)  
Old 03-03-2005, 11:15 PM
john
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor


Tim Wescott wrote:
> Daniel wrote:
> > Hello Everybody,
> >
> > for my diploma thesis, I have to implement a Least-Mean-Square
> > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented

on a
> > floating-point processor(TI 6713) earlier, so I just to the code

and
> > copied it. Of course, there are a lot of float variables in the

code.
> > When I ran the program, it workes for small FIR orders (6), but the
> > larger the order of the filter, the worse the result.
> > Is this because the 6416 cannot work with floating-point numbers
> > accuretly?
> > What can I do? When I convert all the float variables to integers I
> > get overflow problems.
> >

> I've never had to think about implementing a LMS algorithm in
> fixed-point, so here's some general comments:
>
> The way to work with fixed-point arithmetic is to determine the

ranges
> of your data and your coefficients, then jigger things around to keep


> the results (final _and_ intermediate) in range. Everything else

flows
> from that.
>
> Generally if you stick to pure C you are stuck with integer math.

DSP's
> are designed to do fixed-radix math pretty quickly, and TI has a

library
> to deal with that (can't remember the name -- someone jump in here

and
> name it!). At it's most basic a DSP will do a vector dot product of

a
> bunch of integers with a fixed shift -- this is usually enough to get


> you anything you need, if you're clever enough.
>
> Look at how you can arrange things to use a fractional vector

multiply.
> This is what DSP's are built for (look for the MAC instruction), so

it
> should at least be possible. I've been consistently disappointed by

the
> libraries that come with DSP tools, but perhaps the 6416 is different

--
> and if not there's always assembly. IIRC the LMS algorithms that

I've
> seen reduce to finding a vector dot product or three and finding one
> scalar reciprocal. All these things are possible to do _fast_ with a


> fixed-point DSP.
>
> Don't be afraid to use 32-bit data paths -- even a 16-bit DSP should

be
> made so that you can do 4 16-bit vector dot products and combine them


> into one 32-bit result.
>
> Keep in mind that you can use block floating point. The ADSP-2101
> documentation has a fine discussion of this format. That processor

has
> a vector normalization instruction -- your processor should as well.
> Block floating point is a very good compromise between "real"

floating
> point and all-fixed-point math.
>
> --
>
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com


TI uses the name 'dsplib' for their free code libraries. The '54X
library has a fixed point LMS function. The routines are 100% assembly,
and source is provided. If nothing else they are a good starting point.
I have used the FFT and FIR routines quite a bit.

The 21xx normalization instruction, if I recall correctly,
automatically keeps track of the worst case shift as you traverse a
vector. The '54X version does not do that.

John

Reply With Quote
  #8 (permalink)  
Old 03-04-2005, 03:00 AM
Jerry Avins
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:
> Jerry Avins <[email protected]> writes:
>
>
>>Doesn't the usual fixed-point hardware do a shift after multiplying?
>>"Redundant sign bit" and all that.

>
>
> Depends on the processor. The TI TMS3205xx series does not by default
> (there is a register for setting this behavior). The Motorola does.


Default or not, it isn't behavior one gets automatically from int
operations in C. That was my point.

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
Reply With Quote
  #9 (permalink)  
Old 03-04-2005, 05:07 AM
Tim Wescott
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:

> Tim Wescott <[email protected]> writes:
>
>>[...]
>>Generally if you stick to pure C you are stuck with integer math.
>>DSP's are designed to do fixed-radix math pretty quickly, ...

>
>
> Tim,
>
> I think most of your points are helpful, but this one is off-the-mark
> in my judgement. The typical fixed-point DSP operates much the same as
> the C integer operations, performing integer math. Whether the
> integers are reinterpreted to be fractional, fixed-point, or integer
> is all in the interpretation and has little or nothing to do with the
> implementation of the basic arithmetic operations (add, subtract,
> multiply).
>
> Of course there are differences between fixed-point DSP ALUs and the
> "ALU" of a C compiler, the biggest of which are probably the wide
> accumulators and the saturation options when performing various
> operations. There is also the typical "left shift by 1" that a
> fractional DSP does after a multiply to make the result fractional,
> but that is certainly doable in C as well, albeit manually.


The difference in clock ticks between implementing a fixed-point
arbitrary-radix vector dot-product in assembly on a DSP and trying to do
the same thing to the same precision in C on the same processor is on
the order of 100:1.

Even on a MAC-less processor when you are in assembly and multiply two
signed numbers N-bit numbers you can choose to take the lower N bits of
the 2N-1-bit result as C does, or you can take the upper N-1 bits and do
a shift, with way fewer clock cycles (10 or 20:1) than you could
implement the same functionality in C. I should know -- I've done it in
C a couple of times and in assembly on three or four different processors.

So no, I don't think it's off the mark at all.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply With Quote
  #10 (permalink)  
Old 03-04-2005, 08:48 AM
Ravi Srikantiah
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor(TI 6713) earlier, so I just to the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it workes for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accuretly?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel


Daniel,

You can't just convert floating point variables into integers and expect
it to work fine. What you ought to do is to represent the floating point
numbers in signed-mantissa-exponent format (the signed mantissa and the
exponent you can represent as integers).

Then, you need to write little routines that do basic arithmetic
operations like multiply(), divide(), add(), subtract(), etc. that you
can use to replace the * / + and - in your code.

The TI compiler automatically does this for you and links in a floating
point library when you declare "float" variables. However, based on the
requirements of your code (precision, range, etc) you can choose to
write your own optimized functions that could be faster and optimized
for your application.

Regards,
Ravi
Reply With Quote
  #11 (permalink)  
Old 03-04-2005, 09:32 AM
steve
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor


Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on

a
> floating-point processor(TI 6713) earlier, so I just to the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it workes for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accuretly?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel


I assume you took the LMS C code that was running on the 6713 and
recompiled it to run on the 6416 with an ANSI C complier. The results
should be identical, the 6713 uses its hardware to perform the floating
point calculations, the 6414 uses a software emulated floating point
package to perform the floating point caculations. That should be
transparent to the user. Could it be you were using double precision
math on the 6713 and single on the 6416?

Converting the algorithm to integer math is not easy, you can't just
change the type as others have pointed out.

Reply With Quote
  #12 (permalink)  
Old 03-04-2005, 01:46 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Jerry Avins <[email protected]> writes:

> Randy Yates wrote:
> > Jerry Avins <[email protected]> writes:
> >

>
> >> Doesn't the usual fixed-point hardware do a shift after
> >> multiplying? "Redundant sign bit" and all that.

>
> > Depends on the processor. The TI TMS3205xx series does not by default

>
> > (there is a register for setting this behavior). The Motorola does.

>
> Default or not, it isn't behavior one gets automatically from int
> operations in C. That was my point.


If by "it" you mean "automatic left shift by one bit after multiply,"
I agree. That is, it is true that the integer multiply operations in C
do not automatically left shift the result by one.

However, that wasn't what you asked, so I'm not sure how my response
was off-point to your question.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
[email protected], 919-472-1124
Reply With Quote
  #13 (permalink)  
Old 03-04-2005, 03:04 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Tim Wescott <[email protected]> writes:

> Randy Yates wrote:
>
> > Tim Wescott <[email protected]> writes:
> >

>
> >>[...]
> >>Generally if you stick to pure C you are stuck with integer math.
> >>DSP's are designed to do fixed-radix math pretty quickly, ...

> > Tim, I think most of your points are helpful, but this one is
> > off-the-mark

>
> > in my judgement. The typical fixed-point DSP operates much the same as
> > the C integer operations, performing integer math. Whether the
> > integers are reinterpreted to be fractional, fixed-point, or integer
> > is all in the interpretation and has little or nothing to do with the
> > implementation of the basic arithmetic operations (add, subtract,
> > multiply).
> > Of course there are differences between fixed-point DSP ALUs and the

>
> > "ALU" of a C compiler, the biggest of which are probably the wide
> > accumulators and the saturation options when performing various
> > operations. There is also the typical "left shift by 1" that a
> > fractional DSP does after a multiply to make the result fractional,
> > but that is certainly doable in C as well, albeit manually.

>
> The difference in clock ticks between implementing a fixed-point
> arbitrary-radix vector dot-product in assembly on a DSP and trying to
> do the same thing to the same precision in C on the same processor is
> on the order of 100:1.


Who said anything about a vector operation? Your statement was

Generally if you stick to pure C you are stuck with integer math.
DSP's are designed to do fixed-radix math pretty quickly, ...

The term "math" does not mean "vector math" in my interpretation.

> Even on a MAC-less processor when you are in assembly and multiply two
> signed numbers N-bit numbers you can choose to take the lower N bits
> of the 2N-1-bit result as C does, or you can take the upper N-1 bits
> and do a shift, with way fewer clock cycles (10 or 20:1) than you
> could implement the same functionality in C. I should know -- I've
> done it in C a couple of times and in assembly on three or four
> different processors.


Apparently they did not include the TI TMS320C54x, arguably one of
the most popular DSPs around, and on that processor, the following
code

#include "dsptypes.h"

/* definitions */

#define VECTOR_LENGTH 64

/* local variables */

/* local function prototypes */

/* function definitions */

int main(int margc, char **margv)
{
UINT16_T n;
INT16_T x[VECTOR_LENGTH];
INT16_T y[VECTOR_LENGTH];
INT32_T acc;
INT16_T result;

acc = 0;
for (n = 0; n < VECTOR_LENGTH; n++)
{
x[n] = n;
y[n] = VECTOR_LENGTH - n - 1;
}

acc = 0;
for (n = 0; n < VECTOR_LENGTH; n++)
{
acc += x[n] * y[n];
}

result = (INT16_T)(acc >> 16);

return result;
}


produces the following assembly language

0000:0108 main
0000:0108 4A11 PSHM 11h
0000:0109 4A17 PSHM 17h
0000:010A EE80 FRAME -128
0000:010B E781 MVMM SP,AR1
0000:010C 6DE9 MAR *+AR1(64)
0000:010E E787 MVMM SP,AR7
0000:010F E782 MVMM SP,AR2
0000:0110 E800 LD #0h,A
0000:0111 771A STM 3fh,1ah
0000:0113 F072 RPTB 11ah
0000:0115 L1
0000:0115 8092 STL A,*AR2+
0000:0116 E93F LD #3fh,B
0000:0117 F520 SUB A,0,B
0000:0118 8191 STL B,*AR1+
0000:0119 F000 ADD #1h,0,A,A
0000:011B L2
0000:011B E782 MVMM SP,AR2
0000:011C 6DEA MAR *+AR2(64)
0000:011E E783 MVMM SP,AR3
0000:011F E800 LD #0h,A
0000:0120 EC3F RPT #3fh
0000:0121 L3
0000:0121 B089 MAC *AR2+,*AR3+,A,A
0000:0122 L4
0000:0122 F0E0 SFTL A,0,A
0000:0123 F0F0 SFTL A,-16,A
0000:0124 6BF8 ADDM 80h,*(18h)
0000:0127 F495 NOP
0000:0128 F495 NOP
0000:0129 8A17 POPM 17h
0000:012A 8A11 POPM 11h
0000:012B F4E4 FRET

Both the vector multiply and the end shift look pretty damn efficient
to me, Tim.

Thus even if we agree to interpret your point differently, it's still
inaccurate for one of the most popular DSPs in the world.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
[email protected], 919-472-1124
Reply With Quote
  #14 (permalink)  
Old 03-04-2005, 04:02 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

PS

Tim Wescott <[email protected]> writes:

> Even on a MAC-less processor when you are in assembly and multiply two
> signed numbers N-bit numbers you can choose to take the lower N bits
> of the 2N-1-bit result as C does, or you can take the upper N-1 bits
> and do a shift, with way fewer clock cycles (10 or 20:1) than you
> could implement the same functionality in C.


The product of two N-bit numbers using two's complement arithmetic
requires 2*N bits to represent all possible results, not 2*N - 1 bits.

We can agree that it is alright to use 2*N - 1 bits and saturate or
truncate the one input combination that requires 2*N bits, but it is
improper to merely assume this is to be done (often it is not!).
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
[email protected], 919-472-1124
Reply With Quote
  #15 (permalink)  
Old 03-04-2005, 05:28 PM
Tim Wescott
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:

> PS
>
> Tim Wescott <[email protected]> writes:
>
>
>>Even on a MAC-less processor when you are in assembly and multiply two
>>signed numbers N-bit numbers you can choose to take the lower N bits
>>of the 2N-1-bit result as C does, or you can take the upper N-1 bits
>>and do a shift, with way fewer clock cycles (10 or 20:1) than you
>>could implement the same functionality in C.

>
>
> The product of two N-bit numbers using two's complement arithmetic
> requires 2*N bits to represent all possible results, not 2*N - 1 bits.
>
> We can agree that it is alright to use 2*N - 1 bits and saturate or
> truncate the one input combination that requires 2*N bits, but it is
> improper to merely assume this is to be done (often it is not!).


Oh. heh heh. Actually I was talking about the operation y = -(x1*x2) .

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply With Quote
  #16 (permalink)  
Old 03-04-2005, 05:36 PM
Tim Wescott
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:

> Tim Wescott <[email protected]> writes:
>
>
>>Randy Yates wrote:
>>
>>
>>>Tim Wescott <[email protected]> writes:
>>>

>>
>>>>[...]
>>>>Generally if you stick to pure C you are stuck with integer math.
>>>>DSP's are designed to do fixed-radix math pretty quickly, ...
>>>
>>>Tim, I think most of your points are helpful, but this one is
>>>off-the-mark

>>
>>>in my judgement. The typical fixed-point DSP operates much the same as
>>>the C integer operations, performing integer math. Whether the
>>>integers are reinterpreted to be fractional, fixed-point, or integer
>>>is all in the interpretation and has little or nothing to do with the
>>>implementation of the basic arithmetic operations (add, subtract,
>>>multiply).
>>>Of course there are differences between fixed-point DSP ALUs and the

>>
>>>"ALU" of a C compiler, the biggest of which are probably the wide
>>>accumulators and the saturation options when performing various
>>>operations. There is also the typical "left shift by 1" that a
>>>fractional DSP does after a multiply to make the result fractional,
>>>but that is certainly doable in C as well, albeit manually.

>>
>>The difference in clock ticks between implementing a fixed-point
>>arbitrary-radix vector dot-product in assembly on a DSP and trying to
>>do the same thing to the same precision in C on the same processor is
>>on the order of 100:1.

>
>
> Who said anything about a vector operation? Your statement was
>
> Generally if you stick to pure C you are stuck with integer math.
> DSP's are designed to do fixed-radix math pretty quickly, ...
>
> The term "math" does not mean "vector math" in my interpretation.
>
>
>>Even on a MAC-less processor when you are in assembly and multiply two
>>signed numbers N-bit numbers you can choose to take the lower N bits
>>of the 2N-1-bit result as C does, or you can take the upper N-1 bits
>>and do a shift, with way fewer clock cycles (10 or 20:1) than you
>>could implement the same functionality in C. I should know -- I've
>>done it in C a couple of times and in assembly on three or four
>>different processors.

>
>
> Apparently they did not include the TI TMS320C54x, arguably one of
> the most popular DSPs around, and on that processor, the following
> code
>
> #include "dsptypes.h"
>
> /* definitions */
>
> #define VECTOR_LENGTH 64
>
> /* local variables */
>
> /* local function prototypes */
>
> /* function definitions */
>
> int main(int margc, char **margv)
> {
> UINT16_T n;
> INT16_T x[VECTOR_LENGTH];
> INT16_T y[VECTOR_LENGTH];
> INT32_T acc;
> INT16_T result;
>
> acc = 0;
> for (n = 0; n < VECTOR_LENGTH; n++)
> {
> x[n] = n;
> y[n] = VECTOR_LENGTH - n - 1;
> }
>
> acc = 0;
> for (n = 0; n < VECTOR_LENGTH; n++)
> {
> acc += x[n] * y[n];
> }
>
> result = (INT16_T)(acc >> 16);
>
> return result;
> }
>
>
> produces the following assembly language
>
> 0000:0108 main
> 0000:0108 4A11 PSHM 11h
> 0000:0109 4A17 PSHM 17h
> 0000:010A EE80 FRAME -128
> 0000:010B E781 MVMM SP,AR1
> 0000:010C 6DE9 MAR *+AR1(64)
> 0000:010E E787 MVMM SP,AR7
> 0000:010F E782 MVMM SP,AR2
> 0000:0110 E800 LD #0h,A
> 0000:0111 771A STM 3fh,1ah
> 0000:0113 F072 RPTB 11ah
> 0000:0115 L1
> 0000:0115 8092 STL A,*AR2+
> 0000:0116 E93F LD #3fh,B
> 0000:0117 F520 SUB A,0,B
> 0000:0118 8191 STL B,*AR1+
> 0000:0119 F000 ADD #1h,0,A,A
> 0000:011B L2
> 0000:011B E782 MVMM SP,AR2
> 0000:011C 6DEA MAR *+AR2(64)
> 0000:011E E783 MVMM SP,AR3
> 0000:011F E800 LD #0h,A
> 0000:0120 EC3F RPT #3fh
> 0000:0121 L3
> 0000:0121 B089 MAC *AR2+,*AR3+,A,A
> 0000:0122 L4
> 0000:0122 F0E0 SFTL A,0,A
> 0000:0123 F0F0 SFTL A,-16,A
> 0000:0124 6BF8 ADDM 80h,*(18h)
> 0000:0127 F495 NOP
> 0000:0128 F495 NOP
> 0000:0129 8A17 POPM 17h
> 0000:012A 8A11 POPM 11h
> 0000:012B F4E4 FRET
>
> Both the vector multiply and the end shift look pretty damn efficient
> to me, Tim.
>
> Thus even if we agree to interpret your point differently, it's still
> inaccurate for one of the most popular DSPs in the world.


(A) That is the _only_ case that I know of for sure that the compiler
can figure out it needs to use a MAC and shift -- the version of Code
Composter that comes with the '2812 certainly doesn't do this, or I
couldn't find the magic finger-ring combination.

(B) As usual TI is playing fast and loose with the ANSI standard, and
that isn't even close to ANSI-compatible C. If it were the x[n] * y[n]
operation would be truncated to 16 bits before being added to acc, and
the result would be meaningless. Compile that up on machine that
supports 16-bit and 32-bit integers, print out the results, and see what
I mean.

Furthermore, you are actually making my point: by starting with an
awareness of the one thing that sets a DSP apart from the rest and
warping your code to fit that one thing you can make the operation very
fast. But in production code you will have to be constantly on guard to
make sure that the C code isn't "improved" in such a way that makes the
compiler implement it as a bunch of "traditional" integer operations,
thereby making it take 10-100 times slower, and likely reintroducing the
truncation (more like 10, TI is good at making fast processors).

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply With Quote
  #17 (permalink)  
Old 03-04-2005, 06:07 PM
Jerry Avins
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:
> Jerry Avins <[email protected]> writes:
>
>
>>Randy Yates wrote:
>>
>>>Jerry Avins <[email protected]> writes:
>>>

>>
>>>>Doesn't the usual fixed-point hardware do a shift after
>>>>multiplying? "Redundant sign bit" and all that.

>>
>>>Depends on the processor. The TI TMS3205xx series does not by default

>>
>>>(there is a register for setting this behavior). The Motorola does.

>>
>>Default or not, it isn't behavior one gets automatically from int
>>operations in C. That was my point.

>
>
> If by "it" you mean "automatic left shift by one bit after multiply,"
> I agree. That is, it is true that the integer multiply operations in C
> do not automatically left shift the result by one.
>
> However, that wasn't what you asked, so I'm not sure how my response
> was off-point to your question.


Then, as all too often, I was too oblique to get my point across.
Earlier, you wrote, "Whether the integers are reinterpreted to be
fractional, fixed- point, or integer is all in the interpretation and
has little or nothing to do with the implementation of the basic
arithmetic operations (add, subtract, multiply)."

There is -- or can be -- a slight but significant difference with
multiplication. Coding the shift in C can use a few extra cycles.

Since the choice between fixed-point and integer multiplication is made
with a bit in a register, I doubt that any compiler that knows only
integer would do fixed point efficiently. There are other differences
that just occurred to me. In the fixed-point shift, what had been the
MSB of the low-order word becomes the LSB of the returned result; that's
not possible with a HLL integer multiply. Worse, an HLL integer multiply
returns the low word of the product. A fixed-point multiply returns
(mostly) the high word. I can remedy this with int in C by promoting to
long, then multiplying, shifting, and truncating to int. That's painful
(although it could be a macro) and time consuming. And it won't work
with longs.

That's too stupid a scenario. What am I missing?

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
Reply With Quote
  #18 (permalink)  
Old 03-04-2005, 06:22 PM
robert bristow-johnson
Guest
 
Posts: n/a
Default N-bit x N-bit signed fixed-point multiply (was Porting LMS ...)

in article [email protected], Randy Yates at
[email protected] wrote on 03/04/2005 10:02:

> Tim Wescott <[email protected]> writes:
>
>> Even on a MAC-less processor when you are in assembly and multiply two
>> signed numbers N-bit numbers you can choose to take the lower N bits
>> of the 2N-1 bit result as C does, or you can take the upper N-1 bits
>> and do a shift, with way fewer clock cycles (10 or 20:1) than you
>> could implement the same functionality in C.

>
> The product of two N-bit numbers using two's complement arithmetic
> requires 2*N bits to represent all possible results, not 2*N - 1 bits.
>
> We can agree that it is alright to use 2*N - 1 bits and saturate or
> truncate the one input combination that requires 2*N bits, but it is
> improper to merely assume this is to be done (often it is not!).


it *is* only one case, the -1 x -1 = +1 (or in integer math,
-2^(N-1) x -2^(N-1) = +2^(2N-2) which requires 2N bits. it is the only
case where the two MSBs are not identical.

nonetheless, because an N-bit "fractional" fixed-point number, F, is related
to its N-bit 2s-complement integer representation, I, (same bit pattern for
both) by

I = 2^(N-1) * [ F ] or F = 2^(1-N) * [ I ]

when two fixed-point numbers are multiplied together

F1*F2 = [2^(1-N)*I1]*[2^(1-N)*I2]

= 2^(1-N) * [ (2^(1-N) * (I1*I2) ]

so after the hardware does the integer multiplication, you have to shift
this N-1 bits to the right to get the properly scaled result no matter what.
even in this special case where F1 = F2 = -1.0 . in an integer machine,
you do your N bit x N bit signed integer multiply, shift left 1 bit, and
take your result from the most significant word of the 2N bit product
register (which is the same as shifting right N bits). i know this is in
your treatise, Randy, but i wanna make or emphasize another point.



what struck me odd about Motorola DSP56K and DSP563xx series, is the LSB in
the 56 bit accumulator. they scaled everything right, but failed to
recognize that the LSB was always zero (because it's a 47 bit result after
MPY, not a 48 bit result). this was a big mistake for a lot of reasons.
one that comes to mind is the code you need in the 56K to do table lookup
and linear interpolation, which i used to do all the damn time.


move #>functable,r1
move #>(functable_size/2),n1 ; half size of table
move n1,y0 ; this copies the bits
move x:function_input,x0 ; -1 <= function_input < 1
mpy y0,x0,a (r1)+n1 ; point to middle of table
move a1,n1
move a0,b1 ; fractional bits go into b1
lsr b (r1)+n1 ; point to first value
move xr1)+,x0 ; get first point
tfr x0,a b1,y0
mac -y0,x0,a xr1),x0 ; get next point
macr y0,x0,a ; finish interpolation
; function result in a

this does one table lookup and linear interpolation.

if, in this instruction

move a0,b1

they moved the 23 MSBs of a0 into the
23 LSBs of the destination register (with a zero extension), then i could
move it directly to y0 and eliminate some instructions. since, after an MPY
instruction, the LSB of a0 or b0 is always zero (because of that left shift
by one bit inherent to fixed-point arithmetic), they could get rid of that
bit and then the 23 bits that are left are precisely the fractional bits i
want, and if zero extended when moved to a 24 bit number signed fractional
register, are precisely in the fractional form that i want.


so, in general, an N bit x N bit signed fixed-point number is a 2N-1 bit
result where you get the similarly scaled result out of the N most
significant bits. sign extending that into guard bits will take care of the
one case of -1 x -1 (as well as take care of other problems), but there are
only N-1 "fractional bits" below those N bits to worry about and never N
bits.



--

r b-j [email protected]

"Imagination is more important than knowledge."


Reply With Quote
  #19 (permalink)  
Old 03-04-2005, 07:08 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Jerry Avins <[email protected]> writes:

> Randy Yates wrote:
> > Jerry Avins <[email protected]> writes:
> >

>
> >>Randy Yates wrote:
> >>
> >>>Jerry Avins <[email protected]> writes:
> >>>
> >>
> >>>>Doesn't the usual fixed-point hardware do a shift after
> >>>>multiplying? "Redundant sign bit" and all that.
> >>
> >>>Depends on the processor. The TI TMS3205xx series does not by default
> >>
> >>>(there is a register for setting this behavior). The Motorola does.
> >>
> >>Default or not, it isn't behavior one gets automatically from int
> >>operations in C. That was my point.

> > If by "it" you mean "automatic left shift by one bit after multiply,"

>
> > I agree. That is, it is true that the integer multiply operations in C
> > do not automatically left shift the result by one.
> > However, that wasn't what you asked, so I'm not sure how my response

>
> > was off-point to your question.

>
> Then, as all too often, I was too oblique to get my point
> across. Earlier, you wrote, "Whether the integers are reinterpreted to
> be fractional, fixed- point, or integer is all in the interpretation
> and has little or nothing to do with the implementation of the basic
> arithmetic operations (add, subtract, multiply)."
>
>
> There is -- or can be -- a slight but significant difference with
> multiplication. Coding the shift in C can use a few extra cycles.


Yes, in the case where you are doing the left shift, there may be
extra cycles required to do this in C. However, this isn't always the
case, and when it is the case, the number of cycles in the difference
will probably be between 0 and "not very many," depending on the
compiler, the specific types of operations being performed, and the
ingenuity of the programmer.

Thus the distinction does not warrant a generic caveat regarding
fixed-point arithmetic in C, in my opinion.

> Since the choice between fixed-point and integer multiplication is
> made with a bit in a register,


It is this sort of thinking that is propagating the error in this
thread. This logic is tantamount to saying that "the difference
between a Ford and a Chevy is that a Ford has a better ride." It
attempts to generalize what may be true in a specific case.

> I doubt that any compiler that knows
> only integer would do fixed point efficiently.


Did you see my other post where I gave actual code that refutes this?

> There are other
> differences that just occurred to me. In the fixed-point shift, what
> had been the MSB of the low-order word becomes the LSB of the returned
> result;


I'm not sure of the scenario you're trying to describe here.

> that's not possible with a HLL integer multiply. Worse, an HLL
> integer multiply returns the low word of the product. A fixed-point
> multiply returns (mostly) the high word.


Not necessarily. Depends on what you're doing.

> I can remedy this with int in
> C by promoting to long, then multiplying, shifting, and truncating to
> int. That's painful (although it could be a macro) and time
> consuming.


What's so time-consuming about it? It might cost you one cycle. Big
whoop.

> And it won't work with longs.


True unless the compiler supports an extended long, i.e., an integer
the size of the accumulator. Although double-precision multiplication
takes us pretty far afield from the topic.

> That's too stupid a scenario. What am I missing?


I think you're right - you are just beginning to think about more of
the details.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
[email protected], 919-472-1124
Reply With Quote
  #20 (permalink)  
Old 03-04-2005, 09:19 PM
Jerry Avins
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:
> Jerry Avins <[email protected]> writes:


...

>>There is -- or can be -- a slight but significant difference with
>>multiplication. Coding the shift in C can use a few extra cycles.

>
>
> Yes, in the case where you are doing the left shift, there may be
> extra cycles required to do this in C. However, this isn't always the
> case, and when it is the case, the number of cycles in the difference
> will probably be between 0 and "not very many," depending on the
> compiler, the specific types of operations being performed, and the
> ingenuity of the programmer.
>
> Thus the distinction does not warrant a generic caveat regarding
> fixed-point arithmetic in C, in my opinion.


What general caveat, "Watch out for gotchas"?

>>Since the choice between fixed-point and integer multiplication is
>>made with a bit in a register,

>
>
> It is this sort of thinking that is propagating the error in this
> thread. This logic is tantamount to saying that "the difference
> between a Ford and a Chevy is that a Ford has a better ride." It
> attempts to generalize what may be true in a specific case.


I don't understand. Can I expect a C compiler to change the
integer/~fixed bit as appropriate?

>>I doubt that any compiler that knows
>>only integer would do fixed point efficiently.

>
>
> Did you see my other post where I gave actual code that refutes this?


Yes. Impressive but, I suspect, rare.

>>There are other
>>differences that just occurred to me. In the fixed-point shift, what
>>had been the MSB of the low-order word becomes the LSB of the returned
>>result;

>
>
> I'm not sure of the scenario you're trying to describe here.


The product of a 32-by-32 multiply has 64 bits. Numbering them zero to
63 in increasing significance, an ordinary integer multiply returns zero
to 31, or zero to 30 plus the sign bit, 63. A fixed-point multiply
should return bits 31 to 62 of the product. Shifting the upper word
alone forces bit 31 to zero. (I think I could be clearer. Should I try?)

>>that's not possible with a HLL integer multiply. Worse, an HLL
>>integer multiply returns the low word of the product. A fixed-point
>>multiply returns (mostly) the high word.

>
>
> Not necessarily. Depends on what you're doing.


Please elaborate.

>>I can remedy this with int in
>>C by promoting to long, then multiplying, shifting, and truncating to
>>int. That's painful (although it could be a macro) and time
>>consuming.

>
>
> What's so time-consuming about it? It might cost you one cycle. Big
> whoop.


Better than I supposed by far. I may give up on assembler after all!

>>And it won't work with longs.

>
>
> True unless the compiler supports an extended long, i.e., an integer
> the size of the accumulator. Although double-precision multiplication
> takes us pretty far afield from the topic.
>
>
>>That's too stupid a scenario. What am I missing?

>
>
> I think you're right - you are just beginning to think about more of
> the details.


I hate psyching out a compiler when I know what I want in the end. "Just
do it" never seemed more appropriate.

Thanks for the education.

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
Reply With Quote
  #21 (permalink)  
Old 03-04-2005, 10:16 PM
Shawn Steenhagen
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy,

Careful about that extra sign bit. The 54x C compiler assumes it the FRCT
bit on the 54x is 0, i.e. no automatic left shift of one, so if we are
working with "Q15" variable types, then we need to get rid of the extra sign
bit in the final store operation (shift by 15 instead of 16).

result = (INT16_T)(acc >> 15);

I, too, am suprised the C compiler correctly does the MAC, according to the
54x C compilers guide the proper way to guarantee you'll get the correct
result would be to type cast:

for (n = 0; n < VECTOR_LENGTH; n++)
{
acc += ((long) x[n]) *((long) y[n]);
}

I prefer to use the 54x or 55x instrinsics as shown in my first response to
the original question (I have a code example there). Yes this code is NOT
portable (at least to other DSP families), but I'm sure it is doing what I
expect.

-Shawn


"Randy Yates" <[email protected]> wrote in message
news:[email protected]..
> Tim Wescott <[email protected]> writes:
>
> > Randy Yates wrote:
> >
> > > Tim Wescott <[email protected]> writes:
> > >

> >
> > >>[...]
> > >>Generally if you stick to pure C you are stuck with integer math.
> > >>DSP's are designed to do fixed-radix math pretty quickly, ...
> > > Tim, I think most of your points are helpful, but this one is
> > > off-the-mark

> >
> > > in my judgement. The typical fixed-point DSP operates much the same as
> > > the C integer operations, performing integer math. Whether the
> > > integers are reinterpreted to be fractional, fixed-point, or integer
> > > is all in the interpretation and has little or nothing to do with the
> > > implementation of the basic arithmetic operations (add, subtract,
> > > multiply).
> > > Of course there are differences between fixed-point DSP ALUs and the

> >
> > > "ALU" of a C compiler, the biggest of which are probably the wide
> > > accumulators and the saturation options when performing various
> > > operations. There is also the typical "left shift by 1" that a
> > > fractional DSP does after a multiply to make the result fractional,
> > > but that is certainly doable in C as well, albeit manually.

> >
> > The difference in clock ticks between implementing a fixed-point
> > arbitrary-radix vector dot-product in assembly on a DSP and trying to
> > do the same thing to the same precision in C on the same processor is
> > on the order of 100:1.

>
> Who said anything about a vector operation? Your statement was
>
> Generally if you stick to pure C you are stuck with integer math.
> DSP's are designed to do fixed-radix math pretty quickly, ...
>
> The term "math" does not mean "vector math" in my interpretation.
>
> > Even on a MAC-less processor when you are in assembly and multiply two
> > signed numbers N-bit numbers you can choose to take the lower N bits
> > of the 2N-1-bit result as C does, or you can take the upper N-1 bits
> > and do a shift, with way fewer clock cycles (10 or 20:1) than you
> > could implement the same functionality in C. I should know -- I've
> > done it in C a couple of times and in assembly on three or four
> > different processors.

>
> Apparently they did not include the TI TMS320C54x, arguably one of
> the most popular DSPs around, and on that processor, the following
> code
>
> #include "dsptypes.h"
>
> /* definitions */
>
> #define VECTOR_LENGTH 64
>
> /* local variables */
>
> /* local function prototypes */
>
> /* function definitions */
>
> int main(int margc, char **margv)
> {
> UINT16_T n;
> INT16_T x[VECTOR_LENGTH];
> INT16_T y[VECTOR_LENGTH];
> INT32_T acc;
> INT16_T result;
>
> acc = 0;
> for (n = 0; n < VECTOR_LENGTH; n++)
> {
> x[n] = n;
> y[n] = VECTOR_LENGTH - n - 1;
> }
>
> acc = 0;
> for (n = 0; n < VECTOR_LENGTH; n++)
> {
> acc += x[n] * y[n];
> }
>
> result = (INT16_T)(acc >> 16);
>
> return result;
> }
>
>
> produces the following assembly language
>
> 0000:0108 main
> 0000:0108 4A11 PSHM 11h
> 0000:0109 4A17 PSHM 17h
> 0000:010A EE80 FRAME -128
> 0000:010B E781 MVMM SP,AR1
> 0000:010C 6DE9 MAR *+AR1(64)
> 0000:010E E787 MVMM SP,AR7
> 0000:010F E782 MVMM SP,AR2
> 0000:0110 E800 LD #0h,A
> 0000:0111 771A STM 3fh,1ah
> 0000:0113 F072 RPTB 11ah
> 0000:0115 L1
> 0000:0115 8092 STL A,*AR2+
> 0000:0116 E93F LD #3fh,B
> 0000:0117 F520 SUB A,0,B
> 0000:0118 8191 STL B,*AR1+
> 0000:0119 F000 ADD #1h,0,A,A
> 0000:011B L2
> 0000:011B E782 MVMM SP,AR2
> 0000:011C 6DEA MAR *+AR2(64)
> 0000:011E E783 MVMM SP,AR3
> 0000:011F E800 LD #0h,A
> 0000:0120 EC3F RPT #3fh
> 0000:0121 L3
> 0000:0121 B089 MAC *AR2+,*AR3+,A,A
> 0000:0122 L4
> 0000:0122 F0E0 SFTL A,0,A
> 0000:0123 F0F0 SFTL A,-16,A
> 0000:0124 6BF8 ADDM 80h,*(18h)
> 0000:0127 F495 NOP
> 0000:0128 F495 NOP
> 0000:0129 8A17 POPM 17h
> 0000:012A 8A11 POPM 11h
> 0000:012B F4E4 FRET
>
> Both the vector multiply and the end shift look pretty damn efficient
> to me, Tim.
>
> Thus even if we agree to interpret your point differently, it's still
> inaccurate for one of the most popular DSPs in the world.
> --
> Randy Yates
> Sony Ericsson Mobile Communications
> Research Triangle Park, NC, USA
> [email protected], 919-472-1124



Reply With Quote
  #22 (permalink)  
Old 03-05-2005, 05:13 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

"Shawn Steenhagen" <[email protected] m> writes:

> Randy,
>
> Careful about that extra sign bit.


Shawn, your warning is ludicrous. My paper on scaling in FIR
fixed-point arithmetic - a reference for many who have visited this
group looking for information on the topic - discusses this point. I
wrote it years before I ever heard of your name on this group. I am
also the author of the comp.dsp FAQ item on fixed-point arithmetic
and as such am well-versed on the issue.

I noticed your tendency to assume I was ignorant at the comp.dsp
conference. Perhaps you should reexamine your assumptions to save
yourself further embarassment.

I'm not going to honor your point with a response except to say that
you have given me no new information and I still stand by everything
I have asserted in this thread.
--
% Randy Yates % "She's sweet on Wagner-I think she'd die for Beethoven.
%% Fuquay-Varina, NC % She love the way Puccini lays down a tune, and
%%% 919-577-9882 % Verdi's always creepin' from her room."
%%%% <[email protected]> % "Rockaria", *A New World Record*, ELO
http://home.earthlink.net/~yatescr
Reply With Quote
  #23 (permalink)  
Old 03-05-2005, 05:40 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Jerry Avins <[email protected]> writes:

> Randy Yates wrote:
>> Jerry Avins <[email protected]> writes:

>
> ...
>
>>>There is -- or can be -- a slight but significant difference with
>>>multiplication. Coding the shift in C can use a few extra cycles.

>> Yes, in the case where you are doing the left shift, there may be
>> extra cycles required to do this in C. However, this isn't always the
>> case, and when it is the case, the number of cycles in the difference
>> will probably be between 0 and "not very many," depending on the
>> compiler, the specific types of operations being performed, and the
>> ingenuity of the programmer.
>> Thus the distinction does not warrant a generic caveat regarding
>> fixed-point arithmetic in C, in my opinion.

>
> What general caveat, "Watch out for gotchas"?


No. The original caveat asserted by Tim Wescott, which you seem to be
defending, albeit in an askance manner.

>>>Since the choice between fixed-point and integer multiplication is
>>> made with a bit in a register,

>> It is this sort of thinking that is propagating the error in this
>> thread. This logic is tantamount to saying that "the difference
>> between a Ford and a Chevy is that a Ford has a better ride." It
>> attempts to generalize what may be true in a specific case.

>
> I don't understand. Can I expect a C compiler to change the
> integer/~fixed bit as appropriate?


No, you cannot. HOWEVER... "Fixed-point" does NOT *generally* mean you
are going to left shift after a multiply!!!

>>>I doubt that any compiler that knows
>>> only integer would do fixed point efficiently.

>> Did you see my other post where I gave actual code that refutes this?

>
> Yes. Impressive but, I suspect, rare.


You're free to embrace your suspicions, but in my opinion you're
bordering on being obstinately ignorant. This is my job, Jerry. I do
this EXACT type of activity many times during the course of my
development work at Sony Ericsson, and I'm telling you that
fixed-point arithmetic in C is VERY doable. There are some caveats and
limitations, but for the most part I've been able to use C to simulate
my fixed-point code.

My typical flow when developing an algorithm routine is as follows:
1) exploration and experimentation in Matlab until a candidate
algorithm is found, 2) translate to fixed-point by writing
a C simulation on the Sun, 3) either retarget the Sun C code
to the DSP (if speed isn't crucial) or reimplement in assembly
on the DSP.

Do you think that I specially-formulated the posted code so that it
would compile efficiently? I did not. In fact the code you have in
that post is the very first thing I tried. Tim Wescott had asserted
that vector multiplication was inefficient in C on a DSP. Now I'm not
sure what sort of strange variations on vector multiplies you want to
present as exceptions, but it seems to me that the operation is pretty
clearly defined.

>>>There are other
>>>differences that just occurred to me. In the fixed-point shift, what
>>>had been the MSB of the low-order word becomes the LSB of the returned
>>> result;

>> I'm not sure of the scenario you're trying to describe here.

>
> The product of a 32-by-32 multiply has 64 bits. Numbering them zero to
> 63 in increasing significance, an ordinary integer multiply returns
> zero to 31, or zero to 30 plus the sign bit, 63. A fixed-point
> multiply should return bits 31 to 62 of the product. Shifting the
> upper word alone forces bit 31 to zero. (I think I could be
> clearer. Should I try?)


I see. Who said that shifts occur in 16-bit chunks? In C, you would
presumably maintain a 32-bit variable to hold the result, and when you
shift it, you shift the entire 32 bits. In assembly, the accumulator
is holding the result, and when you shift it, you shift the entire
accumulator. Thus this appears to be a non-problem.

>>>that's not possible with a HLL integer multiply. Worse, an HLL
>>>integer multiply returns the low word of the product. A fixed-point
>>> multiply returns (mostly) the high word.

>> Not necessarily. Depends on what you're doing.

>
> Please elaborate.


Read my paper on fixed-point considerations in FIR filtering and
you'll understand. The 16-bits you extract from a 40-bit accumulator
(to stick with a TI-ish architecture) can be all over the map
depending your objective, the algorithm, the input scaling, and the
output scaling. This will also help to cement why "fractional" is a
small part of the "fixed-point" world.

>>>I can remedy this with int in
>>>C by promoting to long, then multiplying, shifting, and truncating to
>>>int. That's painful (although it could be a macro) and time
>>> consuming.

>> What's so time-consuming about it? It might cost you one cycle. Big
>> whoop.

>
> Better than I supposed by far. I may give up on assembler after all!


Unless you need to shave the last few ten percent off your execution time,
I'd say C is fine for fixed-point. In my line of work, it is often the
case that you do want to save that last few ten percent, especially since
such operations are usually in the most expensive "inner-loop" portion
of the algorithm.

>>>And it won't work with longs.

>> True unless the compiler supports an extended long, i.e., an integer
>> the size of the accumulator. Although double-precision multiplication
>> takes us pretty far afield from the topic.
>>>That's too stupid a scenario. What am I missing?

>> I think you're right - you are just beginning to think about more of
>> the details.

>
> I hate psyching out a compiler when I know what I want in the
> end. "Just do it" never seemed more appropriate.


Well, there is an advantage in doing a small amount of finagling in
2 percent of the code when coding the other 98 percent assembly would
be painful.
--
% Randy Yates % "Maybe one day I'll feel her cold embrace,
%% Fuquay-Varina, NC % and kiss her interface,
%%% 919-577-9882 % til then, I'll leave her alone."
%%%% <[email protected]> % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
Reply With Quote
  #24 (permalink)  
Old 03-05-2005, 05:49 PM
Randy Yates
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates <[email protected]> writes:

>> The product of a 32-by-32 multiply has 64 bits. Numbering them zero to
>> 63 in increasing significance, an ordinary integer multiply returns
>> zero to 31, or zero to 30 plus the sign bit, 63. A fixed-point
>> multiply should return bits 31 to 62 of the product. Shifting the
>> upper word alone forces bit 31 to zero. (I think I could be
>> clearer. Should I try?)

>
> I see. Who said that shifts occur in 16-bit chunks? In C, you would
> presumably maintain a 32-bit variable to hold the result, and when you
> shift it, you shift the entire 32 bits. In assembly, the accumulator
> is holding the result, and when you shift it, you shift the entire
> accumulator. Thus this appears to be a non-problem.


Sorry - I had lost context when writing that. Translate "32" and "16" to
"64" and "32". The conclusion is still the same. This is a non-problem.
--
% Randy Yates % "Midnight, on the water...
%% Fuquay-Varina, NC % I saw... the ocean's daughter."
%%% 919-577-9882 % 'Can't Get It Out Of My Head'
%%%% <[email protected]> % *El Dorado*, Electric Light Orchestra
http://home.earthlink.net/~yatescr
Reply With Quote
  #25 (permalink)  
Old 03-05-2005, 07:11 PM
Jerry Avins
Guest
 
Posts: n/a
Default Re: Porting LMS from floating-point to fixed-point processor

Randy Yates wrote:
> Jerry Avins <[email protected]> writes:
>
>
>>Randy Yates wrote:
>>
>>>Jerry Avins <[email protected]> writes:

>>
>> ...
>>
>>
>>>>There is -- or can be -- a slight but significant difference with
>>>>multiplication. Coding the shift in C can use a few extra cycles.
>>>
>>>Yes, in the case where you are doing the left shift, there may be
>>>extra cycles required to do this in C. However, this isn't always the
>>>case, and when it is the case, the number of cycles in the difference
>>>will probably be between 0 and "not very many," depending on the
>>>compiler, the specific types of operations being performed, and the
>>>ingenuity of the programmer.
>>>Thus the distinction does not warrant a generic caveat regarding
>>>fixed-point arithmetic in C, in my opinion.

>>
>>What general caveat, "Watch out for gotchas"?

>
>
> No. The original caveat asserted by Tim Wescott, which you seem to be
> defending, albeit in an askance manner.
>
>
>>>>Since the choice between fixed-point and integer multiplication is
>>>>made with a bit in a register,
>>>
>>>It is this sort of thinking that is propagating the error in this
>>>thread. This logic is tantamount to saying that "the difference
>>>between a Ford and a Chevy is that a Ford has a better ride." It
>>>attempts to generalize what may be true in a specific case.

>>
>>I don't understand. Can I expect a C compiler to change the
>>integer/~fixed bit as appropriate?

>
>
> No, you cannot. HOWEVER... "Fixed-point" does NOT *generally* mean you
> are going to left shift after a multiply!!!


OK. Then what error's propagation were you referring to?

>>>>I doubt that any compiler that knows
>>>>only integer would do fixed point efficiently.
>>>
>>>Did you see my other post where I gave actual code that refutes this?

>>
>>Yes. Impressive but, I suspect, rare.

>
>
> You're free to embrace your suspicions, but in my opinion you're
> bordering on being obstinately ignorant. This is my job, Jerry. I do
> this EXACT type of activity many times during the course of my
> development work at Sony Ericsson, and I'm telling you that
> fixed-point arithmetic in C is VERY doable. There are some caveats and
> limitations, but for the most part I've been able to use C to simulate
> my fixed-point code.


By rare, I mean a few compilers tailored specifically to DSP by DSP a
manufacturer. That's what I suspect. If you tell me otherwise, I'll
happily accept it. I believe very strongly that you know what you're
talking about. I think that very often, we don't know what the other is
talking about.

> My typical flow when developing an algorithm routine is as follows:
> 1) exploration and experimentation in Matlab until a candidate
> algorithm is found, 2) translate to fixed-point by writing
> a C simulation on the Sun, 3) either retarget the Sun C code
> to the DSP (if speed isn't crucial) or reimplement in assembly
> on the DSP.
>
> Do you think that I specially-formulated the posted code so that it
> would compile efficiently? I did not. In fact the code you have in
> that post is the very first thing I tried. Tim Wescott had asserted
> that vector multiplication was inefficient in C on a DSP. Now I'm not
> sure what sort of strange variations on vector multiplies you want to
> present as exceptions, but it seems to me that the operation is pretty
> clearly defined.


You just told me otherwise. Good! (I never mentioned vector multiplies.
To my discredit, I don't even think that way.)

>>>>There are other
>>>>differences that just occurred to me. In the fixed-point shift, what
>>>>had been the MSB of the low-order word becomes the LSB of the returned
>>>>result;
>>>
>>>I'm not sure of the scenario you're trying to describe here.

>>
>>The product of a 32-by-32 multiply has 64 bits. Numbering them zero to
>>63 in increasing significance, an ordinary integer multiply returns
>>zero to 31, or zero to 30 plus the sign bit, 63. A fixed-point
>>multiply should return bits 31 to 62 of the product. Shifting the
>>upper word alone forces bit 31 to zero. (I think I could be
>>clearer. Should I try?)

>
>
> I see. Who said that shifts occur in 16-bit chunks? In C, you would
> presumably maintain a 32-bit variable to hold the result, and when you
> shift it, you shift the entire 32 bits. In assembly, the accumulator
> is holding the result, and when you shift it, you shift the entire
> accumulator. Thus this appears to be a non-problem.


With a 32-by-32 fixed-point fractional multiply, I first get a 64-bit
result, then shift left one bit and keep the resulting high 32 bits.
Where do the 16-bit chunks fit in?

>>>>that's not possible with a HLL integer multiply. Worse, an HLL
>>>>integer multiply returns the low word of the product. A fixed-point
>>>>multiply returns (mostly) the high word.
>>>
>>>Not necessarily. Depends on what you're doing.

>>
>>Please elaborate.

>
>
> Read my paper on fixed-point considerations in FIR filtering and
> you'll understand. The 16-bits you extract from a 40-bit accumulator
> (to stick with a TI-ish architecture) can be all over the map
> depending your objective, the algorithm, the input scaling, and the
> output scaling. This will also help to cement why "fractional" is a
> small part of the "fixed-point" world.


The only hardware fixed-point support I'm aware is pure fractional,
hence the orientation of mt statements. Also I had in mind the 'C31's
32-bit word size. I've been using scaled integers with all sorts of
scaling factors for many years, both in Forth and in assembly, mostly on
8-bit processors with multiple-precision arithmetic. (Before IEEE
standardization of floats, it was the best way I knew to write
bit-reproducible code.)

>>>>I can remedy this with int in
>>>>C by promoting to long, then multiplying, shifting, and truncating to
>>>>int. That's painful (although it could be a macro) and time
>>>>consuming.
>>>
>>>What's so time-consuming about it? It might cost you one cycle. Big
>>>whoop.

>>
>>Better than I supposed by far. I may give up on assembler after all!

>
>
> Unless you need to shave the last few ten percent off your execution time,
> I'd say C is fine for fixed-point. In my line of work, it is often the
> case that you do want to save that last few ten percent, especially since
> such operations are usually in the most expensive "inner-loop" portion
> of the algorithm.


You've convinced me that it is, but you haven't convinced me that by
just using integer, everything will work out. I believe something you
wrote in the post that elicited my first response implied that. Here:

"The typical fixed-point DSP operates much the same as
the C integer operations, performing integer math. Whether the
integers are reinterpreted to be fractional, fixed-point, or integer
is all in the interpretation and has little or nothing to do with
the implementation of the basic arithmetic operations ..."

Surely, there's always some adjustment needed after multiplying, however
little time it may take.

...

Jerry
--
Engineering is the art of making what you want from things you can get.
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
converting floating point to fixed point H aka N VHDL 15 03-02-2006 03:26 PM
fixed point & floating point ranjeet DSP 17 09-09-2004 08:40 AM
Floating point to fixed point conversion. Gowtham DSP 7 07-26-2003 02:33 PM
floating point v fixed point Nick ELLIOTT DSP 4 07-03-2003 08:06 PM
Fixed Point DSP Vs Floating Point DSP Sandeep Chikkerur DSP 2 06-27-2003 01:35 AM


All times are GMT +1. The time now is 03:41 AM.


Powered by vBulletin® Version 3.8.0
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0
Copyright 2008 @ FPGA Central. All rights reserved