I have a question regarding Fixed point Arithmetic addition.

For example, I have two fixed point numbers:

a = unsigned Q7.8 format (7-bit integer, 8-bit fractional).
b = unsigned Q7.8 format ( " " ).

Now a + b = c, where c is an unsigned Q8.8 result.

Q: How do I transform c into d, where d is an unsigned Q7.9 result?

The way I have tried to approach it is as follows:

Integer part
----------------
The way I have thought about the integer part is to say that if bit
[15] of the result c is a '1', then bits [14:8] of d are b"111_1111",
otherwise d[14:8] = c[14:8].

Is this correct?

Fractional Part
----------------------

The way I have thought about the fractional part is that for d, I want
one extra fractional bit to increase the fractional precision.

The obvious way to me seems to be to add an extra bit at the LSB end:
i.e. d[8:0] = c[7:0] & 1'b0.

Is this correct?

Q: Can anyone recommend a good book on fixed-point and floating-point
arithmetic?

thunder wrote:
> Hi
>
> I have a question regarding Fixed point Arithmetic addition.
>
> For example, i have two fixed point numbers:
>
> a = unsigned Q7.8 format (7-bit integer, 8 bit factional).
> b = unsigned Q7.8 format ( " " ).
>
> Now a + b = c, where c is an unsigned Q8.8 result.

Then there is overflow, just as when adding two Q15.0 integers and
getting a Q16.0 sum. (Remember the sign bit.)

> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??

You can't. Count the bits. (Remember the sign bit.)

...

> QS; Can anyone recommend a good book on Fixed Point and Floating point
> arithmetic ?

http://www.digitalsignallabs.com/fp.pdf

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

On Mon, 26 Oct 2009 10:53:23 -0400, Jerry Avins wrote:

> thunder wrote:
>> Hi
>>
>> I have a question regarding Fixed point Arithmetic addition.
>>
>> For example, i have two fixed point numbers:
>>
>> a = unsigned Q7.8 format (7-bit integer, 8 bit factional). b = unsigned
>> Q7.8 format ( " " ).
>>
>> Now a + b = c, where c is an unsigned Q8.8 result.
>
> Then there is overflow, just as two Q15.0 integers and getting a Q16.0
> sum. (Remember the sign bit.)
>
>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> You can't. Count the bits. (Remember the sign bit.)
>
You can't on a 16-bit machine, but if you're working in an FPGA or custom
logic a 17-bit type is no problem.

On Mon, 26 Oct 2009 01:17:59 -0700, thunder wrote:

> Hi
>
> I have a question regarding Fixed point Arithmetic addition.
>
> For example, i have two fixed point numbers:
>
> a = unsigned Q7.8 format (7-bit integer, 8 bit factional). b = unsigned
> Q7.8 format ( " " ).
>
> Now a + b = c, where c is an unsigned Q8.8 result.
>
> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> The way i have tried to approach it is as follows:
>
> Integer part
> ----------------
> The way i have thought about the integer part is to say that if bit [15]
> of the result c is a '1', then bits[14:8] of d is b"111_1111",
> otherwise d[14:8] = c[14:8].
>
> Is is correct ??
>
> Fractional Part
> ----------------------
>
> The way i have thought about the fractional part is that for d, i want
> one extra fractional bit to increase the fractional preciion.
>
> The obvious way to me seems to be to add an extra bit at the LSB end: ie
> d[8:0 = c[7:0] & 1'b0.
>
> Is this correct?

Rather than answer that, I'm just going to point out that there's not a
1:1 mapping between Q8.8 and Q7.9 types. So for a good part of the range
of your Q8.8 type you can only approximate the value in Q7.9. So the
question becomes not "is this correct?" but "is this right for my
application?" -- and you know what your application is.

Me, I'd append a zero to the end and I'd saturate to +/- full range (or
to +63.etc and -63.etc -- allowing the b100000... into a signed two's
complement type gives you a tiny corner case that attracts a huge amount
of nasty bugs).
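
A minimal sketch of the "append a zero and saturate" idea, under some assumptions that are not from the thread: it takes the unsigned reading from the original post, holds both the Q8.8 input and the Q7.9 output in a `uint16_t`, and uses the hypothetical name `q8_8_to_q7_9_sat`. A signed version would need to clamp at both ends of the range instead.

```c
#include <stdint.h>

/* Convert an unsigned Q8.8 value to unsigned Q7.9, saturating on overflow.
 * Both formats occupy 16 bits here; only the binary point moves. */
static uint16_t q8_8_to_q7_9_sat(uint16_t c)
{
    /* Bit 15 set means the value is >= 128.0, outside the Q7.9 range. */
    if (c & 0x8000u) {
        return 0xFFFFu;  /* largest Q7.9 value: 127 + 511/512 */
    }
    /* Append a zero fractional bit: same value, one more fractional place. */
    return (uint16_t)(c << 1);
}
```

For example, 1.5 in Q8.8 is 0x0180 and comes out as 0x0300, which is 1.5 in Q7.9. Anything from 128.0 upward is clamped to full scale, which is the nonlinearity in that part of the range.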

> Hi
>
> I have a question regarding Fixed point Arithmetic addition.
>
> For example, i have two fixed point numbers:
>
> a = unsigned Q7.8 format (7-bit integer, 8 bit factional).
> b = unsigned Q7.8 format ( " " ).
>
> Now a + b = c, where c is an unsigned Q8.8 result.
>
> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??

There is no way in general to do this conversion and avoid some kind of
nonlinear effect since the range of Q7.9 is smaller than Q8.8. The most
obvious method would be to saturate the Q8.8 result to Q7.9.

It would be good to know the reason why you're trying to rescale in this
manner - there may be a better way to do things from a higher level
point-of-view.
--
Randy Yates % "The dreamer, the unwoken fool -
Digital Signal Labs % in dreams, no pain will kiss the brow..."
mailto://[email protected] % http://www.digitalsignallabs.com % 'Eldorado Overture', *Eldorado*, ELO

> You can't on a 16-bit machine, but if you're working in an FPGA or custom
> logic a 17-bit type is no problem.

What do you suppose the OP's context is?

Jerry
--
Engineering is the art of making what you want from things you can get.
Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯ Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯ Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯

Randy Yates wrote:
> thunder <[email protected]> writes:
>
>> Hi
>>
>> I have a question regarding Fixed point Arithmetic addition.
>>
>> For example, i have two fixed point numbers:
>>
>> a = unsigned Q7.8 format (7-bit integer, 8 bit factional).
>> b = unsigned Q7.8 format ( " " ).
>>
>> Now a + b = c, where c is an unsigned Q8.8 result.
>>
>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> There is no way in general to do this conversion and avoid some kind of
> nonlinear effect since the range of Q7.9 is smaller than Q8.8. The most
> obvious method would be to saturate the Q8.8 result to Q7.9.
>
> It would be good to know the reason why you're trying to rescale in this
> manner - there may be a better way to do things from a higher level
> point-of-view.

I suggest that we give thunder time to glean an understanding from your
monograph, then ask for whatever further clarification he still needs.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

On Mon, 26 Oct 2009 12:27:56 -0400, Jerry Avins wrote:

> Tim Wescott wrote:
>
> ...
>
>> You can't on a 16-bit machine, but if you're working in an FPGA or
>> custom logic a 17-bit type is no problem.
>
> What do you suppose the OP's context is?
>
Homework, but I'm trying not to be ruled by assumptions.

>On Mon, 26 Oct 2009 10:53:23 -0400, Jerry Avins wrote:
>
>> thunder wrote:
>>> Hi
>>>
>>> I have a question regarding Fixed point Arithmetic addition.
>>>
>>> For example, i have two fixed point numbers:
>>>
>>> a = unsigned Q7.8 format (7-bit integer, 8 bit factional). b = unsigned
>>> Q7.8 format ( " " ).
>>>
>>> Now a + b = c, where c is an unsigned Q8.8 result.
>>
>> Then there is overflow, just as two Q15.0 integers and getting a Q16.0
>> sum. (Remember the sign bit.)
>>
>>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>>
>> You can't. Count the bits. (Remember the sign bit.)
>>
>You can't on a 16-bit machine, but if you're working in an FPGA or custom
>logic a 17-bit type is no problem.

Not entirely true (carry flag), but I'm splitting hairs, since the problem
is misguided. BTW, the OP repeatedly said unsigned, though it may have
been confusing with repeated references to the MSbit.

There's absolutely no point in switching to 7.9 midstream. 7.8 holds
exactly the same information as 7.9 after *this* operation; the loss is in
the integer part, not the fractional part. Adding two unsigned 15-bit
numbers could probably be achieved with exactly the same opcode, because
the result merely has to be interpreted correctly (the same is not true of
multiplication, of course). For this problem, I would use all 16 bits the
whole time, not 15. You then have to choose whether to saturate or wrap
around. There may be some processors that support saturation as an
instruction, but I think you'd otherwise have to look at the carry flag;
this should be trivial for unsigned addition. To wrap, do nothing
(assuming other constraints don't prevent you from using all 16, else
mask).
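
The saturate-or-wrap choice can be sketched in portable C roughly as follows. The function names are hypothetical, and because C does not expose the carry flag, the sketch detects the carry by comparing the wrapped sum against an operand; a processor with a saturating-add instruction would not need this.

```c
#include <stdint.h>

/* Wrapping add: keep the low 16 bits of the sum, nothing else to do. */
static uint16_t add_wrap_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Saturating add: clamp to full scale when the sum carries out of bit 15. */
static uint16_t add_sat_u16(uint16_t a, uint16_t b)
{
    uint16_t sum = (uint16_t)(a + b);
    /* For unsigned operands a wrapped sum is smaller than either input,
     * so this comparison stands in for testing the carry flag. */
    return (sum < a) ? 0xFFFFu : sum;
}
```

The same two functions serve any unsigned 16-bit Q format, since for addition the position of the binary point only affects how the result is interpreted.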

Tim Wescott wrote:
> On Mon, 26 Oct 2009 12:27:56 -0400, Jerry Avins wrote:
>
>> Tim Wescott wrote:
>>
>> ...
>>
>>> You can't on a 16-bit machine, but if you're working in an FPGA or
>>> custom logic a 17-bit type is no problem.
>> What do you suppose the OP's context is?
>>
> Homework, but I'm trying not to be ruled by assumptions.

You're a better man than I am!

Jerry
--
Engineering is the art of making what you want from things you can get.
Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯ Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯ Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯Â¯

On Mon, 26 Oct 2009 14:41:38 -0400, Jerry Avins wrote:

> Tim Wescott wrote:
>> On Mon, 26 Oct 2009 12:27:56 -0400, Jerry Avins wrote:
>>
>>> Tim Wescott wrote:
>>>
>>> ...
>>>
>>>> You can't on a 16-bit machine, but if you're working in an FPGA or
>>>> custom logic a 17-bit type is no problem.
>>> What do you suppose the OP's context is?
>>>
>> Homework, but I'm trying not to be ruled by assumptions.
>
> You're a better man than I am!
>
> Jerry

On 26 Oct, 16:25, Randy Yates <[email protected]> wrote:
> thunder <[email protected]> writes:
> > Hi
>
> > I have a question regarding Fixed point Arithmetic addition.
>
> > For example, i have two fixed point numbers:
>
> > a = unsigned Q7.8 format (7-bit integer, 8 bit factional).
> > b = unsigned Q7.8 format ( " " ).
>
> > Now a + b = c, where c is an unsigned Q8.8 result.
>
> > Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> There is no way in general to do this conversion and avoid some kind of
> nonlinear effect since the range of Q7.9 is smaller than Q8.8. The most
> obvious method would be to saturate the Q8.8 result to Q7.9.
>
> It would be good to know the reason why you're trying to rescale in this
> manner - there may be a better way to do things from a higher level
> point-of-view.
> --
> Randy Yates % "The dreamer, the unwoken fool -
> Digital Signal Labs % in dreams, no pain will kiss the brow..."
> mailto://[email protected] % http://www.digitalsignallabs.com % 'Eldorado Overture', *Eldorado*, ELO

Hello

Thanks for all your answers.

This is not a homework question, but an actual compute engine that I
am trying to design.

It turns out that there was a misunderstanding between what I thought
the C-model for this compute engine was modeling and the actual
implementation in the C-model.

My initial understanding was that the C-model was adding two signed
Q7.8 numbers (result being a signed Q8.8 number) and then somehow
quantising it to a signed Q7.9 value. After I had posted the question
to the newsgroup, the solutions I had mentioned in my post did not
seem to be instinctively correct. Thus my next approach was to turn
the signed Q8.8 into a saturated Q7.8 value and then append a '0' at
the LSB to make it into a signed Q7.9 value.

As it turns out, after discussion with the s/w person building the
C-model, they are not doing the signed Q8.8 to signed Q7.9
transformation. The C-model is adding two signed Q7.9 values to
generate a signed Q8.9 value and then adding a third signed Q7.9 value
to get a final result of a signed Q9.9 value.
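
As a rough illustration of that bit growth (not the actual C-model, which is not shown in this thread), the three-input sum might look like the sketch below. The names `sum3_q9` and `q9_to_double` are hypothetical, and the values are assumed to be held in `int32_t`, which has headroom whether or not the sign bit is counted in the integer bits.

```c
#include <stdint.h>

#define Q_FRAC_BITS 9  /* all operands and results carry 9 fractional bits */

/* Add three signed Q7.9 values without saturating: each addition is
 * allowed one more integer bit, so the result is treated as Q9.9. */
static int32_t sum3_q9(int32_t x, int32_t y, int32_t z)
{
    int32_t xy = x + y;  /* signed Q8.9 intermediate */
    return xy + z;       /* signed Q9.9 result */
}

/* Interpret a raw value with 9 fractional bits as a real number. */
static double q9_to_double(int32_t v)
{
    return (double)v / (double)(1 << Q_FRAC_BITS);
}
```

With raw inputs 512, 256 and -128 (1.0, 0.5 and -0.25), the raw sum is 640, which reads back as 1.25. Because the binary point never moves, no rescaling or saturation step is needed.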

Thankfully, I thus don't have to worry about the transformation.

Thanks all once again for all your helpful answers.

>On Mon, 26 Oct 2009 10:53:23 -0400, Jerry Avins wrote:
>
>> thunder wrote:
>>> Hi
>>>
>>> I have a question regarding Fixed point Arithmetic addition.
>>>
>>> For example, i have two fixed point numbers:
>>>
>>> a = unsigned Q7.8 format (7-bit integer, 8 bit factional). b = unsigned
>>> Q7.8 format ( " " ).
>>>
>>> Now a + b = c, where c is an unsigned Q8.8 result.
>>
>> Then there is overflow, just as two Q15.0 integers and getting a Q16.0
>> sum. (Remember the sign bit.)
>>
>>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>>
>> You can't. Count the bits. (Remember the sign bit.)
>>
>You can't on a 16-bit machine, but if you're working in an FPGA or custom
>logic a 17-bit type is no problem.

True, but he specifically said Q7.9, and 7 + 9 was 16 the last time I
checked. :-)
>
>> ...
>>
>>> QS; Can anyone recommend a good book on Fixed Point and Floating point
>>> arithmetic ?
>>
>> http://www.digitalsignallabs.com/fp.pdf
>>
>> Jerry