FPGA Central

#1 (**permalink**) 06-21-2009, 11:27 PM

I needed a multiplier in a small footprint and Booth's Algorithm
seemed the appropriate approach since the numbers were signed. I
looked up multiple references on the web (isn't that where all
knowledge is to be found?) and to make sure I understood the
algorithm, I jimmied up a version is Excel. I thought I was having a
problem with the way I was building it Excel (it really shouldn't be
anyone's first choice for this sort of task...) as I could not get it
to work for all inputs. I could get it to work for all multiplicands
other than -2**n-1. Finally, I realized that ***NONE*** of the many
examples I found on the web were correct! In order to work for all
inputs, the implementation has to allow for a -2**n-1 to be negated to
2**n-1 which requires an extra bit in the arithmetic!

Some examples do it all numerically, so you don't see this at all.
Others even combine the multiplier with the product to save register
bits, still, they don't add the extra bit at the msb. I guess they
never tried their examples with -2**n-1 as a multiplicand...

It took me quite a while to realize this was the problem and not my
simulation. Is this extra bit obvious to everyone or am I just a bit
slower than most?

Rick

#2 (**permalink**) 06-21-2009, 11:35 PM

rickman <[email protected]> wrote:
(snip)

< I could get it to work for all multiplicands
< other than -2**n-1. Finally, I realized that ***NONE*** of the many
< examples I found on the web were correct! In order to work for all
< inputs, the implementation has to allow for a -2**n-1 to be negated to
< 2**n-1 which requires an extra bit in the arithmetic!

I presume you mean -2**(n-1).

< Some examples do it all numerically, so you don't see this at all.
< Others even combine the multiplier with the product to save register
< bits, still, they don't add the extra bit at the msb. I guess they
< never tried their examples with -2**n-1 as a multiplicand...

< It took me quite a while to realize this was the problem and not my
< simulation. Is this extra bit obvious to everyone or am I just a bit
< slower than most?

I thought the case where you need the extra bit is when multiplying
-2**(n-1) by itself. Anyway, yes that needs one additional product
bit that no other case needs. Other than the extra bit, though,
I hadn't thought you needed a special case for it.

-- glen

#3 (**permalink**) 06-22-2009, 04:28 AM

On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
wrote:

>I needed a multiplier in a small footprint and Booth's Algorithm
>seemed the appropriate approach since the numbers were signed. I
>looked up multiple references on the web (isn't that where all
>knowledge is to be found?) and to make sure I understood the
>algorithm, I jimmied up a version is Excel. I thought I was having a
>problem with the way I was building it Excel (it really shouldn't be
>anyone's first choice for this sort of task...) as I could not get it
>to work for all inputs. I could get it to work for all multiplicands
>other than -2**n-1. Finally, I realized that ***NONE*** of the many
>examples I found on the web were correct! In order to work for all
>inputs, the implementation has to allow for a -2**n-1 to be negated to
>2**n-1 which requires an extra bit in the arithmetic!
>

Assuming n is the number of bits of the inputs I think you mean
-2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
complement number. When two n bit numbers are multiplied you get a 2n
bit number. Continuing with our example, the number pair which
generates a sign inversion and largest magnitute change is -8 x -8 or
+64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
number squared fits into the 2n bit result. So far I don't see a need
for any extra bits needed but I maybe missing something from your
description.

Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

#4 (**permalink**) 06-22-2009, 05:24 AM

On Jun 21, 8:28 pm, Muzaffer Kal <[email protected]> wrote:
> On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
> wrote:
>
> >I needed a multiplier in a small footprint and Booth's Algorithm
> >seemed the appropriate approach since the numbers were signed. I
> >looked up multiple references on the web (isn't that where all
> >knowledge is to be found?) and to make sure I understood the
> >algorithm, I jimmied up a version is Excel. I thought I was having a
> >problem with the way I was building it Excel (it really shouldn't be
> >anyone's first choice for this sort of task...) as I could not get it
> >to work for all inputs. I could get it to work for all multiplicands
> >other than -2**n-1. Finally, I realized that ***NONE*** of the many
> >examples I found on the web were correct! In order to work for all
> >inputs, the implementation has to allow for a -2**n-1 to be negated to
> >2**n-1 which requires an extra bit in the arithmetic!
>
> Assuming n is the number of bits of the inputs I think you mean
> -2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
> complement number. When two n bit numbers are multiplied you get a 2n
> bit number. Continuing with our example, the number pair which
> generates a sign inversion and largest magnitute change is -8 x -8 or
> +64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
> number squared fits into the 2n bit result. So far I don't see a need
> for any extra bits needed but I maybe missing something from your
> description.
>
> Muzaffer Kal
>
> DSPIA INC.
> ASIC/FPGA Design Serviceshttp://www.dspia.com

Hi Rick,
The following context may give you a help you may need.

Find 3 × -4, with m = 3 and r = -4, and x = 4 and y = 4:

A = 0011 0000 0
S = 1101 0000 0
P = 0000 1100 0
Perform the loop four times :
P = 0000 1100 0. The last two bits are 00.
P = 0000 0110 0. Arithmetic right shift.
P = 0000 0110 0. The last two bits are 00.
P = 0000 0011 0. Arithmetic right shift.
P = 0000 0011 0. The last two bits are 10.
P = 1101 0011 0. P = P + S.
P = 1110 1001 1. Arithmetic right shift.
P = 1110 1001 1. The last two bits are 11.
P = 1111 0100 1. Arithmetic right shift.
The product is 1111 0100, which is -12.
The above mentioned technique is inadequate when the multiplicand is
the largest negative number that can be represented (i.e. if the
multiplicand has 8 bits then this value is -128). One possible
correction to this problem is to add one more bit to the left of A, S
and P. Below, we demonstrate the improved technique by multiplying -8
by 2 using 4 bits for the multiplicand and the multiplier:

A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0
Perform the loop four times :
P = 0 0000 0010 0. The last two bits are 00.
P = 0 0000 0001 0. Right shift.
P = 0 0000 0001 0. The last two bits are 10.
P = 0 1000 0001 0. P = P + S.
P = 0 0100 0000 1. Right shift.
P = 0 0100 0000 1. The last two bits are 01.
P = 1 1100 0000 1. P = P + A.
P = 1 1110 0000 0. Right shift.
P = 1 1110 0000 0. The last two bits are 00.
P = 0 1111 0000 0. Right shift.
The product is 11110000 (after discarding the first and the last bit)
which is -16.

It is copied from http://en.wikipedia.org/wiki/Booth_algorithm.

There is no modified Booth algorithm in wikipedia and you may have to
figure out how to adopt it to modified Booth algorithm.

Weng

#5 (**permalink**) 06-22-2009, 09:22 AM

On Sun, 21 Jun 2009 21:24:04 -0700 (PDT), Weng Tianxiang
<[email protected]> wrote:

>On Jun 21, 8:28 pm, Muzaffer Kal <[email protected]> wrote:
>> On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
>> wrote:
>>
>> >I needed a multiplier in a small footprint and Booth's Algorithm
>> >seemed the appropriate approach since the numbers were signed. I
>> >looked up multiple references on the web (isn't that where all
>> >knowledge is to be found?) and to make sure I understood the
>> >algorithm, I jimmied up a version is Excel. I thought I was having a
>> >problem with the way I was building it Excel (it really shouldn't be
>> >anyone's first choice for this sort of task...) as I could not get it
>> >to work for all inputs. I could get it to work for all multiplicands
>> >other than -2**n-1. Finally, I realized that ***NONE*** of the many
>> >examples I found on the web were correct! In order to work for all
>> >inputs, the implementation has to allow for a -2**n-1 to be negated to
>> >2**n-1 which requires an extra bit in the arithmetic!
>>
>> Assuming n is the number of bits of the inputs I think you mean
>> -2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
>> complement number. When two n bit numbers are multiplied you get a 2n
>> bit number. Continuing with our example, the number pair which
>> generates a sign inversion and largest magnitute change is -8 x -8 or
>> +64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
>> number squared fits into the 2n bit result. So far I don't see a need
>> for any extra bits needed but I maybe missing something from your
>> description.
>>
>> Muzaffer Kal
>>
>> DSPIA INC.
>> ASIC/FPGA Design Serviceshttp://www.dspia.com
>
>Hi Rick,
>The following context may give you a help you may need.
....
>It is copied from http://en.wikipedia.org/wiki/Booth_algorithm.

The problem with the initial algorithm given in the above url is step
1.2. In two's complement representation the valid range is -2**(n-1)
to 2**n-1 (ie [-8...7] for 4 bit numbers) so -m does not fit into x
bits when m = -2**(n-1) (ie -8) as -m would be 2**n (+8). It's nice
that wikipedia has a correction to it.
The issue with Boothm multiplication as stated is the caveat given at
the end of the url: "When the ones in a multiplier are grouped into
long blocks", then "Booth's algorithm performs fewer additions and
subtractions than the normal multiplication algorithm." In the worst
case this algorithm has the same behavior as any other. I think it's
better to use a booth encoder and halve the additions
deterministically, especially when implementing in hardware.
---
Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

#6 (**permalink**) 06-22-2009, 04:45 PM

On Jun 22, 1:22*am, Muzaffer Kal <[email protected]> wrote:
> On Sun, 21 Jun 2009 21:24:04 -0700 (PDT), Weng Tianxiang
>
>
>
>
>
>
>
> <[email protected]> wrote:
> >On Jun 21, 8:28 pm, Muzaffer Kal <[email protected]> wrote:
> >> On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
> >> wrote:
>
> >> >I needed a multiplier in a small footprint and Booth's Algorithm
> >> >seemed the appropriate approach since the numbers were signed. *I
> >> >looked up multiple references on the web (isn't that where all
> >> >knowledge is to be found?) and to make sure I understood the
> >> >algorithm, I jimmied up a version is Excel. *I thought I was havinga
> >> >problem with the way I was building it Excel (it really shouldn't be
> >> >anyone's first choice for this sort of task...) as I could not get it
> >> >to work for all inputs. *I could get it to work for all multiplicands
> >> >other than -2**n-1. *Finally, I realized that ***NONE*** of the many
> >> >examples I found on the web were correct! *In order to work for all
> >> >inputs, the implementation has to allow for a -2**n-1 to be negated to
> >> >2**n-1 which requires an extra bit in the arithmetic!
>
> >> Assuming n is the number of bits of the inputs I think you mean
> >> -2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
> >> complement number. When two n bit numbers are multiplied you get a 2n
> >> bit number. Continuing with our example, the number pair which
> >> generates a sign inversion and largest magnitute change is -8 x -8 or
> >> +64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
> >> number squared fits into the 2n bit result. So far I don't see a need
> >> for any extra bits needed but I maybe missing something from your
> >> description.
>
> >> Muzaffer Kal
>
> >> DSPIA INC.
> >> ASIC/FPGA Design Serviceshttp://www.dspia.com
>
> >Hi Rick,
> >The following context may give you a help you may need.
> ...
> >It is copied fromhttp://en.wikipedia.org/wiki/Booth_algorithm.
>
> The problem with the initial algorithm given in the above url is *step
> 1.2. In two's complement representation the valid range is -2**(n-1)
> to 2**n-1 (ie [-8...7] for 4 bit numbers) so -m does not fit into x
> bits when m = -2**(n-1) (ie -8) as -m would be 2**n (+8). It's nice
> that wikipedia has a correction to it.
> The issue with Boothm multiplication as stated is the caveat given at
> the end of the url: "When the ones in a multiplier are grouped into
> long blocks", then "Booth's algorithm performs fewer additions and
> subtractions than the normal multiplication algorithm." In the worst
> case this algorithm has the same behavior as any other. I think it's
> better to use a booth encoder and halve the additions
> deterministically, especially when implementing in hardware.
> ---
> Muzaffer Kal
>
> DSPIA INC.
> ASIC/FPGA Design Serviceshttp://www.dspia.com- Hide quoted text -
>
> - Show quoted text -

Hi Muzaffer,
Please compare modified Booth method with the method used by Xilinx in
the following patent:

The method used by Xilinx has the same advantage as modified Booth
method: half the number of multiplicants, but with simpler logic
connections.
http://www.google.com/patents/about?...=0&as_maxy_is=

My opinion about modified Booth method is: it is designed for ASIC,
not for FPGA. Within FPGA, there are lot of solutions which is as good
as modified Booth method.

Weng

#7 (**permalink**) 06-22-2009, 05:59 PM

rickman wrote:
> I needed a multiplier in a small footprint and Booth's Algorithm
> seemed the appropriate approach since the numbers were signed.

I expect that

c <= ieee.numeric_std."*"(signed(a),signed(b));

would work just as well for fewer brain cycles.

-- Mike Treseler

#8 (**permalink**) 06-22-2009, 06:29 PM

Muzaffer Kal wrote:
> On Sun, 21 Jun 2009 21:24:04 -0700 (PDT), Weng Tianxiang
> <[email protected]> wrote:
>
>> On Jun 21, 8:28 pm, Muzaffer Kal <[email protected]> wrote:
>>> On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
>>> wrote:
>>>
>>>> I needed a multiplier in a small footprint and Booth's Algorithm
>>>> seemed the appropriate approach since the numbers were signed. I
>>>> looked up multiple references on the web (isn't that where all
>>>> knowledge is to be found?) and to make sure I understood the
>>>> algorithm, I jimmied up a version is Excel. I thought I was having a
>>>> problem with the way I was building it Excel (it really shouldn't be
>>>> anyone's first choice for this sort of task...) as I could not get it
>>>> to work for all inputs. I could get it to work for all multiplicands
>>>> other than -2**n-1. Finally, I realized that ***NONE*** of the many
>>>> examples I found on the web were correct! In order to work for all
>>>> inputs, the implementation has to allow for a -2**n-1 to be negated to
>>>> 2**n-1 which requires an extra bit in the arithmetic!
>>> Assuming n is the number of bits of the inputs I think you mean
>>> -2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
>>> complement number. When two n bit numbers are multiplied you get a 2n
>>> bit number. Continuing with our example, the number pair which
>>> generates a sign inversion and largest magnitute change is -8 x -8 or
>>> +64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
>>> number squared fits into the 2n bit result. So far I don't see a need
>>> for any extra bits needed but I maybe missing something from your
>>> description.
>>>
>>> Muzaffer Kal
>>>
>>> DSPIA INC.
>>> ASIC/FPGA Design Serviceshttp://www.dspia.com
>> Hi Rick,
>> The following context may give you a help you may need.
> ....
>> It is copied from http://en.wikipedia.org/wiki/Booth_algorithm.
>
> The problem with the initial algorithm given in the above url is step
> 1.2. In two's complement representation the valid range is -2**(n-1)
> to 2**n-1 (ie [-8...7] for 4 bit numbers) so -m does not fit into x
> bits when m = -2**(n-1) (ie -8) as -m would be 2**n (+8). It's nice
> that wikipedia has a correction to it.

http://www.google.co.uk/search?pz=1&...Search+the+Web
This gives some further hints. I need to read this carefully before
commenting.

The wiki page does refer to a book which may give additional information.

I need to convert a shift add unsigned multiplier into a booth
unsigned/signed multiplier for a home project. I will get a solution but
not that quickly because it is a home project and I'm still learning. Andy

> ---
> Muzaffer Kal
>
> DSPIA INC.
> ASIC/FPGA Design Services
> http://www.dspia.com

#9 (**permalink**) 06-22-2009, 07:10 PM

On Mon, 22 Jun 2009 09:59:33 -0700, Mike Treseler
<[email protected]> wrote:

>rickman wrote:
>> I needed a multiplier in a small footprint and Booth's Algorithm
>> seemed the appropriate approach since the numbers were signed.
>
>I expect that
>
> c <= ieee.numeric_std."*"(signed(a),signed(b));
>
>would work just as well for fewer brain cycles.

But probably with more gates. This would generate a parallel
multiplier and a shift-add/sub type multiplier can be made smaller by
trading latency for area.
-
Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services

http://www.dspia.com

#10 (**permalink**) 06-22-2009, 07:42 PM

On Jun 22, 11:10*am, Muzaffer Kal <[email protected]> wrote:
> On Mon, 22 Jun 2009 09:59:33 -0700, Mike Treseler
>
> <[email protected]> wrote:
> >rickman wrote:
> >> I needed a multiplier in a small footprint and Booth's Algorithm
> >> seemed the appropriate approach since the numbers were signed.
>
> >I expect that
>
> > c <= ieee.numeric_std."*"(signed(a),signed(b));
>
> >would work just as well for fewer brain cycles.
>
> But probably with more gates. This would generate a parallel
> multiplier and a shift-add/sub type multiplier can be made smaller by
> trading latency for area.
> -
> Muzaffer Kal
>
> DSPIA INC.
> ASIC/FPGA Design Services
>
> http://www.dspia.com

Hi Mike,
I have a question about multiplier. Given the fact that both Altera
and Xilinx have multipliers built in FPGA chip, why is there a big
interesting to constitute a multiplier from square one which seems to
be very slow.

I'v never use a multiplier in my design, so I would like to know on
what conditions a FPGA user would like to built a multiplier. After
reading the FPGA handbook I thought multiplication never is a problem
with built-in multiplier.

Weng

#11 (**permalink**) 06-22-2009, 07:45 PM

On Mon, 22 Jun 2009 11:42:29 -0700 (PDT), Weng Tianxiang wrote:
>I have a question about multiplier. Given the fact that both Altera
>and Xilinx have multipliers built in FPGA chip, why is there a big
>interesting to constitute a multiplier from square one which seems to
>be very slow.

Some reasons (I'm sure there are more):
1) you might run out of embedded multipliers;
2) you might need a very fast multiplier - faster than
the builtins - for narrow data words;
3) you might be using a device that lacks multipliers
(there still are some of those);
4) you are aiming for portability to non-FPGA technologies.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
[email protected]
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

#12 (**permalink**) 06-22-2009, 08:12 PM

Muzaffer Kal wrote:

> But probably with more gates.

That depends on the target and how clever the tool is.
This could be an fpga with an unused DSP block.

-- Mike Treseler

#13 (**permalink**) 06-22-2009, 09:59 PM

On Jun 22, 12:59*pm, Mike Treseler <[email protected]> wrote:
> rickman wrote:
> > I needed a multiplier in a small footprint and Booth's Algorithm
> > seemed the appropriate approach since the numbers were signed.
>
> I expect that
>
> *c <= ieee.numeric_std."*"(signed(a),signed(b));
>
> would work just as well for fewer brain cycles.
>
> * * * *-- Mike Treseler

Did you miss the part about "small footprint"? I used that in an
earlier design and it uses about 300 LUTs for the multiplier. I have
lots of clock cycles, so I can generate the partial products
sequentially using much less logic.

Rick

#14 (**permalink**) 06-22-2009, 10:00 PM

On Jun 22, 1:29 pm, Andy Botterill <[email protected]> wrote:
> Muzaffer Kal wrote:
> > On Sun, 21 Jun 2009 21:24:04 -0700 (PDT), Weng Tianxiang
> > <[email protected]> wrote:
>
> >> On Jun 21, 8:28 pm, Muzaffer Kal <[email protected]> wrote:
> >>> On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
> >>> wrote:
>
> >>>> I needed a multiplier in a small footprint and Booth's Algorithm
> >>>> seemed the appropriate approach since the numbers were signed. I
> >>>> looked up multiple references on the web (isn't that where all
> >>>> knowledge is to be found?) and to make sure I understood the
> >>>> algorithm, I jimmied up a version is Excel. I thought I was having a
> >>>> problem with the way I was building it Excel (it really shouldn't be
> >>>> anyone's first choice for this sort of task...) as I could not get it
> >>>> to work for all inputs. I could get it to work for all multiplicands
> >>>> other than -2**n-1. Finally, I realized that ***NONE*** of the many
> >>>> examples I found on the web were correct! In order to work for all
> >>>> inputs, the implementation has to allow for a -2**n-1 to be negated to
> >>>> 2**n-1 which requires an extra bit in the arithmetic!
> >>> Assuming n is the number of bits of the inputs I think you mean
> >>> -2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
> >>> complement number. When two n bit numbers are multiplied you get a 2n
> >>> bit number. Continuing with our example, the number pair which
> >>> generates a sign inversion and largest magnitute change is -8 x -8 or
> >>> +64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
> >>> number squared fits into the 2n bit result. So far I don't see a need
> >>> for any extra bits needed but I maybe missing something from your
> >>> description.
>
> >>> Muzaffer Kal
>
> >>> DSPIA INC.
> >>> ASIC/FPGA Design Serviceshttp://www.dspia.com
> >> Hi Rick,
> >> The following context may give you a help you may need.
> > ....
> >> It is copied fromhttp://en.wikipedia.org/wiki/Booth_algorithm.
>
> > The problem with the initial algorithm given in the above url is step
> > 1.2. In two's complement representation the valid range is -2**(n-1)
> > to 2**n-1 (ie [-8...7] for 4 bit numbers) so -m does not fit into x
> > bits when m = -2**(n-1) (ie -8) as -m would be 2**n (+8). It's nice
> > that wikipedia has a correction to it.
>
> http://www.google.co.uk/search?pz=1&...h+samavi&btnme...
> This gives some further hints. I need to read this carefully before
> commenting.
>
> The wiki page does refer to a book which may give additional information.
>
> I need to convert a shift add unsigned multiplier into a booth
> unsigned/signed multiplier for a home project. I will get a solution but
> not that quickly because it is a home project and I'm still learning. Andy

I've been looking at this pretty hard and I have come to the
conclusion that a shift and add uses fewer resources in an FPGA. Why
do you feel you need to use Booth's algorithm?

If you are saying that you need a signed multiplier, you can use the
shift and add just fine. Booth's requires the multiplier stage to
include an adder to negate the multiplicand in addition to the adder
to form the partial product. This is repeated for each partial
product. The shift and add algorithm can work for signed number by
using the sign of the multiplier to negate both the multiplier and
multiplicand. The adder which outputs the partial product can be made
to either add or subtract, implementing the negate for the
multiplicand. If the shift and add multiplier is implementing all
stages, the negate circuit for the multiplier is only implemented once
instead of for each partial product stage. If the design is using the
same circuit sequentially, the negate of the multiplier can be done
bit serially. Either way the shift and add uses less resources than
Booth's algorithm.

Rick

#15 (**permalink**) 06-22-2009, 11:33 PM

rickman wrote:

> I used that in an
> earlier design and it uses about 300 LUTs for the multiplier. I have
> lots of clock cycles, so I can generate the partial products
> sequentially using much less logic.

Less than 200 LUTs?

-- Mike Treseler

#16 (**permalink**) 06-23-2009, 05:51 AM

On Jun 22, 6:33*pm, Mike Treseler <[email protected]> wrote:
> rickman wrote:
> > I used that in an
> > earlier design and it uses about 300 LUTs for the multiplier. *I have
> > lots of clock cycles, so I can generate the partial products
> > sequentially using much less logic.
>
> Less than 200 LUTs?
>
> * * * -- Mike Treseler

Are you joking? I expect it to be on the order of 40 LUTs for a 16 x
16 multiplier. Maybe I am not making myself clear. I am calculating
one partial product at a time and adding them to the product
sequentially using the same hardware, but multiple clock cycles. Sort
of like a bit serial adder, but this is a partial product serial
multiplier or maybe you could call it a multiplier bit serial
multiplier. A truely bit serial multiplier could be done in a dozen
to twenty LUTs I expect, but a 16x16 multiply would take 256 clock
cycles.

I actually considered this on an earlier design, but it turned out I
didn't need to push on the LUTs used and the X*Y multiplier took some
300 LUTs. So going down to 40 or 50 LUTs is a big improvement. I
should have this done later this week and I'll post the results.

Rick

#17 (**permalink**) 06-23-2009, 06:09 AM

rickman wrote:
> On Jun 22, 1:29 pm, Andy Botterill <[email protected]> wrote:
>> Muzaffer Kal wrote:
>>> On Sun, 21 Jun 2009 21:24:04 -0700 (PDT), Weng Tianxiang
>>> <[email protected]> wrote:
>>>> On Jun 21, 8:28 pm, Muzaffer Kal <[email protected]> wrote:
>>>>> On Sun, 21 Jun 2009 15:27:09 -0700 (PDT), rickman <[email protected]>
>>>>> wrote:
>>>>>> I needed a multiplier in a small footprint and Booth's Algorithm
>>>>>> seemed the appropriate approach since the numbers were signed. I
>>>>>> looked up multiple references on the web (isn't that where all
>>>>>> knowledge is to be found?) and to make sure I understood the
>>>>>> algorithm, I jimmied up a version is Excel. I thought I was having a
>>>>>> problem with the way I was building it Excel (it really shouldn't be
>>>>>> anyone's first choice for this sort of task...) as I could not get it
>>>>>> to work for all inputs. I could get it to work for all multiplicands
>>>>>> other than -2**n-1. Finally, I realized that ***NONE*** of the many
>>>>>> examples I found on the web were correct! In order to work for all
>>>>>> inputs, the implementation has to allow for a -2**n-1 to be negated to
>>>>>> 2**n-1 which requires an extra bit in the arithmetic!
>>>>> Assuming n is the number of bits of the inputs I think you mean
>>>>> -2**(n-1) right? ie -8 = -2**3 is the most negative 4 bit two's
>>>>> complement number. When two n bit numbers are multiplied you get a 2n
>>>>> bit number. Continuing with our example, the number pair which
>>>>> generates a sign inversion and largest magnitute change is -8 x -8 or
>>>>> +64 ie 1000 x 1000 = 01000000. So the result of most negative n bit
>>>>> number squared fits into the 2n bit result. So far I don't see a need
>>>>> for any extra bits needed but I maybe missing something from your
>>>>> description.
>>>>> Muzaffer Kal
>>>>> DSPIA INC.
>>>>> ASIC/FPGA Design Serviceshttp://www.dspia.com
>>>> Hi Rick,
>>>> The following context may give you a help you may need.
>>> ....
>>>> It is copied fromhttp://en.wikipedia.org/wiki/Booth_algorithm.
>>> The problem with the initial algorithm given in the above url is step
>>> 1.2. In two's complement representation the valid range is -2**(n-1)
>>> to 2**n-1 (ie [-8...7] for 4 bit numbers) so -m does not fit into x
>>> bits when m = -2**(n-1) (ie -8) as -m would be 2**n (+8). It's nice
>>> that wikipedia has a correction to it.
>> http://www.google.co.uk/search?pz=1&...h+samavi&btnme...
>> This gives some further hints. I need to read this carefully before
>> commenting.
>>
>> The wiki page does refer to a book which may give additional information.
>>
>> I need to convert a shift add unsigned multiplier into a booth
>> unsigned/signed multiplier for a home project. I will get a solution but
>> not that quickly because it is a home project and I'm still learning. Andy
>
> I've been looking at this pretty hard and I have come to the
> conclusion that a shift and add uses fewer resources in an FPGA. Why
> do you feel you need to use Booth's algorithm?
>
> If you are saying that you need a signed multiplier, you can use the
> shift and add just fine. Booth's requires the multiplier stage to

The shift and add multiplier was my first proper verilog for my home
project. This was designed as an unsigned multiplier and uses 8 adders
so that it can complete in 4 cycles.

Later on I realised that I needed a signed/unsigned multiplier done in 4
cycles. The only way that I think that I can get signed multiplication
is to double the data size from 32 to 64. That was suggested by a friend
in design. This increases the number of adders needed to complete in 4
cycles. It also changes the cycle taken for the unsigned multiplication.

Using booths gives me less area, correct cycles and signed/unsigned
operation. I've only got to do it....

> include an adder to negate the multiplicand in addition to the adder
> to form the partial product. This is repeated for each partial
> product. The shift and add algorithm can work for signed number by
> using the sign of the multiplier to negate both the multiplier and
> multiplicand. The adder which outputs the partial product can be made
> to either add or subtract, implementing the negate for the
> multiplicand. If the shift and add multiplier is implementing all
> stages, the negate circuit for the multiplier is only implemented once
> instead of for each partial product stage. If the design is using the
> same circuit sequentially, the negate of the multiplier can be done
> bit serially. Either way the shift and add uses less resources than
> Booth's algorithm.

The purpose of this home project is to prove to myself that I can do
design. The idea is to meet the functional requirements and cycle
requirements. Where I think that I can reduce the area I have. Using one
signed/unsigned multiplier will also simplify the fsm in another design
block. I think the simplification (reducing FSM states) should reduce
the design ara.

Thanks for the help. Andy
>
> Rick

#18 (**permalink**) 06-23-2009, 03:23 PM

On Jun 23, 12:51 am, rickman <[email protected]> wrote:
> On Jun 22, 6:33 pm, Mike Treseler <[email protected]> wrote:
>
> > rickman wrote:
> > > I used that in an
> > > earlier design and it uses about 300 LUTs for the multiplier. I have
> > > lots of clock cycles, so I can generate the partial products
> > > sequentially using much less logic.
>
> > Less than 200 LUTs?
>
> > -- Mike Treseler
>
> Are you joking? I expect it to be on the order of 40 LUTs for a 16 x
> 16 multiplier. Maybe I am not making myself clear. I am calculating
> one partial product at a time and adding them to the product
> sequentially using the same hardware, but multiple clock cycles. Sort
> of like a bit serial adder, but this is a partial product serial
> multiplier or maybe you could call it a multiplier bit serial
> multiplier. A truely bit serial multiplier could be done in a dozen
> to twenty LUTs I expect, but a 16x16 multiply would take 256 clock
> cycles.
>
> I actually considered this on an earlier design, but it turned out I
> didn't need to push on the LUTs used and the X*Y multiplier took some
> 300 LUTs. So going down to 40 or 50 LUTs is a big improvement. I
> should have this done later this week and I'll post the results.
>
> Rick

My estimate was off by a little as I forgot to account for the extra
rank of muxes required to load the initial values. So the total for a
16 x 16 add and shift sequential multiplier is 60 LUTs as reported by
the Lattice tools. This drops by about 10 LUTs if the counter is
removed. In my design the timing may be controlled by logic
elsewhere. Yes, somehow Synplify is using 10 LUTs to build a four bit
counter! It seems to want to duplicate three of the four FFs.

Rick

#19 (**permalink**) 06-23-2009, 07:56 PM

On Jun 23, 7:23*am, rickman <[email protected]> wrote:
> On Jun 23, 12:51 am, rickman <[email protected]> wrote:
>
>
>
>
>
> > On Jun 22, 6:33 pm, Mike Treseler <[email protected]> wrote:
>
> > > rickman wrote:
> > > > I used that in an
> > > > earlier design and it uses about 300 LUTs for the multiplier. *I have
> > > > lots of clock cycles, so I can generate the partial products
> > > > sequentially using much less logic.
>
> > > Less than 200 LUTs?
>
> > > * * * -- Mike Treseler
>
> > Are you joking? *I expect it to be on the order of 40 LUTs for a 16 x
> > 16 multiplier. *Maybe I am not making myself clear. *I am calculating
> > one partial product at a time and adding them to the product
> > sequentially using the same hardware, but multiple clock cycles. *Sort
> > of like a bit serial adder, but this is a partial product serial
> > multiplier or maybe you could call it a multiplier bit serial
> > multiplier. *A truely bit serial multiplier could be done in a dozen
> > to twenty LUTs I expect, but a 16x16 multiply would take 256 clock
> > cycles.
>
> > I actually considered this on an earlier design, but it turned out I
> > didn't need to push on the LUTs used and the X*Y multiplier took some
> > 300 LUTs. *So going down to 40 or 50 LUTs is a big improvement. *I
> > should have this done later this week and I'll post the results.
>
> > Rick
>
> My estimate was off by a little as I forgot to account for the extra
> rank of muxes required to load the initial values. *So the total for a
> 16 x 16 add and shift sequential multiplier is 60 LUTs as reported by
> the Lattice tools. *This drops by about 10 LUTs if the counter is
> removed. *In my design the timing may be controlled by logic
> elsewhere. *Yes, somehow Synplify is using 10 LUTs to build a four bit
> counter! *It seems to want to duplicate three of the four FFs.
>
> Rick- Hide quoted text -
>
> - Show quoted text -

Hi Rick,
Can you show me where I can find the schematics of Lattice doing 16x16
for 60 LUT?

Do 60 LUTs include 32-bit flip-flops to store output data?

Weng

#20 (**permalink**) 06-24-2009, 03:09 PM

On Jun 23, 2:56 pm, Weng Tianxiang <[email protected]> wrote:
> On Jun 23, 7:23 am, rickman <[email protected]> wrote:
>
>
>
> > On Jun 23, 12:51 am, rickman <[email protected]> wrote:
>
> > > On Jun 22, 6:33 pm, Mike Treseler <[email protected]> wrote:
>
> > > > rickman wrote:
> > > > > I used that in an
> > > > > earlier design and it uses about 300 LUTs for the multiplier. I have
> > > > > lots of clock cycles, so I can generate the partial products
> > > > > sequentially using much less logic.
>
> > > > Less than 200 LUTs?
>
> > > > -- Mike Treseler
>
> > > Are you joking? I expect it to be on the order of 40 LUTs for a 16 x
> > > 16 multiplier. Maybe I am not making myself clear. I am calculating
> > > one partial product at a time and adding them to the product
> > > sequentially using the same hardware, but multiple clock cycles. Sort
> > > of like a bit serial adder, but this is a partial product serial
> > > multiplier or maybe you could call it a multiplier bit serial
> > > multiplier. A truely bit serial multiplier could be done in a dozen
> > > to twenty LUTs I expect, but a 16x16 multiply would take 256 clock
> > > cycles.
>
> > > I actually considered this on an earlier design, but it turned out I
> > > didn't need to push on the LUTs used and the X*Y multiplier took some
> > > 300 LUTs. So going down to 40 or 50 LUTs is a big improvement. I
> > > should have this done later this week and I'll post the results.
>
> > > Rick
>
> > My estimate was off by a little as I forgot to account for the extra
> > rank of muxes required to load the initial values. So the total for a
> > 16 x 16 add and shift sequential multiplier is 60 LUTs as reported by
> > the Lattice tools. This drops by about 10 LUTs if the counter is
> > removed. In my design the timing may be controlled by logic
> > elsewhere. Yes, somehow Synplify is using 10 LUTs to build a four bit
> > counter! It seems to want to duplicate three of the four FFs.
>
> > Rick- Hide quoted text -
>
> > - Show quoted text -
>
> Hi Rick,
> Can you show me where I can find the schematics of Lattice doing 16x16
> for 60 LUT?
>
> Do 60 LUTs include 32-bit flip-flops to store output data?
>
> Weng

I was able to get a schematic from Synplify Pro, but it did not want
to include the text except on part of the drawing. If anyone can tell
me how to get the text to display, I will reprint it. The PDF file
can be found at http://arius.com/stuff/FPGA/Multiply_16x16.pdf

This incarnation uses 62 LUTs and 42 FFs if I am reading the info
correctly. This includes 32 FFs to hold the output product. The
Multiplicand is not registered, it is assumed that it is held constant
on the input. There are 7 FFs in the 4 bit counter (Synplify insists
on duplicating 3 bits as "fast" counter bits) one FF for the carry bit
on the Multiplier negate circuit and the lsb of the product register
is duplicated twice (it is used to control the adder). If the bit
counter is not duplicated, it would also reduce the LUT count by 4
LUTs.

Just to make this clear, this multiplier uses as many clock cycles to
produce a product as you have bits in the multiplier. This is not a
"fast" multiplier which can produce a result on each clock cycle and
can even be pipelined to run with as fast a clock as this circuit.
Also, I have not simulated it to be sure it is coded correctly, but I
am pretty confident it is working the way I intend or at least any
mistakes won't change the LUT count much.

Rick

#21 (**permalink**) 06-24-2009, 05:54 PM

rickman wrote:

> I was able to get a schematic from Synplify Pro, but it did not want
> to include the text except on part of the drawing. If anyone can tell
> me how to get the text to display, I will reprint it. The PDF file
> can be found at http://arius.com/stuff/FPGA/Multiply_16x16.pdf

Don't know about synplify.
Does it have an RTL viewer also?

> This incarnation uses 62 LUTs and 42 FFs if I am reading the info
> correctly. This includes 32 FFs to hold the output product. The
> Multiplicand is not registered, it is assumed that it is held constant
> on the input.

That should be ok if the enable is synchronized.
Interesting. Thanks for the posting.

> Also, I have not simulated it to be sure it is coded correctly, but I
> am pretty confident it is working the way I intend or at least any
> mistakes won't change the LUT count much.

I prefer to start with an RTL sim,
but I know that many designers prefer
working on the bench. Good luck.

-- Mike Treseler

#22 (**permalink**) 06-24-2009, 07:57 PM

On Jun 24, 9:54*am, Mike Treseler <[email protected]> wrote:
> rickman wrote:
> > I was able to get a schematic from Synplify Pro, but it did not want
> > to include the text except on part of the drawing. *If anyone can tell
> > me how to get the text to display, I will reprint it. *The PDF file
> > can be found athttp://arius.com/stuff/FPGA/Multiply_16x16.pdf
>
> Don't know about synplify.
> Does it have an RTL viewer also?
>
> > This incarnation uses 62 LUTs and 42 FFs if I am reading the info
> > correctly. *This includes 32 FFs to hold the output product. *The
> > Multiplicand is not registered, it is assumed that it is held constant
> > on the input.
>
> That should be ok if the enable is synchronized.
> Interesting. Thanks for the posting.
>
> > Also, I have not simulated it to be sure it is coded correctly, but I
> > am pretty confident it is working the way I intend or at least any
> > mistakes won't change the LUT count much.
>
> I prefer to start with an RTL sim,
> but I know that many designers prefer
> working on the bench. Good luck.
>
> * * * * -- Mike Treseler

Hi Rick,
The file published at http://arius.com/stuff/FPGA/Multiply_16x16.pdf
has some flaws:
I cannot see the texts after 4 clicks in width direction with 400%
magnification. It means the first 3 clicks in width show normal
drawings, but after that only schematics are shown normally, but no
texts are shown.

Is it normal?

Weng

#23 (**permalink**) 06-25-2009, 04:00 AM

On Jun 24, 2:57*pm, Weng Tianxiang <[email protected]> wrote:
> On Jun 24, 9:54*am, Mike Treseler <[email protected]> wrote:
>
>
>
> > rickman wrote:
> > > I was able to get a schematic from Synplify Pro, but it did not want
> > > to include the text except on part of the drawing. *If anyone can tell
> > > me how to get the text to display, I will reprint it. *The PDF file
> > > can be found athttp://arius.com/stuff/FPGA/Multiply_16x16.pdf
>
> > Don't know about synplify.
> > Does it have an RTL viewer also?
>
> > > This incarnation uses 62 LUTs and 42 FFs if I am reading the info
> > > correctly. *This includes 32 FFs to hold the output product. *The
> > > Multiplicand is not registered, it is assumed that it is held constant
> > > on the input.
>
> > That should be ok if the enable is synchronized.
> > Interesting. Thanks for the posting.
>
> > > Also, I have not simulated it to be sure it is coded correctly, but I
> > > am pretty confident it is working the way I intend or at least any
> > > mistakes won't change the LUT count much.
>
> > I prefer to start with an RTL sim,
> > but I know that many designers prefer
> > working on the bench. Good luck.
>
> > * * * * -- Mike Treseler
>
> Hi Rick,
> The file published athttp://arius.com/stuff/FPGA/Multiply_16x16.pdf
> has some flaws:
> I cannot see the texts after 4 clicks in width direction with 400%
> magnification. It means the first 3 clicks in width show normal
> drawings, but after that only schematics are shown normally, but no
> texts are shown.
>
> Is it normal?
>
> Weng

Yes, I know the print is not very good, but that is the best I could
get out of Synplify. The only way I could get any text at all was to
set the magnification to the lowest level that would still display
text. Then only the portion visible on my screen would print with
text. The rest shows nothing. The large squares are two bit adder
primitives and the smaller rectangles are FFs. The rest is pretty
obvious since they show the gates.

Like I said, if anyone knows how to get Synplify to print the entire
schematic with text, I'll be happy to post that. I can email you the
code if you would like to see that. But like I said, there is nothing
here that isn't pretty obvious other than that you need an extra msb
to preserve sign if you need to support the most negative number (-2**
(n-1)).

Rick

#24 (**permalink**) 07-05-2009, 03:14 PM

As I was surfing I noticed a reference to the ARM6 booth algorithm.
http://209.85.229.132/search?q=cache...&ct=clnk&gl=uk
In the appendix you get a C style algorithm. Whether you can copy this I
don't know. There is a reference to getting it from ARM at the bottom.
If this reduces the number of adders I will have a go at it. Not right
now sorry. Andy

#25 (**permalink**) 07-05-2009, 03:52 PM

On Jul 5, 10:14 am, Andy Botterill <[email protected]> wrote:
> As I was surfing I noticed a reference to the ARM6 booth algorithm.http://209.85.229.132/search?q=cache...l.cam.ac.uk/~a...
> In the appendix you get a C style algorithm. Whether you can copy this I
> don't know. There is a reference to getting it from ARM at the bottom.
> If this reduces the number of adders I will have a go at it. Not right
> now sorry. Andy

Maybe it's just me, but why would someone write in pseudo code and not
make the details very clear? Here is the line I am ranting about.

while (not (rs = 0) or borrow) and n < wl do ...

I suppose if you *assume* this pseudo code is like C, you can *assume*
that the C rules for precedence apply. But I don't see enough
similarity to C to make that assumption(e.g. "rd := if accumulate then
rn else 0" is clearly *not* C). So I am left wondering just how to
evaluate the conditional expression. On the other hand, if this
expression is what I think it is, the addition of two more sets of
parentheses would make it unambiguous.

Heck, I gave up long ago trying to remember rules of precedence for
the many languages I write in and *always* make my code unambiguous by
adding as many parentheses as needed. Sometimes this can look
unwieldy and verbose, but my intent is perfectly clear and no one will
be looking at the code asking themselves if I meant what I wrote or if
I made a mistake and meant something else.

Maybe this hits a nerve with me because of being bitten by poorly
written pseudo code in a research paper I had to read. The paper was
about a tree search algorithm and if you couldn't understand the
pseudo code *exactly*, then there was no point to the paper. Their
pseudo code wasn't even this good and we had a devil of a time
figuring it out. :^( On the other hand the other student I was
working with on this was a very attractive coed and I enjoyed every
minutes of working on it. :^)

rick

FPGA Central

World's 1st FPGA Portal