FPGA Central - World's 1st FPGA / CPLD Portal

FPGA comp.arch.fpga newsgroup (usenet)

Old 07-03-2006, 06:26 PM
Robin Bruce
Guest
 
Posts: n/a
Default Inferring multiple-DSP48 pipelined multiplier in VHDL

Hi Guys,

I'm having trouble with the following problem:

I'm trying to create a 35x35 signed multiplier from DSP48s, inferring
pipelining in VHDL by adding registers after the multiplication
operation as seen below in the VHDL I'm using.

The problem is that when I synthesise, though I can see that the
synthesiser has noticed that it can shift registers about:

Synthesizing (advanced) Unit <signed_mult_TOP>.
Found pipelined multiplier on signal <mult_inst/_n0000>:
- 2 pipeline level(s) found in a register connected to the multiplier
macro output.
Pushing register(s) into the multiplier macro.

- 2 pipeline level(s) found in a register on signal <mult_inst/A2>.
Pushing register(s) into the multiplier macro.

- 2 pipeline level(s) found in a register on signal <mult_inst/B2>.
Pushing register(s) into the multiplier macro.

the clock rate achieved is still only a meagre 81.171MHz. I'll save my
half-baked hypotheses for now and see if anyone knows what's up here.
Any help you can give would be very much appreciated.

Robin

VHDL:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_arith.ALL;

ENTITY signed_mult_35x35 IS
  generic (PIPE: natural);  -- total latency in clock cycles; must be >= 2
  port (
    clk: IN std_logic;
    a: IN signed(34 downto 0);
    b: IN signed(34 downto 0);
    o: OUT signed(69 downto 0));
END signed_mult_35x35;

ARCHITECTURE signed_mult_35x35_a OF signed_mult_35x35 IS

  -- Registered copies of the inputs (first pipeline stage)
  signal A2 : signed(34 downto 0);
  signal B2 : signed(34 downto 0);

  -- Product register followed by PIPE - 2 further delay stages
  subtype mult_result is signed(69 downto 0);
  type mult_result_array is array (0 to PIPE - 2) of mult_result;
  signal pipeline_array : mult_result_array;

BEGIN

  o <= pipeline_array(PIPE - 2);

  reg: process(clk) begin
    if(rising_edge(clk)) then
      A2 <= a;
      B2 <= b;
      pipeline_array(0) <= A2 * B2;
      for i in 1 to PIPE - 2 loop
        pipeline_array(i) <= pipeline_array(i-1);
      end loop;
      -- Registering should be fused into DSP48-inferred multiply operation
    end if;
  end process;

END signed_mult_35x35_a;

  #2 (permalink)  
Old 07-04-2006, 05:55 AM
MM
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Robin,

IMHO, trying to get inferring of anything more complex than a flip-flop, or
perhaps an adder, to work is a waste of time. Just instantiate what you
need...

/Mikhail



  #3 (permalink)  
Old 07-04-2006, 09:18 AM
Ben Jones
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Hi Robin,

"Robin Bruce" wrote:
> Hi Guys,
>
> I'm having trouble with the following problem:
>
> I'm trying to create a 35x35 signed multiplier from DSP48s, inferring
> pipelining in VHDL by adding registers after the multiplication
> operation as seen below in the VHDL I'm using.
>
> The problem is that when I synthesise, though I can see that the
> synthesiser has noticed that it can shift registers about:


> the clock rate achieved is still only a meagre 81.171MHz. I'll save my
> half-baked hypotheses for now and see if anyone knows what's up here.
> Any help you can give would be very much appreciated.


I've had a very similar problem recently, albeit in a slightly different
context. Can you let us know what version of the tools you are using?

Basically, it seems that XST is not always very good at using the *right*
registers when it pulls delay elements into a DSP block, nor at pulling said
elements in the right direction when there is a choice. You can easily end
up with a bunch of useless input registers, but no middle (M) or product (P)
register. Check the DSP48 configuration in your resulting netlist (with e.g.
FPGA editor) and have a look at where it's actually putting these registers.
You may be able to use "KEEP" constraints as a workaround, although my
feeling is you definitely shouldn't have to.
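
For reference, XST's KEEP constraint can be attached in VHDL as a string attribute on a signal. The sketch below is only an illustration: the entity name mult_keep_sketch, the signal names, and the choice of which signals to mark are all assumptions, and whether marking them changes where XST places the registers has not been checked here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, used only to show where the attribute goes.
entity mult_keep_sketch is
  port (
    clk  : in  std_logic;
    a, b : in  signed(34 downto 0);
    o    : out signed(69 downto 0));
end entity mult_keep_sketch;

architecture rtl of mult_keep_sketch is
  signal a_r, b_r : signed(34 downto 0);  -- registered inputs
  signal prod_r   : signed(69 downto 0);  -- registered product

  -- KEEP is declared as a string attribute and then applied per signal.
  attribute keep : string;
  attribute keep of a_r : signal is "true";
  attribute keep of b_r : signal is "true";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r    <= a;
      b_r    <= b;
      prod_r <= a_r * b_r;
    end if;
  end process;

  o <= prod_r;
end architecture rtl;
```

After synthesis, the DSP48 attributes (AREG, BREG, MREG, PREG) in FPGA editor would still be the place to confirm what actually happened.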

Funny thing is, this used to work pretty well (in my limited experience). If
I get a chance I'll submit this to the XST team myself, but it would help if
you opened a case with tech support (customer complaints carry more weight
than internal engineering whinges)...

Cheers,

-Ben-


  #4 (permalink)  
Old 07-04-2006, 11:24 AM
Robin Bruce
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Thanks Ben,

it's always good to know that I'm not imagining the problem. I'm using
ISE 8.1, service pack 3. I should probably point out at this point that
the purpose of this little project is as much about design methodology
as it is about having a functioning design. I'm aware that there's
about a million ways I could do this, but in order to have a portable
core that can be easily floorplanned, I want to have all my design
files as standard VHDL with no specific instantiations of FPGA
resources, nor any NGC files from CoreGen.

I've got a 20-odd page report on my attempts to do this, so perhaps
there's someone I should send this to? I've never opened a case with
tech support, so I'll look into how I might do this too...

Cheers,

Robin

  #5 (permalink)  
Old 07-04-2006, 01:40 PM
Martin Thompson
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

"Robin Bruce" <[email protected]> writes:

> Hi Guys,
>
> I'm having trouble with the following problem:
>
> I'm trying to create a 35x35 signed multiplier from DSP48s, inferring
> pipelining in VHDL by adding registers after the multiplication
> operation as seen below in the VHDL I'm using.
>
> The problem is that when I synthesise, though I can see that the
> synthesiser has noticed that it can shift registers about:
>
> Synthesizing (advanced) Unit <signed_mult_TOP>.
> Found pipelined multiplier on signal <mult_inst/_n0000>:
> - 2 pipeline level(s) found in a register connected to the multiplier
> macro output.
> Pushing register(s) into the multiplier macro.
>
> - 2 pipeline level(s) found in a register on signal <mult_inst/A2>.
> Pushing register(s) into the multiplier macro.
>
> - 2 pipeline level(s) found in a register on signal <mult_inst/B2>.
> Pushing register(s) into the multiplier macro.
>
> the clock rate achieved is still only a meagre 81.171MHz. I'll save my
> half-baked hypotheses for now and see if anyone knows what's up here.
> Any help you can give would be very much appreciated.
>


Only more (potentially dim) questions I'm afraid:

Have you had a look in FPGA editor to see what's going on?

Is it actually this bit of code that limits the timing?

Cheers,
Martin

--
[email protected]
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.trw.com/conekt

  #6 (permalink)  
Old 07-04-2006, 02:59 PM
Robin Bruce
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Martin,

> Have you had a look in FPGA editor to see what's going on?


This is where I myself look dim: I did open up the NCD file in the FPGA
Editor. I didn't really know what to do to tell if the right
registering was occurring. All I could see was that all 4 DSP48s were
instantiated together in a little row. I've never used FPGA editor
before. I'm more familiar with PlanAhead for looking at that sort of
thing, but I don't have that on my laptop, my current working platform.

> Is it actually this bit of code that limits the timing?


Well, all I can say is that I don't think so. It could very well be
though, but I've tried writing the VHDL in very different ways, guided
by things I've found in one or two guides to instantiating the DSP48s
in VHDL. Every way I write the VHDL, the same performance is obtained.
The thing is that I can see that the synthesis tool is making some kind
of effort to pipeline the thing.

This is the critical path that comes out of the synthesis report if
this means anything to anyone:

Data Path: mult_inst/Mmult__n00001 to mult_inst/Mmult__n0000_35
                              Gate    Net
Cell:in->out          fanout  Delay   Delay  Logical Name (Net Name)
--------------------  ------  -----   -----  ------------------------
DSP48:CLK->PCOUT47         1  4.399   0.000  mult_inst/Mmult__n00001 (mult_inst/Mmult__n00002_PCIN_to_mult_inst/Mmult__n00001_PCOUT_47)
DSP48:PCIN47->PCOUT47      1  2.363   0.000  mult_inst/Mmult__n00002 (mult_inst/Mmult__n00003_PCIN_to_mult_inst/Mmult__n00002_PCOUT_47)
DSP48:PCIN47->PCOUT47      1  2.363   0.000  mult_inst/Mmult__n00003 (mult_inst/Mmult__n00004_PCIN_to_mult_inst/Mmult__n00003_PCOUT_47)
DSP48:PCIN47->P35          1  2.270   0.534  mult_inst/Mmult__n00004 (mult_inst/Mmult__n0000_s_69)
FD                            0.391          mult_inst/Mmult__n0000_0
----------------------------------------
Total 12.320ns (11.786ns logic, 0.534ns route)
      (95.7% logic, 4.3% route)

Cheers,

Robin

  #7 (permalink)  
Old 07-05-2006, 09:02 AM
Martin Thompson
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

"Robin Bruce" <[email protected]> writes:

> Martin,
>
> > Have you had a look in FPGA editor to see what's going on?

>
> This is where I myself look dim: I did open up the NCD file in the FPGA
> Editor. I didn't really know what to do to tell if the right
> registering was occurring. All I could see was that all 4 DSP48s were
> instantiated together in a little row. I've never used FPGA editor
> before. I'm more familiar with PlanAhead for looking at that sort of
> thing, but I don't have that on my laptop, my current working platform.
>


I haven't looked at a V-4 in FPGA editor... but if you go to one of
your DSP48 blocks and double click it, can you see the internals of it
and are there some boxes that are filled in for the use of registers?

> This is the critical path that comes out of the synthesis report if
> this means anything to anyone:
>
> Data Path: mult_inst/Mmult__n00001 to mult_inst/Mmult__n0000_35
>                               Gate    Net
> Cell:in->out          fanout  Delay   Delay  Logical Name (Net Name)
> --------------------  ------  -----   -----  ------------------------
> DSP48:CLK->PCOUT47         1  4.399   0.000  mult_inst/Mmult__n00001 (mult_inst/Mmult__n00002_PCIN_to_mult_inst/Mmult__n00001_PCOUT_47)
> DSP48:PCIN47->PCOUT47      1  2.363   0.000  mult_inst/Mmult__n00002 (mult_inst/Mmult__n00003_PCIN_to_mult_inst/Mmult__n00002_PCOUT_47)
> DSP48:PCIN47->PCOUT47      1  2.363   0.000  mult_inst/Mmult__n00003 (mult_inst/Mmult__n00004_PCIN_to_mult_inst/Mmult__n00003_PCOUT_47)
> DSP48:PCIN47->P35          1  2.270   0.534  mult_inst/Mmult__n00004 (mult_inst/Mmult__n0000_s_69)
> FD                            0.391          mult_inst/Mmult__n0000_0
> ----------------------------------------
> Total 12.320ns (11.786ns logic, 0.534ns route)
>       (95.7% logic, 4.3% route)
>


That looks like a cascade-chain... because your inputs are 35 bits
wide and you use more than one multiplier, they need to cascade. This
can be pipelined (by the look of the DSP48 diagram in UG073), but how
you'd infer that I have no idea :-( You may have to infer the
individual multipliers and the regs between them. But at that point,
you might as well instantiate them!

Cheers,
Martin

--
[email protected]
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.trw.com/conekt

  #8 (permalink)  
Old 07-05-2006, 05:31 PM
Robin Bruce
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Here's a little table I've knocked up after looking at those DSP48s in
the FPGA editor:

DSP48 Name Mmult__n00001 Mmult__n00002 Mmult__n00003 Mmult__n00004
AREG 2 2
2 2
BREG 2 0
2 0
CREG 0 0
0 0
MREG 0 0
0 0
PREG 0 0
0 0
LEGACY_MODE MULT18X18 MULT18X18 MULT18X18 MULT18X18
CARRYINSELREG 0 0 0
0
OPMODEREG 0 0 0
0
SUBTRACTREG 0 0 0
0
CARRYINREG 0 0 0
0
B_INPUT DIRECT CASCADE DIRECT
CASCADE

Cheers,

Robin


  #9 (permalink)  
Old 07-05-2006, 05:46 PM
Robin Bruce
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Hmmm.... that looks fairly mangled. Try again...

Incidentally, these are for an unsigned version of the problem I've
been describing, for which the problem is the same.

DSP48 Name     DSP1       DSP2       DSP3       DSP4
AREG           2          2          2          2
BREG           2          0          2          0
CREG           0          0          0          0
MREG           0          0          0          0
PREG           0          0          0          0
LEGACY_MODE    MULT18X18  MULT18X18  MULT18X18  MULT18X18
CARRYINSELREG  0          0          0          0
OPMODEREG      0          0          0          0
SUBTRACTREG    0          0          0          0
CARRYINREG     0          0          0          0
B_INPUT        DIRECT     CASCADE    DIRECT     CASCADE

Robin

  #10 (permalink)  
Old 07-05-2006, 08:29 PM
Ray Andraka
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Robin Bruce wrote:

> Thanks Ben,
>
> it's always good to know that I'm not imagining the problem. I'm using
> ISE 8.1, service pack 3. I should probably point out at this point that
> the purpose of this little project is as much about design methodology
> as it is about having a functioning design. I'm aware that there's
> about a million ways I could do this, but in order to have a portable
> core that can be easily floorplanned, I want to have all my design
> files as standard VHDL with no specific instantiations of FPGA
> resources, nor any NGC files from CoreGen.
>
>
>


You'll generally have far better results instantiating the DSP48's with
everything set up the way you want it. I realize you want RTL-only for
portability. How about instead putting a wrapper around the DSP48
instantiation so that it appears as a generic pipelined multiplier. If
you port to another device, just replace the wrapper with one
appropriate for that device. If you want, put your wrapper file(s) in a
separate subdirectory so that you can quickly identify which ones need
to be replaced if you make a technology change. Making up new wrappers
for a new device should not require much effort or time.
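
A minimal sketch of what such a wrapper could look like. The entity name pipelined_mult, its generics, and the architecture name are assumptions made for illustration, and numeric_std is used here rather than the std_logic_arith package of the original post. The architecture shown is the plain-RTL fallback; a Virtex-4 version would be a second architecture behind the same entity that instantiates DSP48 primitives, which is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical technology-neutral interface; only the architecture
-- behind it would change when porting to another device.
entity pipelined_mult is
  generic (
    A_WIDTH : positive := 35;
    B_WIDTH : positive := 35;
    LATENCY : positive := 4);   -- clock cycles from a, b to p
  port (
    clk : in  std_logic;
    a   : in  signed(A_WIDTH - 1 downto 0);
    b   : in  signed(B_WIDTH - 1 downto 0);
    p   : out signed(A_WIDTH + B_WIDTH - 1 downto 0));
end entity pipelined_mult;

-- Portable fallback: a registered product followed by plain delay stages.
architecture portable_rtl of pipelined_mult is
  subtype prod_t is signed(A_WIDTH + B_WIDTH - 1 downto 0);
  type prod_pipe_t is array (0 to LATENCY - 1) of prod_t;
  signal pipe : prod_pipe_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      pipe(0) <= a * b;
      for i in 1 to LATENCY - 1 loop
        pipe(i) <= pipe(i - 1);
      end loop;
    end if;
  end process;

  p <= pipe(LATENCY - 1);
end architecture portable_rtl;
```

The rest of the design would only ever refer to pipelined_mult, so a technology change touches the wrapper directory and nothing else.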
  #11 (permalink)  
Old 07-05-2006, 08:33 PM
Ray Andraka
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Robin Bruce wrote:

> Martin,
>
>
>>Have you had a look in FPGA editor to see what's going on?

>
>
> This is where I myself look dim: I did open up the NCD file in the FPGA
> Editor. I didn't really know what to do to tell if the right
> registering was occurring. All I could see was that all 4 DSP48s were
> instantiated together in a little row. I've never used FPGA editor
> before. I'm more familiar with PlanAhead for looking at that sort of
> thing, but I don't have that on my laptop, my current working platform.
>
>
>>Is it actually this bit of code that limits the timing?

>
>
> Well, all I can say is that I don't think so. It could very well be
> though, but I've tried writing the VHDL in very different ways, guided
> by things I've found in one or two guides to instantiating the DSP48s
> in VHDL. Every way I write the VHDL, the same performance is obtained.
> The thing is that I can see that the synthesis tool is making some kind
> of effort to pipeline the thing.
>
> This is the critical path that comes out of the synthesis report if
> this means anything to anyone:
>
> Data Path: mult_inst/Mmult__n00001 to mult_inst/Mmult__n0000_35
>                               Gate    Net
> Cell:in->out          fanout  Delay   Delay  Logical Name (Net Name)
> --------------------  ------  -----   -----  ------------------------
> DSP48:CLK->PCOUT47         1  4.399   0.000  mult_inst/Mmult__n00001 (mult_inst/Mmult__n00002_PCIN_to_mult_inst/Mmult__n00001_PCOUT_47)
> DSP48:PCIN47->PCOUT47      1  2.363   0.000  mult_inst/Mmult__n00002 (mult_inst/Mmult__n00003_PCIN_to_mult_inst/Mmult__n00002_PCOUT_47)
> DSP48:PCIN47->PCOUT47      1  2.363   0.000  mult_inst/Mmult__n00003 (mult_inst/Mmult__n00004_PCIN_to_mult_inst/Mmult__n00003_PCOUT_47)
> DSP48:PCIN47->P35          1  2.270   0.534  mult_inst/Mmult__n00004 (mult_inst/Mmult__n0000_s_69)
> FD                            0.391          mult_inst/Mmult__n0000_0
> ----------------------------------------
> Total 12.320ns (11.786ns logic, 0.534ns route)
>       (95.7% logic, 4.3% route)
>
> Cheers,
>
> Robin
>


Your design is not inferring the P register, so the adders are
combinatorial. The adders get connected in a daisy chain. You may have
to recode your RTL to reflect that, as the synthesizer is not really
smart enough to push around the registers to the degree necessary to
deal with differing latencies among the adder inputs.
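
A rough sketch of RTL recoded along those lines, with each adder stage registered and the operands delayed by differing amounts so they arrive in step with the running sum. It is written with numeric_std rather than std_logic_arith, the entity and signal names are made up for illustration, and it has not been run through XST, so whether the tool packs each stage into a DSP48 with its P register and PCIN cascade is an assumption. The operands are split as a = ah*2^17 + al and b = bh*2^17 + bl, and the two shifts by 17 stand in for the 17-bit shift on the DSP48 cascade input.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity signed_mult_35x35_staged is
  port (
    clk : in  std_logic;
    a   : in  signed(34 downto 0);
    b   : in  signed(34 downto 0);
    o   : out signed(69 downto 0));  -- valid 5 clock cycles after a and b
end entity signed_mult_35x35_staged;

architecture rtl of signed_mult_35x35_staged is
  subtype op_t  is signed(17 downto 0);  -- one 18-bit multiplier operand
  subtype acc_t is signed(47 downto 0);  -- one 48-bit accumulator
  subtype lo_t  is signed(16 downto 0);  -- 17 result bits that leave the chain
  type op_pipe_t is array (0 to 3) of op_t;
  type lo_pipe_t is array (0 to 2) of lo_t;

  -- Operand delay lines: index i holds the operand delayed by i + 1 clocks.
  -- Taps that no stage reads are left for the synthesiser to trim.
  signal al, ah, bl, bh : op_pipe_t;
  signal p1, p2, p3, p4 : acc_t;
  signal r1 : lo_pipe_t;  -- low 17 bits of p1, delayed to line up with p4
  signal r3 : lo_t;       -- low 17 bits of p3, delayed to line up with p4
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Split each operand: zero-extended low 17 bits, signed high 18 bits.
      al(0) <= '0' & a(16 downto 0);
      ah(0) <= a(34 downto 17);
      bl(0) <= '0' & b(16 downto 0);
      bh(0) <= b(34 downto 17);
      for i in 1 to 3 loop
        al(i) <= al(i - 1);
        ah(i) <= ah(i - 1);
        bl(i) <= bl(i - 1);
        bh(i) <= bh(i - 1);
      end loop;

      -- Four multiply-add stages, each one clock later than the one before.
      p1 <= resize(al(0) * bl(0), 48);
      p2 <= ah(1) * bl(1) + shift_right(p1, 17);
      p3 <= al(2) * bh(2) + p2;
      p4 <= ah(3) * bh(3) + shift_right(p3, 17);

      -- Carry the low-order result bits along until p4 is ready.
      r1(0) <= p1(16 downto 0);
      r1(1) <= r1(0);
      r1(2) <= r1(1);
      r3    <= p3(16 downto 0);
    end if;
  end process;

  o <= p4(35 downto 0) & r3 & r1(2);
end architecture rtl;
```

The point of writing it this way is that no stage has more than one multiply and one add between registers, so the tool has nothing left to retime.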
  #12 (permalink)  
Old 07-06-2006, 10:53 AM
Robin Bruce
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

OK,

so if I've understood this properly, instead of registering the inputs
by differing amounts in order to account for the PREG cascade at the
outputs, everything is getting (pointlessly) registered by the same
amount at the input, and then summed together combinatorially at the
output of the DSP48s, followed by some more pointless registering at
the output?

My interest in this is fairly academic really... The design from which
this question arose has a 35x35 multiplier generated using CoreGen, so
it works fine. It would be useful from a design methodology perspective
to have the ability to infer the DSP48s in such a simple manner. It
would make autogenerating VHDL for different pipelined multiplier
structures a walk in the park. I'm not necessarily of the opinion that
this should be possible today, just that it would be really nice and
when I came across sources that suggested it was possible to do this in
a high performance manner I decided to try it. (XtremeDSP for Virtex-4
FPGAs User Guide, www.xilinx.com/bvdocs/userguides/ug073.pdf
& Philippe Garrault, Accelerate design performance with HDL coding
practices)

I've been really impressed with both the informal (via Ben Jones) and
formal (via tech support) reaction to me bringing this up with Xilinx,
so even if this has all come from me being a bit naive about the
capabilities of XST, there seem to be people in Xilinx who believe that
we should be able to be so naive and expect good performance at the
same time.

Still very much a beginner,

Robin

  #13 (permalink)  
Old 07-06-2006, 04:23 PM
Robin Bruce
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

Guys,

Given feedback I've had from Xilinx, it seems that this is something
that should be possible today but isn't. Apparently a bugfix has been
made that should fix this and it will be released with ISE 9.1.

Robin

  #14 (permalink)  
Old 07-06-2006, 04:49 PM
Andy
Guest
 
Posts: n/a
Default Re: Inferring multiple-DSP48 pipelined multiplier in VHDL

It may take a little effort and time to infer complex things using RTL,
but the simulation performance has always been well worth the effort
for me. Simulating the instantiated primitives is _very_ slow compared
to RTL.

Andy


MM wrote:
> Robin,
>
> IMHO, trying to get inferring of anything more complex than a flip-flop, or
> perhaps an adder, to work is a waste of time. Just instantiate what you
> need...
>
> /Mikhail

