FPGA Central - World's 1st FPGA / CPLD Portal

FPGA comp.arch.fpga newsgroup (usenet)

#1
05-18-2006, 11:25 PM
acd
V5 and carry lookahead

I was excited when I read "carry lookahead" with respect to V5.
But when looking at the diagrams in the user guide it looks to me like
ripple carry.
I do not want to be picky, but carry lookahead means to me
(poly)logarithmic growth of delay with respect to adder length.
The timing model (as far as I understood it) suggests
that the delay grows linearly with the adder length.
So which is it: ripple carry or lookahead?
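
To make the distinction concrete, here is a minimal Python sketch (not taken from the user guide) that contrasts the two delay behaviours. It uses a Kogge-Stone parallel-prefix adder as one example of a logarithmic-depth design; the function names and the unit-delay counting are assumptions made for illustration, and wire delay is ignored.

```python
def ripple_add(a, b, n):
    """Ripple-carry: bit i waits for the carry out of bit i-1.
    Returns (sum, number of carry stages in series)."""
    carry, total = 0, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total | (carry << n), n


def prefix_add(a, b, n):
    """Kogge-Stone parallel-prefix adder.
    Returns (sum, number of prefix levels), which is ceil(log2(n))."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate
    half_sum = p[:]                                    # x ^ y per bit
    levels, dist = 0, 1
    while dist < n:
        new_g, new_p = g[:], p[:]
        for i in range(dist, n):
            # Combine the group ending at i with the group ending at i - dist.
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2
        levels += 1
    # g[i] is now the carry out of bit i (carry-in assumed 0).
    total = 0
    for i in range(n):
        cin = g[i - 1] if i else 0
        total |= (half_sum[i] ^ cin) << i
    return total | (g[n - 1] << n), levels


for width in (8, 16, 32, 64):
    print(width, "bits: ripple stages =", ripple_add(1, 1, width)[1],
          " prefix levels =", prefix_add(1, 1, width)[1])
```

In this idealized count the ripple adder needs n stages in series while the prefix adder needs about log2(n) levels, which is the growth I have in mind when I say "lookahead".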

It is clear that FPGAs with a linear layout of adders ultimately
approach linear delay versus adder length, but if wire delay is
already the dominant problem, then a more compact
arrangement, such as along a Sierpinski curve, could be used.

There is a paper by Hosler, Hauck and Fry from '97
which discusses several adder designs with respect to FPGAs,
but at 65 nm, wire delay would have to be considered even with
optimal buffering.

Andreas

#2
05-19-2006, 12:56 AM
Peter Alfke
Re: V5 and carry lookahead

It is carry-look-ahead over 4 bits, ripple-carry between these 4-bit
slices.
The "effective ripple" delay is 21 ps per bit, and that's what counts.
And it includes the wire-delay.
Yes, the carry delay grows linearly with the bit-length, but it is a
very short delay per bit.
Peter Alfke, Xilinx Applications
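
The structure described above (lookahead inside each 4-bit group, ripple between groups) can be sketched functionally as follows. This is a minimal Python model under stated assumptions: `cla4` and `block_add` are hypothetical names, and the code models only the logic, not the actual silicon or its timing.

```python
def cla4(a4, b4, cin):
    """4-bit group: every carry is computed directly from g, p and cin,
    so no carry inside the group waits on another carry in the group."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]    # generate
    p = [((a4 >> i) ^ (b4 >> i)) & 1 for i in range(4)]  # propagate
    c = [cin, 0, 0, 0, 0]
    c[1] = g[0] | (p[0] & c[0])
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0])
    c[3] = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & c[0]))
    c[4] = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c[0]))
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]


def block_add(a, b, n):
    """n-bit add built from 4-bit lookahead groups; the group carry-out
    ripples into the next group, so delay is still linear in n / 4."""
    assert n % 4 == 0, "this sketch only handles widths that are multiples of 4"
    carry, total = 0, 0
    for k in range(0, n, 4):
        s, carry = cla4((a >> k) & 0xF, (b >> k) & 0xF, carry)
        total |= s << k
    return total | (carry << n)


print(block_add(0xFFFF, 0x0001, 16))  # 65536
```

The loop in `block_add` is where the linear growth comes from: one group delay per four bits.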

#4
05-19-2006, 08:24 AM
Guest
Re: V5 and carry lookahead


Peter Alfke wrote:
> It is carry-look-ahead over 4 bits, ripple-carry between these 4-bit
> slices.
> The "effective ripple" delay is 21 ps per bit, and that's what counts.
> And it includes the wire-delay.
> Yes, the carry delay grows linearly with the bit-length, but it is a
> very short delay per bit.
> Peter Alfke, Xilinx Applications


Hi Peter,

Has anyone at Xilinx done a performance study for wider adders with
this new carry architecture in relation to carry-select or Brent-Kung
FPGA implementations, or could you offer revised versions of those
for the V5, given the new tradeoffs with the 6-LUT and carry changes?

Have fun!
John

#5
05-19-2006, 10:59 AM
Ben Jones
Re: V5 and carry lookahead


John wrote:
>
> Has anyone at Xilinx done a performance study for wider adders with
> this new carry architecture in relation to carry-select or Brent-Kung
> FPGA implementations, or could you offer revised versions of those
> for the V5, given the new tradeoffs with the 6-LUT and carry changes?


Yes indeed. Obviously the new LUT6 architecture changes the playing field
somewhat when it comes to arithmetic. There has been plenty of work done on
identifying the optimal mappings for basic arithmetic functions so the tools
can do a Good Job. (Nominally.)

The improvements in the carry chain speed are substantial. Although there's
still a noticeable hit when getting on and off them, the raw propagation
speed is a real step up from previous generations. The fabric speed is
really catching up to the embedded IP blocks now...

Cheers,

-Ben-


#6
05-19-2006, 01:37 PM
Guest
Re: V5 and carry lookahead


Ben Jones wrote:
> Yes indeed. Obviously the new LUT6 architecture changes the playing field
> somewhat when it comes to arithmetic. There has been plenty of work done on
> identifying the optimal mappings for basic arithmetic functions so the tools
> can do a Good Job. (Nominally.)


We've already been looking at technology-specific mapping for FpgaC,
and one of the things we noticed was that LUT4s didn't pack well with
arithmetic, so we were already looking at F5/F6 to mitigate that problem.
Building to LUT6s is certainly a better fit for the netlists we
generate, so my response is YIPPIE

The 64x1 LUT RAMs are also a blessing, as they make it far easier
to support the many applications with short arrays of that size, where the
16- and 32-deep arrays are frequently not enough. Is there an expander
function in the slice fabric to cascade these, like the 32x1 in the V2
and V2Pros? Dual-port fabric?

Any chance I can get some better docs and suggested arithmetic
implementations so we can target these devices with the new technology
mapper?

> The improvements in the carry chain speed are substantial. Although there's
> still a noticeable hit when getting on and off them, the raw propagation
> speed is a real step up from previous generations. The fabric speed is
> really catching up to the embedded IP blocks now...


I'm interested in performance for 32-bit and 64-bit arithmetic, for Long
and Long Long variables. Will the carry logic be slower than lookahead
functions, as it is with the current carry chains?

#7
05-19-2006, 03:48 PM
Ben Jones
Re: V5 and carry lookahead

Hi John,

John wrote:
>
> We've already been looking at technology-specific mapping for FpgaC,
> and one of the things we noticed was that LUT4s didn't pack well with
> arithmetic, so we were already looking at F5/F6 to mitigate that problem.
> Building to LUT6s is certainly a better fit for the netlists we
> generate, so my response is YIPPIE


I love the LUT6 architecture, particularly for muxes (4:1 in a single LUT,
16:1 in a single slice, with no wasted inputs).
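
As a rough illustration of why the mux mapping wastes no inputs, here is a minimal Python sketch that models a LUT6 as a 64-entry truth table. A 4:1 mux needs 4 data inputs plus 2 select inputs, which is exactly 6. The 16:1 version combines four such LUTs with two further levels of 2:1 mux; treating those as the slice's dedicated wide-function muxes is an assumption of this sketch, and all names here are hypothetical.

```python
def lut6(init, inputs):
    """6-input LUT: `inputs` is a 6-bit address into the 64-bit truth table."""
    return (init >> inputs) & 1


def mux4_init():
    """Build the 64-entry truth table for a 4:1 mux.
    Address bits 0..3 are data d0..d3, address bits 4..5 are the select."""
    init = 0
    for addr in range(64):
        data, sel = addr & 0xF, addr >> 4
        init |= ((data >> sel) & 1) << addr
    return init


MUX4 = mux4_init()


def mux16(data16, sel4):
    """16:1 mux: four LUT6 4:1 muxes, then two levels of 2:1 mux."""
    lo_sel, hi_sel = sel4 & 3, sel4 >> 2
    lut_out = [lut6(MUX4, ((data16 >> (4 * k)) & 0xF) | (lo_sel << 4))
               for k in range(4)]
    first_a = lut_out[1] if hi_sel & 1 else lut_out[0]  # first 2:1 level
    first_b = lut_out[3] if hi_sel & 1 else lut_out[2]
    return first_b if hi_sel & 2 else first_a           # second 2:1 level


# Select input 9 of a word in which only bit 9 is set.
print(mux16(1 << 9, 9))  # 1
```

All 6 address bits of every LUT carry a live signal, which is the "no wasted inputs" point.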

> Also the 64x1 LUT RAMs are also a blessing, as it makes it far easier
> to support many applications with short arrays that size ... where the
> 16 and 32 deep arrays are frequently not enough. Is there an expander
> function in the slice fabric to cascade these, like the 32x1 in the V2
> and V2Pros? Dual port fabric?


I don't believe you get anything to cascade between slices, but a single
SLICEM will give you 256x1-bit by using all four LUTs. You can also get a
variety of dual-port configurations: up to 128x1 true dual-port, or up to
64x3-bit simple dual-port per slice. My personal favourite: 32x2 or 64x1
quad-port per slice (that's 1xRW and 3xRO ports).
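
A behavioural sketch of the 64x1 quad-port case may help when targeting it from a mapper. It assumes (this is not stated above) that the extra read ports come from keeping an identical copy of the data in each of the slice's four LUTs, with one shared write address and an independent read address per LUT. The class and method names are hypothetical.

```python
class QuadPortLutRam:
    """Behavioural model of a 64x1 RAM with 1 read/write and 3 read-only ports."""
    DEPTH = 64   # one LUT6 holds 64 bits
    PORTS = 4    # four LUTs per slice

    def __init__(self):
        # Assumption: each LUT holds a full copy of the same 64 bits.
        self.luts = [[0] * self.DEPTH for _ in range(self.PORTS)]

    def write(self, addr, bit):
        """Single write port: the same bit goes into every copy."""
        for lut in self.luts:
            lut[addr] = bit & 1

    def read(self, port, addr):
        """Each port reads its own copy at its own address."""
        return self.luts[port][addr]


ram = QuadPortLutRam()
ram.write(5, 1)
print([ram.read(port, 5) for port in range(4)])  # [1, 1, 1, 1]
# The same four LUTs used as distinct storage give 4 * 64 = 256 bits (256x1).
print(QuadPortLutRam.DEPTH * QuadPortLutRam.PORTS)  # 256
```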

> Any chance I can get some better docs and suggested arithmetic
> implementations so we can target these devices with the new technology
> mapper?


I don't know how much of that information gets published - not so much a
secrecy thing as an hours-in-the-day thing. Mostly it's seen as being of
internal interest only. (Your [external] interest has been duly noted.)

> I'm interested in performance for 32bit and 64 bit arithmetics as Long
> and Long Long variables, will it be the case that the carry logic is
> slower than look ahead functions as with the current carry chains?


I don't have exact details to hand, but the carry-chain delay (CIN->COUT) in
V5 is about the same as V4 - maybe slightly shorter - but for 4 CY stages
per slice, not just 2. That is, carry-chain dominated logic could potentially go
around 2x faster. The difference between 32-bit add and 64-bit add is
therefore around 600ps... so for the majority of applications, the carry
chain takes some beating! However, straightforward ripple-carry arithmetic
can be a bit wasteful of LUT input resources.
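
For a back-of-the-envelope check, the figures in this thread can be put into a first-order linear model. This is only a sketch: it uses the 21 ps per bit "effective ripple" number quoted earlier in the thread and the 4 carry stages per slice mentioned above, and it ignores the cost of getting on and off the chain. The constant and function names are hypothetical.

```python
PS_PER_BIT = 21        # "effective ripple" figure quoted earlier in the thread
STAGES_PER_SLICE = 4   # "4 CY stages per slice"


def carry_chain_ps(bits, ps_per_bit=PS_PER_BIT):
    """First-order carry propagation delay: linear in the number of bits."""
    return bits * ps_per_bit


def slices_on_chain(bits, stages=STAGES_PER_SLICE):
    """Number of slices the carry passes through (ceiling division)."""
    return -(-bits // stages)


for width in (32, 64):
    print(width, "bits:", slices_on_chain(width), "slices,",
          carry_chain_ps(width), "ps")
print("64-bit minus 32-bit:", carry_chain_ps(64) - carry_chain_ps(32), "ps")
```

Under this model the extra 32 bits cost 672 ps, which is in the same ballpark as the "around 600ps" quoted above.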

It's also possible (with some degree of cunning) to create an efficient
3-input adder in the fabric, although there is some speed penalty to this.
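
One textbook way to build a 3-input adder is carry-save compression followed by a single carry-propagate add. The Python sketch below shows only that arithmetic identity; whether it matches the mapping I have in mind for the fabric is deliberately left open, and the function name is hypothetical.

```python
def add3(a, b, c, n):
    """Add three n-bit values with one carry-propagate addition.

    Step 1 (no carry propagation): per-bit XOR gives a partial sum and
    per-bit majority gives the carries.
    Step 2: a single ordinary add of the partial sum and shifted carries.
    """
    mask = (1 << n) - 1
    a, b, c = a & mask, b & mask, c & mask
    partial = a ^ b ^ c                    # sum bit of each 3:2 compressor
    carries = (a & b) | (a & c) | (b & c)  # carry bit of each 3:2 compressor
    return partial + (carries << 1)


print(add3(7, 9, 12, 4))  # 28
```

Only the final `+` needs a carry chain; the compression step is purely per-bit logic, which is where the extra LUT inputs come in useful.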

Cheers,

-Ben-


#8
05-20-2006, 12:35 AM
Guest
Re: V5 and carry lookahead


Ben Jones wrote:
> I love the LUT6 architecture, particularly for muxes (4:1 in a single LUT,
> 16:1 in a single slice, with no wasted inputs).


Yep ... I was already looking at that for the RC5 cracker demo code I
did last year, as it should give a much better fit and performance. It
would not take that many LX330 devices to match the performance of all
of dnet, assuming you can actually power the device and keep it cool when
fully packed.

> I don't believe you get anything to cascade between slices, but a single
> SLICEM will give you 256x1-bit by using all four LUTs. You can also get a
> variety of dual-port configurations: up to 128x1 true dual-port, or up to
> 64x3-bit simple dual-port per slice. My personal favourite: 32x2 or 64x1
> quad-port per slice (that's 1xRW and 3xRO ports).


Yippie ... that is more than enough (for now) ... and the dual/quad-port
configurations are exactly what I've found useful in FpgaC for typical
loops: one, two or three references with a writer. Being able to have
both the array storage and most of the arithmetic LUTs packed into the
same slice/CLB really cuts down on routing requirements and delays.

> It's also possible (with some degree of cunning) to create an efficient
> 3-input adder in the fabric, although there is some speed penalty to this.


Hmm ... interesting ... space/time tradeoffs are another area we need
to spend more time looking at for FpgaC. So far that balance has been
static, and favors performance in most cases. Dense packing like that
could certainly be useful.

One of the interesting side effects of doing bit-level optimization and
packing in FpgaC is that applications like the RC5 cracker end up packing
both arithmetic and barrel shifter components into the same LUT,
which avoids wasting inputs and logic levels (and offsets the
poor/general technology mapping to some extent) ... that just gets
better with LUT6s. The downside is that it gets harder to extract
possible fits to specialized logic in the slice from the truth table, as
the truth table grows as 2^n in size and the number of permutations to
search grows as well.
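
To put rough numbers on that growth, here is a small Python sketch. It counts only the truth-table size and the number of input orderings for an n-input LUT; what FpgaC actually searches is more involved than this, so treat `search_space` as a hypothetical illustration rather than the real matcher.

```python
from math import factorial


def search_space(n):
    """Sizes relevant to matching an n-input truth table against a pattern."""
    return {
        "truth_table_bits": 2 ** n,           # entries in the table
        "input_permutations": factorial(n),   # orderings of the n inputs
    }


for n in (4, 6):
    info = search_space(n)
    print(f"LUT{n}: {info['truth_table_bits']} table bits, "
          f"{info['input_permutations']} input permutations")
```

Going from LUT4 to LUT6 takes the table from 16 to 64 entries and the input orderings from 24 to 720, so a brute-force match over permutations does 30 times as many comparisons, each on a table 4 times as long.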

All times are GMT +1.