FPGA Central - World's 1st FPGA / CPLD Portal

FPGA comp.arch.fpga newsgroup (usenet)

#1
05-12-2005, 10:08 PM
Joseph H Allen
V4 vs. Stratix-II...

I'm upgrading a design, and I'm in the early phases of choosing a vendor.
I'm trying to compare parts based on experience I've had in the past, so I'm
focusing on block RAM clock to out delay as a critical performance number:

Altera M4K vs. Xilinx Block RAM clock to out delay, non-registered outputs:

Stratix-II -3 2.46 ns
Stratix-II -4 2.828 ns
Stratix-II -5 3.393 ns

Xilinx-V4 -11 1.83 ns
Xilinx-V4 -10 2.10 ns

Xilinx-V2 -4 2.65 ns (current part)

V4 appears to be 1.62 times faster than Stratix-II for the slowest speed grade parts (which
I'm probably most interested in, though I should really compare equal-priced
parts), and the slowest Stratix-II is slower even than the original V2 design. Am I missing
something? Several posts here suggest that Stratix-II interconnect is
faster; is there any datasheet evidence to back this up? Let's say the RAM
output is at least feeding a 2:1 MUX before being registered, and probably
has to travel ~1/3 the width of the chip.
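A quick way to sanity-check the ratio above: it is just the quotient of the two slowest-grade clock-to-out figures in the table. The sketch below is Python with the table values typed in by hand, so treat the dictionary as a transcription of this post rather than vendor data. It reproduces the 1.62 figure for Stratix-II -5 vs. Virtex-4 -10.

```python
# Block RAM clock-to-out (tco) figures transcribed from the table above, in ns.
# Unregistered RAM outputs; the Stratix-II numbers are the pre-Quartus II 5.0
# handbook values that later replies say were being re-characterized.
TCO_NS = {
    ("Stratix-II", "-3"): 2.46,
    ("Stratix-II", "-4"): 2.828,
    ("Stratix-II", "-5"): 3.393,
    ("Xilinx-V4", "-11"): 1.83,
    ("Xilinx-V4", "-10"): 2.10,
    ("Xilinx-V2", "-4"): 2.65,
}


def speed_ratio(slower, faster):
    """How many times faster `faster` is than `slower` (ratio of the two delays)."""
    return TCO_NS[slower] / TCO_NS[faster]


if __name__ == "__main__":
    # Slowest grade of each family: Stratix-II -5 vs. Virtex-4 -10.
    print(f"{speed_ratio(('Stratix-II', '-5'), ('Xilinx-V4', '-10')):.2f}")  # 1.62
    # Slowest Stratix-II vs. the current Virtex-II -4 part.
    print(f"{speed_ratio(('Stratix-II', '-5'), ('Xilinx-V2', '-4')):.2f}")   # 1.28
```

This compares a single delay only; the RAM-to-MUX-to-register path in the question also depends on routing, which the table does not capture.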

Also, help me fill in my chart:

LUT delay:

Xilinx-V2 -4 439ps
Xilinx-V4 -10 200ps
Xilinx-V4 -11 170ps
Stratix-II ? (can't find any data)

Carry delay:

Xilinx-V2 -4 106ps
Xilinx-V4 -10 90 ps
Xilinx-V4 -11 80 ps
Stratix-II ? (can't find any data)

Routing delay:

I can do this with fpga_editor in Xilinx. How to do it for Stratix-II ?

--
/* [email protected] (192.74.137.5) */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

#2
05-12-2005, 11:19 PM
Ben Twijnstra
Re: V4 vs. Stratix-II...

Hi Joseph,

I stopped reading data sheets since they're way too big and the information
is never organized the way I need to have it. So I tend to simply write
little test cases and let the tools tell me what I need to know.

I would personally just compile the design with your new constraints in both
ISE and Quartus II (v5 has just been released) and see who comes out best.

> Altera M4K vs. Xilinx Block RAM clock to out delay, non-registered
> outputs:
>
> Stratix-II -3 2.46 ns
> Stratix-II -4 2.828 ns
> Stratix-II -5 3.393 ns
>
> Xilinx-V4 -11 1.83 ns
> Xilinx-V4 -10 2.10 ns
>
> Xilinx-V2 -4 2.65 ns (current part)


I suggest you re-check Stratix-II timing with Quartus II 5.0 - Altera has
been doing some re-characterization which seemingly hasn't made it to the
handbook yet. In an M4K I am using in a Stratix II I'm getting 1.85ns for a
-3 part and 2.4ns for a -5 part.

> LUT Delay:
> Stratix-II ? (can't find any data)


Well, it kind of varies between (off the cuff) 83ps and 400ps depending on
the input that changes and the mode the ALM is in.

Easy to check in Quartus with, for example, an 8-input AND or so. I'm
getting cell delays between 0.047 and 0.404ns depending on the mode and the
input of the ALM (see below on how to do this).

> Carry delay:
>
> Xilinx-V2 -4 106ps
> Xilinx-V4 -10 90 ps
> Xilinx-V4 -11 80 ps
> Stratix-II ? (can't find any data)
>
> Routing delay:
>
> I can do this with fpga_editor in Xilinx. How to do it for Stratix-II ?


Open the timing analyzer. Right-click a path and select "List Paths" from
the menu. When expanding the messages in the status window you should get
detailed info on both cell and routing delay of the path.

Best regards,


Ben


#3
05-12-2005, 11:38 PM
Peter Sommerfeld
Re: V4 vs. Stratix-II...

Hi Joseph,

Remember that in Q II 5.0 the M4K performance has increased from 400 to
550 MHz. It looks like you're using the out-of-date numbers for tCO.
The new ones should be ~1.88 ns (I'm guessing).

There are a few ways to find the routing delays in Q II. The most
detailed way is to open the Timing Floorplanner (Assignments/Timing
Closure Floorplan), right-click a used logic cell, and choose
Locate>Chip Editor.

From here you can multi-select resources, choose View/Show Delays,
right-click, and choose "Generate Connections Between Nodes". You can
show the actual routes used with View/Highlight Routing.

The easier way is to stay in the Timing Floorplanner, Ctrl-click the
stuff you want to find delays for, make sure View/Routing/"Show Routing
Delays" is selected, and choose View/Routing/"Show Paths Between
Nodes".

Interesting ... the Stratix II handbook doesn't have LUT timing params.
I was sure they were there for Stratix. Well, it shouldn't be too
difficult with Chip Editor ... maybe someone gets an answer before I do
....

-- Pete



#4
05-13-2005, 12:43 AM
Austin Lesea
Re: V4 vs. Stratix-II...

Joseph,

I just saw a presentation that shows that V4 is faster on all
interconnect paths (by as much as 500 ps for long paths) except the
immediate neighbor paths, where we are just ever so slightly slower than
S2 neighbor paths.

I also saw LUT comparisons, which took 8 slides, with animations, as
comparing the 4LUTs to the ALM-LUT is not trivial: you have to look at
each and every input-to-output delay. And then you have to make a guess
as to how your logic will get synthesized. Yes, we are faster for 4LUT
functions (most inputs), and they are faster for wider functions (but not all
inputs).

For example: S2 4LUT input delays to output (in order): 155ps, 382ps,
360ps, 275ps. V4 4LUT: 165ps, 165ps, 165ps, 165ps. (fastest speed
grades, both companies).
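To see why a single "LUT delay" number is misleading here, the sketch below takes the per-input figures just quoted and reports best case, worst case, and a plain average. The average assumes every input is equally likely to carry the critical signal, which is an assumption for illustration only; a timing-driven router will try to put critical signals on the fast pins.

```python
# Per-input 4-LUT delays quoted above (fastest speed grades), in ps, in the order given.
S2_4LUT_PS = [155, 382, 360, 275]
V4_4LUT_PS = [165, 165, 165, 165]


def summarize(delays_ps):
    """Return (best, worst, unweighted mean) input-to-output delay."""
    return min(delays_ps), max(delays_ps), sum(delays_ps) / len(delays_ps)


if __name__ == "__main__":
    print("S2:", summarize(S2_4LUT_PS))  # S2: (155, 382, 293.0)
    print("V4:", summarize(V4_4LUT_PS))  # V4: (165, 165, 165.0)
```

On these numbers S2 is 10 ps ahead if the critical signal lands on its fastest input and more than 200 ps behind if it lands on the slowest, which is why synthesis and routing choices dominate the comparison.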

Then there is the interconnect. V4 is 500 ps faster for full chip
routes, 400 ps faster for 1/2 chip routes, 100-200 ps faster for a few
CLBs, LABs, and 100-200ps for neighbor routes. Some very short routes
are 30ps better in S2.

Below 32 bits, S2 is slightly better for an adder, and over 32 bits, V4
is better. Same for carry chains, where S2 is ~200 ps better at ~16
bits, and V4 is >500 ps better at 48 bits and longer carry chains (equal
at 24 bits).
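The three carry-chain points above (S2 ~200 ps ahead at 16 bits, equal at 24 bits, V4 >500 ps ahead at 48 bits) are roughly consistent with a fixed per-bit difference. The sketch below is a hypothetical linear fit, not vendor data: it calibrates the slope from the 16- and 24-bit points and checks that the result lands above 500 ps at 48 bits. Real chains also pay fixed entry/exit costs and extra delay where the chain crosses LAB or CLB boundaries, so this is only a way to read the quoted numbers.

```python
# Hypothetical linear model of V4's carry-chain advantage over S2, in ps.
# Positive = V4 faster, negative = S2 faster. Calibrated from the rough
# figures quoted above, not from any datasheet.
CROSSOVER_BITS = 24                       # "equal at 24 bits"
PS_PER_BIT = (0 - (-200)) / (24 - 16)     # S2 ~200 ps better at 16 bits -> 25 ps/bit


def v4_advantage_ps(n_bits):
    """Estimated V4-minus-S2 advantage for an n-bit carry chain under the linear model."""
    return PS_PER_BIT * (n_bits - CROSSOVER_BITS)


if __name__ == "__main__":
    for n in (16, 24, 32, 48):
        print(n, v4_advantage_ps(n))  # -200.0, 0.0, 200.0, 600.0 respectively
```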

In our suite of test designs, we come out ~9% faster (on average) with a
+/- 4% error margin. Of course some designs will be faster than that,
and some slower, too. We generally favor wider arithmetic and
pipelining, where S2 favors empty designs and small arithmetic
functions. We tend to excel when the design gets full and complex
(like it does at the end of your project!).

BRAM performance depends a lot on the use of registers, as using the
fabric registers really slows things down (and takes more power) compared
with using the registers built into the BRAM. Of course, anything you can
direct into the DSP48s will just scream, and outperform anything S2 has.

I think that the newsgroup here will basically tell you to try a design
in both architectures, and play with the constraints to see how well it
does.

Or, what I prefer, is to contact the FAEs of the respective companies,
and ask them to show you how your design will perform (let them drive
the tools).

Or, do both.

Austin

#5
05-13-2005, 01:29 AM
Tommy Thorn
Re: V4 vs. Stratix-II...

Austin Lesea wrote:
> Joseph,
>
> I just saw a presentation that shows that V4 is faster on all
> interconnet paths (by as much as 500 ps for long paths) except the
> immediate neighbor paths, where we are just ever to slightly slower than
> S2 neighbor paths.


....(lots of numbers deleted)...

Without detailing what you're comparing (i.e., which device at which
speed grade) none of this is meaningful.

Tommy -- not affiliated with either fighting bulls.

#6
05-13-2005, 03:40 AM
austin
Re: V4 vs. Stratix-II...

Tommy,

I thought I was clear, fastest speed grade, S2 and V4.

Austin

#7
05-13-2005, 05:34 AM
Jim Granville
Re: V4 vs. Stratix-II...

Austin Lesea wrote:
<snip>
> For example: S2 4LUT input delays to output (in order): 155ps, 382ps,
> 360ps, 275ps. V4 4LUT: 165ps, 165ps, 165ps, 165ps. (fastest speed
> grades, both companies).


Since this is side-by-side, I was wondering why Xilinx spec all paths
the same.

Is that actually the worst path, and then the SW is
free to use any path ?
[but your physical speed margin might change, on a re-route]

Or is there really such a difference in the implementation that Xilinx's
end up precisely identical, and Altera's vary over 2:1 ?

-jg

#8
05-13-2005, 03:46 PM
John M
Re: V4 vs. Stratix-II...

Joseph,

I agree with Ben. With so many variables and so much marketing B.S.,
your best bet is to compile using both a V4 and SII. I've found that
performance is highly dependent on implementation, synthesis tools, and
how full the device is. These are all variables outside of your FPGA
vendor selection. You also note that you're probably going with the
slowest speed grade, so I assume cost is an issue. A true comparison
cannot be made with cost included. In addition, you should also
consider whether EasyPath for Xilinx or Hardcopy for Altera are
alternatives to help lower your cost. Finally, I would like to make
one point about interconnect. Who cares if V4 or SII is slightly
faster? It's the routing software that is going to make the major
difference. Whichever software requires me to do the least amount of
floorplanning is the one that wins. Also, how well does the software
perform as the chip gets full? Personally, I think the floorplanning
tools of ISE are easier to use than Quartus. However, I think Quartus
does a much better job at placement and routing as a design gets very
full (>90% utilization).

John

#9
05-13-2005, 04:50 PM
Austin Lesea
Re: V4 vs. Stratix-II...fabric only thread...LUT details...

Jim,

I have been corrected by many. No, they are not all the same (in the
hardware, and as an IC designer, I already knew that). However, in the
past they were treated as all equal (for efficiency, finding and using
the faster path is not necessarily a big benefit).

I do not know if the paths are treated the same or not (on the 4LUT) in
V4 p&r. I am sure someone will tell me (now).

I think the point I was trying to make is that the 4LUT is faster than
the ALM for a class of functions (4 inputs or less), and slower for
wider functions (on some pins). So, the quality of the synthesis,
followed by the place and route (constraints) will make a huge
difference in the performance.

I have been told that every design that is better in S2 can, after some
work, be made even better in V4. I do not doubt that Altera
can, and does, make the exact same claim.

I disagree that the ultimate (best) performance in S2 is better, as that
is not what our research has shown. Again, Altera has their own suite
of XX designs that they use to benchmark their device, and they also
make exactly the same claim.

Given the state of the marketing wars (see the "mine is...." thread), I
think I'll stay safely in the engineering camp, and say: if you are
really adamant about comparing the two, go take your finished design,
and run it through both design tools, and make your own decision. Our
FAEs are available to help you with that chore.

And please take into account that we offer: DSP48, EMAC, PPC, FIFO-BRAM
that can be used to even greater advantage.

Austin

#10
05-13-2005, 06:07 PM
Rudolf Usselmann
Re: V4 vs. Stratix-II...fabric only thread...LUT details...

Austin Lesea wrote:

> Jim,
>

....
>
> I disagree that the ultimate (best) performance in S2 is better, as that
> is not what our research has shown. Again, Altera has their own suite
> of XX designs that they use to benchmark their device, and they also
> make exactly the same claim.


Austin,

to settle this argument once and for all, why not take a bunch
of designs that are freely available on OpenCores, and present
utilization and performance reports without doing any tweaking
of the designs? There are many VHDL and Verilog designs available
on OpenCores, from CPUs to crypto cores to communication cores.

Both companies could present their own results, including
a script showing how to reproduce the results, in case somebody
wanted to double-check.

If you could agree to do this for Xilinx, and perhaps we get a
volunteer from the Altera camp, we can openly choose some designs ...

Best Regards,
rudi
=============================================================
Rudolf Usselmann, ASICS World Services, http://www.asics.ws
Your Partner for IP Cores, Design, Verification and Synthesis

#11
05-13-2005, 06:34 PM
Peter Alfke
Re: V4 vs. Stratix-II...fabric only thread...LUT details...

Rudi, nice idea, but it won't work, with the two companies involved.
Many years ago, there was PREP, with a very similar idea. It died
because the FPGA manufacturers could not resist the temptation to
tinker with the results (I used the words "lied and cheated"). Our
"friends" presented designs with "virtual" flip-flops, to improve the
packing density. It became one big shouting match.

The stakes are just too high for either of the marketing departments to
admit "defeat", and there are too many subtle aspects of designing with
FPGAs, hardware and software.
"Everybody is the winner" will be the unavoidable outcome.

It seems that the user community likes the intense competition and
diversity.
And we like the fact that FPGAs have not become a commodity where price
is the only differentiator. There is still lots of room for creativity
and innovation.
Peter Alfke

#12
05-13-2005, 06:34 PM
Nicholas Weaver
Stupid Question on the Urination Contest... Re: V4 vs. Stratix-II...

Warning: Ranty, opinionated (and quite probably wrong):


Stupid question on the X vs A urination (in a hurricane) contest:

How much does performance really matter?

First, how many FPGA tasks are not defined by an external clock or
clocks? If you are doing GigE, your clock is 125 MHz (8 bit path) or
62.5 MHz (16 bit path). The PCI-X bus is 33/66/99/133 MHz.
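The GigE numbers are just line rate divided by datapath width: 1000 Mb/s over an 8-bit path needs 125 MHz, over a 16-bit path 62.5 MHz. A one-function sketch, assuming one word moves per clock with no idle cycles:

```python
def required_clock_mhz(line_rate_mbps, datapath_bits):
    """Clock needed to move `line_rate_mbps` through a `datapath_bits`-wide path,
    assuming one word per clock and no idle cycles."""
    return line_rate_mbps / datapath_bits


if __name__ == "__main__":
    print(required_clock_mhz(1000, 8))   # 125.0
    print(required_clock_mhz(1000, 16))  # 62.5
```

Doubling the width halves the required clock, which is why an externally fixed data rate often caps how much raw fabric speed actually buys you.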

Second, how many designs have single-cycle latency requirements? PCI
does, but your part either can or can't make the PCI spec with the
provided IP core (so that's a pass/fail metric, not a performance
metric).

If the task is latency bound overall, then performance matters. But
otherwise, just add more registers & pipeline more finely.
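A rough sketch of why extra registers decouple Fmax from logic depth: if a path with total logic-plus-routing delay D is split evenly into k register stages, each stage costs D/k plus the register overhead (clock-to-out plus setup), so Fmax rises while latency grows to k cycles. The even split and the 12 ns / 1 ns example values are hypothetical; real logic rarely divides evenly.

```python
def pipelined_fmax_mhz(path_delay_ns, stages, reg_overhead_ns):
    """Fmax for a path of `path_delay_ns` split evenly into `stages` register stages.

    `reg_overhead_ns` is the per-stage register cost (tco + tsu). Returns MHz.
    """
    period_ns = path_delay_ns / stages + reg_overhead_ns
    return 1000.0 / period_ns


if __name__ == "__main__":
    # Hypothetical 12 ns of logic and routing, 1 ns register overhead per stage.
    for k in (1, 2, 3, 4):
        print(k, "stages:", round(pipelined_fmax_mhz(12.0, k, 1.0), 1), "MHz,",
              k, "cycles latency")
```

Returns diminish once the fixed register overhead becomes a large fraction of the period, which is where the latency cost starts to outweigh the Fmax gain.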

Thus I personally wonder whether the primary focus of the pissin' match
should be mostly about tools (both the vendor tools and support for
third-party tools, especially easy floorplanning, datapath-aware
placement, & retiming), density ($/LE), and features (Brand X has a
big lead here), rather than whose LUT is 10% faster on what functions,
and whose interconnect might be slightly faster on some designs and
slower on others.
--
Nicholas C. Weaver. to reply email to "nweaver" at the domain
icsi.berkeley.edu

#13
05-13-2005, 07:01 PM
Antti Lukats
Re: V4 vs. Stratix-II...fabric only thread...LUT details...

"Rudolf Usselmann" <[email protected]> wrote in message
news:[email protected]
> Austin Lesea wrote:
>
> > Jim,
> >

> ...
> >
> > I disagree that the ultimate (best) performance in S2 is better, as that
> > is not what our research has shown. Again, Altera has their own suite
> > of XX designs that they use to benchmark their device, and they also
> > make exactly the same claim.

>
> Austin,
>
> to settle this argument once and for all, why not take a bunch
> of designs that are freely available on OpenCores, and present
> utilization and performance reports without doing any tweaking
> of the designs ? There are many VHDL and Verilog deigns available
> on OpenCores from CPUs, to Crypto cores to communication cores.
>
> Both companies could present their own results including with
> a script as to how to reproduce the results, in case somebody
> wanted to double check.
>
> If you could agree to do this fir Xilinx, and perhaps we ghet a
> volunteer from the Altera Camp, we can openly chose some designs ...
>
> Best Regards,
> rudi


Rudi,

it would not work that way, and you would get nil support for the idea (officially
at least) from any FPGA vendor. There is just too much at stake. But
some companies are doing something similar by having test environments which
are run against the latest tools for multiple FPGA vendors. Those are the
companies that design FPGA/ASIC tools. And to my knowledge most of those
companies are pissed at the FPGA companies because their bread is getting
smaller as the FPGA vendor tools are getting better (or including new
functionality), and I think there are some other problems also. Anyway,
those companies run testbenches. For a slightly different reason, but I think
they pretty much 'see' and 'know' the differences between the FPGA fabrics
from different vendors. But all that benchmarking is strictly inside those
companies and there is no public info. Open 'FPGA' benchmarking has
failed. It is virtually impossible to do without some kind of bias,
and the results are not usable without very strict explanations of the
circumstances under which the comparison is valid. The HDL-to-fabric mapping
(the whole process) is too complex, and there are too many small things that may
or may not have an impact on the results.

Antti
with his last 2 cents

#14
05-13-2005, 07:47 PM
Austin Lesea
Re: V4 vs. Stratix-II...fabric only thread...LUT details...

Rudi,

The problem is that without any regard to device specific features, the
results will vary by a tremendous amount.

Austin

#15
05-13-2005, 08:27 PM
c d saunter
Re: Stupid Question on the Urination Contest... Re: V4 vs. Stratix-II...

I recently wasted several hours trying to get a project from ISE 6.1i to
*load* correctly in ISE 6.3.03i.

I always seem to lose more time to the tools (either due to bugs or the
tools being downright awful compared to a software compilation toolflow
for providing easily useful data etc.) than I do to reaching timing
requirements, or adapting a design to live with a lower clock.

So yes, in my view I'm more interested in toolflows lately, although with
only two fish in the pond hardware-wise, and in both cases the hardware and
tools being intimately linked (at lower levels, and at higher levels for
those on restricted budgets), there is sadly little choice :-(

An interesting aside: I've been getting involved with work in using FPGAs
in high performance computing, coming from a background in both. Meeting
people coming from a background that is software/HPC and no FPGAs, they
tend to be appalled by the FPGA software flows.

I'm reasonably convinced that a 'proper' implementation of the modular
design stuff from Xilinx (i.e. not relying on using the disappearing
tristate bus emulation from the Virtex architecture) would make the tools
more usable, not least in reducing raw hardware and time requirements for
big PARs.

Mind you it could be argued that the reason I have the luxury to be pissed
at the tools is because the hardware is now at a state where it does what
I want most of the time :-)

---

cds


#16
05-13-2005, 10:40 PM
Joseph H Allen
Re: V4 vs. Stratix-II...

Thank you all. This has been very helpful.


#17
05-14-2005, 01:08 PM
Simon Peacock
Re: V4 vs. Stratix-II...

I think you are somewhat missing the point with the A & X question.. in
that you ask the wrong question.

It's not who has the best architecture or which one is fastest.. it actually
doesn't really matter... for 99% of the designs, as Austin's pointed out
before... either is good enough... and if you're in the 1% that matters, then
anything that you do won't give you a good enough idea until you try and fit
the final FF or CLB, and even then your design will be so customised that an
A design is almost impossible to translate to X and vice versa.

What really matters is what price X or A's FAE will sell you the parts at,
what support they will give you, what evaluation boards are about that do
some if not all your needs.

The decision at my work was which company gave us the best discount, That
happened to be Xilinx. It also happened that they do bus LVDS which we are
using so our design naturally forced A out anyway, we just didn't tell
anyone :-)

If you are building a one-off then it really doesn't matter anyway. Use a
dartboard and a blindfold; it will be as accurate as a detailed study... for
a one-off, just choose an eval board with a largish device, get it all working
and see how big it is, then choose a device twice the size required (for the
inevitable fixups).

my two cents

Simon

#18
05-15-2005, 10:50 AM
Jim Granville
Re: Stupid Question on the Urination Contest... Re: V4 vs. Stratix-II...

Nicholas Weaver wrote:
<snip>
> Thus I personally wonder whether the primary focus of the pissin match
> should be mostly about tools (both the vendor tools and support for
> third party tools, especially easy floorplanning, datapath aware
> placement, & retiming), density ($/LE), and features (Brand X has a
> big lead here), rather than who's lut is 10% faster on what functions,
> and who's interconnect might be slightly faster on some designs and
> slower on others.


Or the number of user designs that broke on the latest Vxyz release ?

-jg

#19
05-17-2005, 10:31 AM
Paul Leventis (at home)
Re: V4 vs. Stratix-II...

Hi Joseph,

First, I must stress that comparing "micro parameters" is difficult at best
and dangerous at worst. There are fairly arbitrary decisions made during
timing modeling about where you lump various delays. For example, where
does "LUT delay" begin and end -- is it at the output of the 1st stage
buffer after the multiplexor before the LUT? Or is that multiplexor's delay
included as part of LUT delay?

The Stratix/Stratix II/Cyclone/Cyclone II/Max II timing models are
sufficiently complicated that there is little point to making datasheet
entries for various internal timing parameters. For example, the ALM is
fairly complicated and depending on how your logic is synthesized and
exactly how the router chooses to hook it up, your delay can vary
considerably. So your best bet is to look at real circuits with real timing
constraints, since Quartus II will do its best to put the critical signals
on the fastest paths. That said...

As some posters have already pointed out, RAM speeds have increased in
Quartus 5.0. The latest comparison I've seen shows us with a Tco advantage
vs. Virtex-4 when the RAM output registers are used, and a slight
disadvantage when the RAM is unregistered -- in either case a few hundred ps
difference.

As for LUT delays, here are the latest numbers I've got for a fastest speed
grade 7-input LUT (the ALM can implement some 7-input functions and all
6-input functions), as well as for a 4-LUT (the ALM can do two independent 4-LUTs).

Input   7-LUT    4-LUT
A       378 ps   366 ps
B       357 ps   228 ps
C       240 ps   225 ps
D       240 ps    53 ps
E       144 ps
F        53 ps
G       234 ps

According to Austin's post, Virtex-4 (fastest speed grade -- I dare you to
try to buy one ;-)) shows 165 ps across-the-board (seems bogus to me, but
what do I know). So which LUT is faster based on this data? Well, it
depends on how we lumped our delays into logic vs. routing (see above). It
also depends on how often Quartus II will manage to route your critical
signal on the fast LUT inputs -- usually it does a very good job of this.
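The "put the critical signal on the fast LUT input" idea can be sketched as a small assignment problem. Assuming every input of the function can be freely permuted (true for symmetric functions, not for arbitrary 7-input functions in an ALM), pairing the latest-arriving signal with the fastest pin, the next-latest with the next-fastest, and so on, minimizes the LUT output's arrival time. The pin delays are the 7-LUT column above; the signal names and arrival times in the usage are hypothetical, and this is not a description of what the Quartus II router does internally.

```python
# Stratix II 7-input LUT per-input delays from the table above, in ps.
PIN_DELAY_PS = {"A": 378, "B": 357, "C": 240, "D": 240, "E": 144, "F": 53, "G": 234}


def assign_pins(arrival_ps):
    """Map signals to LUT pins: latest-arriving signal gets the fastest pin.

    `arrival_ps` maps signal name -> arrival time at the LUT (ps); at most 7
    signals. Returns (signal -> pin mapping, LUT output arrival time in ps).
    """
    if len(arrival_ps) > len(PIN_DELAY_PS):
        raise ValueError("more signals than LUT inputs")
    signals = sorted(arrival_ps, key=arrival_ps.get, reverse=True)  # latest first
    pins = sorted(PIN_DELAY_PS, key=PIN_DELAY_PS.get)               # fastest first
    mapping = dict(zip(signals, pins))
    out = max(arrival_ps[s] + PIN_DELAY_PS[p] for s, p in mapping.items())
    return mapping, out


if __name__ == "__main__":
    # Hypothetical arrivals: one late critical signal, two early ones.
    mapping, out = assign_pins({"crit": 900, "en": 100, "sel": 100})
    print(mapping, out)  # {'crit': 'F', 'en': 'E', 'sel': 'G'} 953
```

Putting `crit` on input A instead would give 900 + 378 = 1278 ps, which is the kind of spread that makes a single "LUT delay" number hard to compare.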

The other critical component for logic fabric performance is the routing.
Based on an analysis of routing delay between registers placed a varying
distance apart in the X- and Y-directions, we've found that we have a ~20%
delay advantage (fastest speed grade vs. fastest speed grade). Of course,
even this type of study has its caveats -- how do you normalize distance to
take into account differences in logic density?

Stratix II employs a low-k inter-metal dielectric (k = 2.9) vs. Virtex-4's
"reduced-k" dielectric (k = 3.6), giving us a ~20% metal capacitance
advantage. If you set aside architectural and circuit differences, to first
order you'd expect this to translate into a performance advantage for
Stratix II.
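The ~20% figure follows from treating the wiring as a capacitor whose value scales linearly with the dielectric constant for identical geometry (C ∝ k). This is a first-order sketch only: it ignores fringing fields, differences in the metal stack, and every part of the delay that is not wire capacitance.

```python
def relative_capacitance(k_new, k_ref):
    """Wire capacitance with dielectric k_new relative to k_ref, same geometry (C ∝ k)."""
    return k_new / k_ref


if __name__ == "__main__":
    ratio = relative_capacitance(2.9, 3.6)  # Stratix II low-k vs. Virtex-4 "reduced-k"
    print(f"{ratio:.3f} -> {1 - ratio:.1%} lower capacitance")  # 0.806 -> 19.4% lower capacitance
```

For a fixed wire resistance, RC delay scales with C, so this would bear on the wire-dominated part of a route's delay rather than on total path delay.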

Regards,

Paul Leventis
Altera Corp.


#20
05-17-2005, 10:45 AM
Paul Leventis (at home)
Re: V4 vs. Stratix-II...

> Then there is the interconnect. V4 is 500 ps faster for full chip routes,
> 400 ps faster for 1/2 chip routes, 100-200 ps faster for a few CLBs, LABs,
> and 100-200ps for neighbor routes. Some very short routes are 30ps better
> in S2.


I would guess that you did not normalize to take into account packing
density. How do you define a "short" route? Do you multiply the # of CLBs
and # of LABs by the right ratio of logic? I'd argue that 1 LAB = 8 ALMs =
~10-10.5 slices (based on our density analysis).
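The normalization above can be written down directly. Using the estimate of 1 LAB ≈ 10-10.5 slices and the fact that a Virtex-4 CLB holds four slices, a hop of n LABs covers about 2.5-2.6 n CLBs' worth of logic. Whether distance should scale by that logic ratio or by its square root (treating LABs and CLBs as equal-density square tiles) is itself an assumption, so the sketch offers both.

```python
import math

SLICES_PER_LAB = (10.0, 10.5)  # density estimate above: 1 LAB = 8 ALMs ~ 10-10.5 slices
SLICES_PER_CLB = 4             # a Virtex-4 CLB contains four slices


def labs_to_clbs(n_labs, per_axis=False):
    """Convert a distance in LABs to an equivalent (low, high) range in CLBs.

    per_axis=False scales by the logic ratio; per_axis=True scales by its square
    root, treating LABs and CLBs as equal-density square tiles (an assumption).
    """
    out = []
    for slices in SLICES_PER_LAB:
        ratio = slices / SLICES_PER_CLB
        out.append(n_labs * (math.sqrt(ratio) if per_axis else ratio))
    return tuple(out)


if __name__ == "__main__":
    print(labs_to_clbs(3))                 # (7.5, 7.875)
    print(labs_to_clbs(3, per_axis=True))
```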

Anyway, the average distance of a hop in a critical path is roughly 3 LABs,
so short connections are the most important. Our data shows a performance
advantage in hops of this length.

> Of course, anythign you can direct into the DSP48s will just scream, and
> outperform anything S2 has.


That's interesting... did you miss the news that we've increased Stratix II
DSP performance to 550 MHz in Quartus II 5.0? Not to mention that the S2
DSP can do 36-bit multiplies in hardware (vs. 18-bit for DSP48)... but I
will not digress into a feature pissing contest.

> I think that the newsgroup here will basically tell you to try a design in
> both architectures, and play with the constraints to see how well it does.


On this, I agree with Austin. Kick the tires. Just be sure to set timing
constraints before doing so, and also make sure not to use "toy" designs
(neither tool is particularly well optimized for very small designs in very
large chips). And beware numerical noise -- placement & routing is a
heuristic. If you perturb any aspect of the input, the output can change
due to random differences in algorithm outcome.
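One way to act on the numerical-noise warning is to compile each candidate several times with small perturbations (for example a different placement seed) and compare the spread rather than single numbers. The helper below is a minimal sketch of that bookkeeping; how you vary the seed is tool-specific and not shown, and the "ranges don't overlap" check is a deliberately crude criterion, not a statistical test.

```python
import statistics


def summarize_fmax(results_mhz):
    """Min / median / max / sample stdev of Fmax over several P&R runs of one design."""
    return {
        "min": min(results_mhz),
        "median": statistics.median(results_mhz),
        "max": max(results_mhz),
        "stdev": statistics.stdev(results_mhz) if len(results_mhz) > 1 else 0.0,
    }


def clearly_different(a_mhz, b_mhz):
    """Crude check: treat two sets of runs as different only if their ranges don't overlap."""
    return max(a_mhz) < min(b_mhz) or max(b_mhz) < min(a_mhz)
```

Feed it the Fmax each run reports for each tool/device; if `clearly_different` returns False, the gap is within the run-to-run noise.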

Regards,

Paul Leventis
Altera Corp.


#21
05-17-2005, 10:58 AM
Paul Leventis (at home)
Re: Stupid Question on the Urination Contest... Re: V4 vs. Stratix-II...

> Warning: Ranty, opinionated (and quite probably wrong):

Those are the best kind of posts...

> How much does performance really matter?


You make good points. If you need 66, what does it matter if you get 70 vs.
75? The problem is that at the time most customers select a part, they do not
have a complete (or even partial) design. You know your MHz requirement,
but have no idea if you will hit it. If you select a faster part, you are
more *likely* to hit your Fmax target. How much more likely? It's hard to
say.

But consider the downside to missing performance. At best, you have to push
the tools, or floorplan, or re-pipeline, or restructure your HDL. At worst,
you need to respin your board, select a new product, maybe get a faster
speed grade, or change other aspects of your system design to accommodate a
lower clock speed. All of this costs time, and time-to-market is one of the
big FPGA sales points.

Not all clock domains are defined by external requirements. Sometimes the
faster you can run your core, the better the performance of your system
(example -- graphics processor) even though your bus and memory speeds are
still the same. Also, if you get fast enough in your internal clock
domains, you might be able to cut the data width or multiplicity of your
internal logic, allowing you to migrate into a smaller (and thus cheaper)
part.
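The width-versus-clock trade-off above is simple arithmetic: the bits moved per cycle must be at least the required throughput divided by the clock rate. A hypothetical Python sketch; the 6.4 Gb/s stream used in the usage note is an illustrative assumption, not a figure from the post.

```python
import math


def min_width_bits(throughput_mbps: float, clock_mhz: float) -> int:
    """Minimum datapath width (bits) to sustain a throughput at a clock rate.

    Mbit/s divided by Mcycle/s gives bits per cycle; round up to whole bits.
    """
    if clock_mhz <= 0:
        raise ValueError("clock must be positive")
    return math.ceil(throughput_mbps / clock_mhz)
```

For a 6,400 Mbit/s stream, `min_width_bits(6400, 100)` gives 64 bits while `min_width_bits(6400, 200)` gives 32, so doubling the core clock halves the datapath width (and roughly the logic that scales with it).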

In my mind, speed matters most as a time-saving feature. If the CAD tools
and chip you are using enable you to hit your performance requirements using
plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved
yourself a bundle of hurt. It's interesting -- we see the results of having
fast chips and good out-of-the-box software performance, as these features
translate into lower support costs of the "I need help meeting my timing"
variety.

Having speed is not enough. We have to have the features you need, but not
so many as to exceed your cost requirements (is a feature that costs 3% but
is only used by 1% of designers worth it?). Our software and support have
to be up to your needs. And so on. But that won't stop us from discussing
speed, or power, or SI in isolation of these other design requirements.

Regards,

Paul Leventis
Altera Corp.


#22
05-17-2005, 04:42 PM
Mike Treseler
Re: Stupid Question on the Urination Contest... Re: V4 vs. Stratix-II...

Paul Leventis (at home) wrote:

> You make good points. If you need 66, what does it matter if you get 70 vs.
> 75? The problem is at the time most customers select a part, they do not
> have a complete (or even partial) design.


True. Picking an FPGA footprint can defer to
the simulation of its contents.

Making a board used to be a high-risk, critical-path,
long-lead-time task. Four or five spins were the norm.
The FPGA was a detail and getting the software guys
something to play with was the priority.

Today boards have fewer parts with more balls,
and making a board is not such a big deal.
There is little reason to make a quick board
for the software guys because all the interesting
registers are in the FPGA. HDL simulation is
a critical-path item.

> In my mind, speed matters most as a time-saving feature. If the CAD tools
> and chip you are using enable you to hit your performance requirements using
> plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved
> yourself a bundle of hurt.


Amen to that.


-- Mike Treseler
#23
05-17-2005, 04:57 PM
Austin Lesea
Re: V4 vs. Stratix-II...

Paul,

Yes, you can get the fastest speed grade. Really a cheap shot, that
one. I sense some real desperation.

And, stop with the low-K dielectric. All of the Toshiba parts are
low-K. Guess what? We do not speed grade or power grade them differently,
because it just doesn't make that much of a difference!

Perhaps an ASIC can take proper advantage of low-K, but the FPGAs just
do not show much of an improvement at all.

And stop with the power "advantages of S2."

The Japanese engineer who touched the S2 and V4 chips on our
demonstrator said it all: "S2 hot! V4 cool..."

Austin
#24
05-17-2005, 06:56 PM
Paul Leventis
Re: V4 vs. Stratix-II...

> And, stop with the low-K dielectric. All of the Toshiba parts are
> low-K. Guess what? We do not speed grade or power grade them differently,
> because it just doesn't make that much of a difference!


So the long delay in getting the -12 speed grade out had nothing to do
with this fab transition? It must be fun characterizing one product
produced in two fabs with two different processes (one low-k, one not,
and who knows what else is different).

> Perhaps an ASIC can take proper advantage of low-K, but the FPGAs just
> do not show much of an improvement at all.


I wish we had this "defy the laws of physics" technology you use on
Virtex-4. First you claim your devices do not draw more current with
increased voltage. Then you claim that increased metal capacitance has
no impact on speed or power. I'm waiting for you to claim that I/O pin
capacitance doesn't matter for performance, signal integrity or
power...

> The Japanese engineer who touched the S2 and V4 chips on our
> demonstrator said it all: "S2 hot! V4 cool..."


A very scientific test! Let's do some quick math here... Even if you
found some demo with a 1W VccInt difference, this should only translate
to ~10 C difference in chip temperature (still air, no heat sink on
2S60 --> Theta-JA = 10.4 C/W), which would hardly be discernible to the
touch. Why was this demo so much hotter to the touch then? My
educated guess (based on the analysis of one of our customers) is that
you had unequal I/O settings, causing lots more I/O dissipation in our
chip. Really, that is rather low.
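The "quick math" is the standard junction-to-ambient relation, temperature rise = power x Theta-JA. A minimal Python sketch using the 1 W and 10.4 C/W figures from the post; it assumes still air with no heat sink (the condition that Theta-JA value is quoted for) and ignores ambient temperature, airflow, and I/O power, all of which would change the result.

```python
def temp_rise_c(power_w: float, theta_ja_c_per_w: float) -> float:
    """Junction temperature rise above ambient: delta-T = P * Theta-JA."""
    return power_w * theta_ja_c_per_w


def junction_temp_c(ambient_c: float, power_w: float,
                    theta_ja_c_per_w: float) -> float:
    """Junction temperature for a given ambient: T_J = T_A + P * Theta-JA."""
    return ambient_c + temp_rise_c(power_w, theta_ja_c_per_w)


# A 1 W VccInt difference on a part with Theta-JA = 10.4 C/W:
print(f"{temp_rise_c(1.0, 10.4):.1f} C")  # 10.4 C
```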

Regards,

Paul Leventis
Altera Corp.

#25
05-17-2005, 07:34 PM
Austin Lesea
Re: V4 vs. Stratix-II...

Paul,

I am sure the newsgroup is getting really bored with this. I certainly
am. Short and sweet:

Two fabs: It is a challenge, but then having two qualified sources of
supply is a definite advantage for our customers.

Low-K: Don't get me wrong, I like low-K, I like low pin capacitance too.
I also like fine wine, and a good meal. I had already asked you to
fab the S2 without low-K and measure it. We did that for V2 and V2P,
and again for V4 at Toshiba and UMC. We know. You guess.

Low power: What is low, is our power dissipation. The static leakage
kills you folks as the part gets hot. And what FPGA in the high end
isn't running hot? Yours just run even hotter due to the leakage (or
require more expensive heatsink solutions). This one is so easy to
prove it is silly for you to even try to compete on total power.

Austin
All times are GMT +1.


Copyright 2008 @ FPGA Central. All rights reserved