FPGA Central - World's 1st FPGA / CPLD Portal


FPGA comp.arch.fpga newsgroup (usenet)

#1
09-20-2006, 03:48 AM
rickman
Old vs. New FPGAs

I was updating a CPU design I did a few years ago and I was a bit
disappointed in the results I see. The CPU was originally targeted to
an Altera ACEX part which is 5 volt compatible (to give you an idea of
its age). I did my own CPU because Altera does not support their NIOS
for that family. I spent a fair amount of time optimizing the
architecture to be easy to implement in 4 input LUTs and other basic
elements found in FPGAs. I coded it up for the ACEX async memories and
got it running. If memory serves me, it clocked in at 55 MHz max and I
used it at 40 MHz.

Currently I wanted to look at how fast it might run if I redid it for a
current FPGA architecture using synchronous memories. I compiled it
for a Spartan 3 and got the speed up to 77 MHz using less than 10% of
an XC3S400 (315 slices). I am not impressed with the speed. I
expected a much larger increase and had hoped for operation at over 100
MHz. I checked the timing analyzer output and the signal paths are
pretty much what I expected, no oddball logic generation and I got
carry chains where I wanted them. The slow paths have a few long route
times, so although it may approach 100 MHz with careful floorplanning,
I don't think this is worth the effort compared to the >> 100 MHz CPU
cores you can get from the FPGA vendors.

I was wondering if this small speed up is typical of improvements from
one or two generations difference in FPGAs? The ACEX parts are
designed for economy, not for speed, just like the Spartans. When I
did the initial design 3 or 4 years ago, the ACEX parts were old news
then! Given that there was nothing in the design that is tailored for
one FPGA family over another, I guess I expected more like a 2X speedup
in the current technology chip. Isn't that reasonable given the vast
difference in the timing specs in the data sheets?
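
The figures in this post can be put side by side with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the inputs are just the clock rates quoted in the post.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def speedup(old_mhz: float, new_mhz: float) -> float:
    """Ratio of the new maximum clock rate to the old one."""
    return new_mhz / old_mhz


if __name__ == "__main__":
    acex_mhz = 55.0      # ACEX maximum quoted in the post
    spartan3_mhz = 77.0  # Spartan 3 result quoted in the post
    target_mhz = 100.0   # the hoped-for clock rate

    gap_ns = period_ns(spartan3_mhz) - period_ns(target_mhz)
    print(f"speedup ACEX -> Spartan 3: {speedup(acex_mhz, spartan3_mhz):.2f}x")
    print(f"period at 77 MHz:  {period_ns(spartan3_mhz):.2f} ns")
    print(f"period at 100 MHz: {period_ns(target_mhz):.2f} ns")
    print(f"delay to remove from the worst path: {gap_ns:.2f} ns")
```

So 55 MHz to 77 MHz is a 1.4x gain, and reaching 100 MHz would mean trimming roughly 3 ns from a 13 ns path.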

#2
09-20-2006, 05:53 AM
Alan Nishioka
Re: Old vs. New FPGAs

rickman wrote:
> Currently I wanted to look at how fast it might run if I redid it for a
> current FPGA architecture using synchronous memories. I compiled it
> for a Spartan 3 and got the speed up to 77 MHz using less than 10% of
> an XC3S400 (315 slices). I am not impressed with the speed. I
> expected a much larger increase and had hoped for operation at over 100
> MHz. I checked the timing analyzer output and the signal paths are
> pretty much what I expected, no oddball logic generation and I got
> carry chains where I wanted them. The slow paths have a few long route
> times, so although it may approach 100 MHz with careful floorplanning,
> I don't think this is worth the effort compared to the >> 100 MHz CPU
> cores you can get from the FPGA vendors.


This does not surprise me. Xilinx seems to have emphasized size over
speed of Spartan as they update it. It is very difficult to get
Microblaze to run at 100MHz in a Spartan 3E, so 77MHz without trying is
about what I would expect.

Alan Nishioka

#3
09-20-2006, 06:24 PM
Derek Simmons
Re: Old vs. New FPGAs

I'm just curious, and it might not be applicable to your application, but did
you try targeting a Stratix II? If you did, what kind of fMax were
you able to achieve?

Derek


rickman wrote:
> I was updating a CPU design I did a few years ago and I was a bit
> disappointed in the results I see. The CPU was originally targeted to
> an Altera ACEX part which is 5 volt compatible (to give you an idea of
> its age). I did my own CPU because Altera does not support their NIOS
> for that family. I spent a fair amount of time optimizing the
> architecture to be easy to implement in 4 input LUTs and other basic
> elements found in FPGAs. I coded it up for the ACEX async memories and
> got it running. If memory serves me, it clocked in at 55 MHz max and I
> used it at 40 MHz.
>
> Currently I wanted to look at how fast it might run if I redid it for a
> current FPGA architecture using synchronous memories. I compiled it
> for a Spartan 3 and got the speed up to 77 MHz using less than 10% of
> an XC3S400 (315 slices). I am not impressed with the speed. I
> expected a much larger increase and had hoped for operation at over 100
> MHz. I checked the timing analyzer output and the signal paths are
> pretty much what I expected, no oddball logic generation and I got
> carry chains where I wanted them. The slow paths have a few long route
> times, so although it may approach 100 MHz with careful floorplanning,
> I don't think this is worth the effort compared to the >> 100 MHz CPU
> cores you can get from the FPGA vendors.
>
> I was wondering if this small speed up is typical of improvements from
> one or two generations difference in FPGAs? The ACEX parts are
> designed for economy, not for speed, just like the Spartans. When I
> did the initial design 3 or 4 years ago, the ACEX parts were old news
> then! Given that there was nothing in the design that is tailored for
> one FPGA family over another, I guess I expected more like a 2X speedup
> in the current technology chip. Isn't that reasonable given the vast
> difference in the timing specs in the data sheets?


#4
09-20-2006, 10:59 PM
Ben Twijnstra
Re: Old vs. New FPGAs

Derek Simmons wrote:

> I'm just curious, and it might not be applicable to your application, but did
> you try targeting a Stratix II? If you did, what kind of fMax were
> you able to achieve?


Or just a Cyclone-II - the (currently) latest installment in Altera's
low-cost offerings. If you're not supplying timing constraints, be sure to
take the fitter out of its default Auto Fit mode, or it will simply give
you _a_ possible solution with possibly horrible performance.

Altera is boasting (some) performance advantage over Spartan-3, so here's a
chance to see some real field feedback.

Best regards,



Ben

#5
09-21-2006, 12:10 AM
radarman
Re: Old vs. New FPGAs

Ben Twijnstra wrote:
> Derek Simmons wrote:
>
> > I'm just curious, and it might not be applicable to your application, but did
> > you try targeting a Stratix II? If you did, what kind of fMax were
> > you able to achieve?

>
> Or just a Cyclone-II - the (currently) latest installment in Altera's
> low-cost offerings. If you're not supplying timing constraints, be sure to
> take the fitter out of its default Auto Fit mode, or it will simply give
> you _a_ possible solution with possibly horrible performance.
>
> Altera is boasting (some) performance advantage over Spartan-3, so here's a
> chance to see some real field feedback.
>
> Best regards,
>
>
>
> Ben


If you are concerned with raw speed, it definitely pays to play with
the Quartus fitter settings. I recently completed a redesign of the
vautomation v8 uRISC (now Arclite) 8-bit CPU. I tested it for size by
compiling it alone, and with default fitter settings, achieved about 80
MHz. By pushing the fitter a bit harder, I was able to achieve 113MHz.
Now, using the same settings, I'm getting my entire SoC design to
operate at 80MHz. This was in a Cyclone II, and my CPU implementation
uses purposefully generic behavioral VHDL. I could probably have gotten
better results using Altera primitives, but I hate being tied down to a
single vendor. Note, my system frequency was 75MHz, so I pushed the
fitter to give me timing margin for when I added the rest of the logic.
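
The margin being described here is the difference between the required clock period and the period the fitter actually achieved. A rough sketch in Python, assuming only the clock rates quoted above; `margin_ns` is a hypothetical helper, not something Quartus provides.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def margin_ns(required_mhz: float, achieved_mhz: float) -> float:
    """Timing margin in ns; a positive value means timing is met with room to spare."""
    return period_ns(required_mhz) - period_ns(achieved_mhz)


if __name__ == "__main__":
    system_mhz = 75.0      # system clock quoted in the post
    cpu_alone_mhz = 113.0  # CPU compiled alone, fitter pushed harder
    full_soc_mhz = 80.0    # entire SoC with the same fitter settings

    print(f"CPU alone: {margin_ns(system_mhz, cpu_alone_mhz):.2f} ns of margin")
    print(f"full SoC:  {margin_ns(system_mhz, full_soc_mhz):.2f} ns of margin")
```

Most of the margin the CPU had on its own is used up once the rest of the logic is added, which is the reason for pushing the fitter early.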

I haven't done as much with the Xilinx ISE, but I would imagine the
situation is pretty similar.

#6
09-21-2006, 04:40 PM
rickman
Re: Old vs. New FPGAs

Alan Nishioka wrote:
> rickman wrote:
> > Currently I wanted to look at how fast it might run if I redid it for a
> > current FPGA architecture using synchronous memories. I compiled it
> > for a Spartan 3 and got the speed up to 77 MHz using less than 10% of
> > an XC3S400 (315 slices). I am not impressed with the speed. I
> > expected a much larger increase and had hoped for operation at over 100
> > MHz. I checked the timing analyzer output and the signal paths are
> > pretty much what I expected, no oddball logic generation and I got
> > carry chains where I wanted them. The slow paths have a few long route
> > times, so although it may approach 100 MHz with careful floorplanning,
> > I don't think this is worth the effort compared to the >> 100 MHz CPU
> > cores you can get from the FPGA vendors.

>
> This does not surprise me. Xilinx seems to have emphasized size over
> speed of Spartan as they update it. It is very difficult to get
> Microblaze to run at 100MHz in a Spartan 3E, so 77MHz without trying is
> about what I would expect.


I see what you mean. I checked the Xilinx site and I was confused
thinking that MB would run at higher speeds. They list 100 MHz in the
-5 high performance versions while I was running my design in the -4
version. So I guess my performance is not so bad considering that it
is not pipelined. Of course with a MISC architecture, it requires more
instructions to do the same amount of work as the instructions are not
as powerful. I may do some other work to see how practical my CPU
design will be in the future. I don't mind doing the leg work to
support an FPGA CPU core, but not if it does not have advantages.
Right now the only advantage is the size, about 600 LUTs vs. 1300 for
MB. I'll need to make sure it will do a decent job of keeping up with
the clock.

#7
09-23-2006, 04:54 AM
rickman
Re: Old vs. New FPGAs

Alan Nishioka wrote:
> rickman wrote:
> > Currently I wanted to look at how fast it might run if I redid it for a
> > current FPGA architecture using synchronous memories. I compiled it
> > for a Spartan 3 and got the speed up to 77 MHz using less than 10% of
> > an XC3S400 (315 slices). I am not impressed with the speed. I
> > expected a much larger increase and had hoped for operation at over 100
> > MHz. I checked the timing analyzer output and the signal paths are
> > pretty much what I expected, no oddball logic generation and I got
> > carry chains where I wanted them. The slow paths have a few long route
> > times, so although it may approach 100 MHz with careful floorplanning,
> > I don't think this is worth the effort compared to the >> 100 MHz CPU
> > cores you can get from the FPGA vendors.

>
> This does not surprise me. Xilinx seems to have emphasized size over
> speed of Spartan as they update it. It is very difficult to get
> Microblaze to run at 100MHz in a Spartan 3E, so 77MHz without trying is
> about what I would expect.


I tried a couple of things, but I was not able to use the floorplanner.
I get a fatal error and it crashes. This may be due to it not being
able to phone home when it tries to reach out and touch someone. My
firewall blocks it and when I click the OK button the floorplanner
crashes.

I get different failing paths depending on some of the settings I make,
like the Starting Placer Cost Table setting. But the long path is
around 13 ns and has about the same amount of logic and routing delay.
Is that normal? These paths all start with a 2 ns clock to out from
the BRAM. Then there are typically two or three routes that are longer
than 1 ns, sometimes one is longer than 2 ns. I can't tell what is
weird about this since I can't really "see" it. This path is only 5
levels of logic with no carry chain. Others are 4 levels of LUTs plus a
carry chain (although typically only the last few bits of a 16-bit
adder for some reason).

Timing constraint: TS_SysClk = PERIOD TIMEGRP "SysClk" 10 ns HIGH 50%;

 24616 items analyzed, 84 timing errors detected. (84 setup errors, 0 hold errors)
 Minimum period is 12.915ns.
--------------------------------------------------------------------------------
Slack:                -2.915ns (requirement - (data path - clock path skew + uncertainty))
  Source:             InstFtch/Mram_Inst_Ram1.B (RAM)
  Destination:        RegPsw/DebugIrqEn (FF)
  Requirement:        10.000ns
  Data Path Delay:    12.914ns (Levels of Logic = 5)
  Clock Path Skew:    -0.001ns
  Source Clock:       SysClk rising at 0.000ns
  Destination Clock:  SysClk rising at 10.000ns
  Clock Uncertainty:  0.000ns

  Data Path: InstFtch/Mram_Inst_Ram1.B to RegPsw/DebugIrqEn
    Delay type         Delay(ns)  Logical Resource(s)
    ----------------------------  -------------------
    Tbcko                 2.394   InstFtch/Mram_Inst_Ram1.B
    net (fanout=0)        1.792   InstFtch/InstReg<5>
    Tilo                  0.608   DecodeSlow/DatStkCntl<1>21
    net (fanout=19)       0.758   DecodeSlow/N23
    Tilo                  0.608   DecodeSlow/FlagsEn<8>11
    net (fanout=6)        0.369   DecodeSlow/N56
    Tif5x                 0.911   DecodeSlow/FlagsEn<8>_F
                                  DecodeSlow/FlagsEn<8>
    net (fanout=0)        1.241   DecodeSlow/FlagsEn<8>
    Tilo                  0.551   RegPsw/_not00141
    net (fanout=5)        1.079   RegPsw/_not0014
    Tilo                  0.608   RegPsw/_not00211
    net (fanout=1)        1.393   RegPsw/_not0021
    Tceck                 0.602   RegPsw/DebugIrqEn
    ----------------------------  ---------------------------
    Total                12.914ns (6.282ns logic, 6.632ns route)
                                  (48.6% logic, 51.4% route)

Is it normal for the routing delays to range so widely and to total as
much as the logic delays?
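
The logic/route split in the report can be re-added by hand. A minimal sketch, assuming the delays are typed in from the listing above; the `PATH` list and the helper names are made up for illustration and are not part of any Xilinx tool.

```python
# (kind, delay_ns) pairs copied from the data path in the report above.
PATH = [
    ("logic", 2.394),  # Tbcko, the BRAM clock-to-out
    ("route", 1.792),
    ("logic", 0.608),  # Tilo
    ("route", 0.758),
    ("logic", 0.608),  # Tilo
    ("route", 0.369),
    ("logic", 0.911),  # Tif5x
    ("route", 1.241),
    ("logic", 0.551),  # Tilo
    ("route", 1.079),
    ("logic", 0.608),  # Tilo
    ("route", 1.393),
    ("logic", 0.602),  # Tceck
]


def split_delay(path):
    """Return (total logic delay, total routing delay) in ns."""
    logic = sum(delay for kind, delay in path if kind == "logic")
    route = sum(delay for kind, delay in path if kind == "route")
    return logic, route


def fmax_mhz(min_period_ns: float) -> float:
    """Maximum clock rate in MHz for a given minimum period in ns."""
    return 1000.0 / min_period_ns


if __name__ == "__main__":
    logic, route = split_delay(PATH)
    total = logic + route
    print(f"logic: {logic:.3f} ns ({100 * logic / total:.1f}%)")
    print(f"route: {route:.3f} ns ({100 * route / total:.1f}%)")
    print(f"fmax:  {fmax_mhz(12.915):.1f} MHz")  # minimum period from the report
```

The sums reproduce the report's 6.282 ns of logic and 6.632 ns of routing, and the 12.915 ns minimum period corresponds to about 77.4 MHz.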

This is with nothing else in the chip, so I can only imagine that the
path delays will get longer as I combine other logic inside the chip.

I'll give it a try in a Virtex4 part over the weekend and see if that
is faster.

#8
09-23-2006, 02:02 PM
rickman
Re: Old vs. New FPGAs

Here are a couple more data points. I changed the part to an
xc4vlx25-12 and it exceeded the 100 MHz timing requirement, in fact it
ran at 110 MHz. But at -10 it failed only reaching 84 MHz. On the
other hand the XC3S400-5 weighed in at almost 91 MHz. So speed grade
can make a moderate difference.

The thing that surprised me the most is that in the Spartan 3 parts the
routing was about half the delay in the worst case paths. But in the
V4 part routing was over 70% of the delay in the worst case paths! So
the LUTs got faster between S3 and V4, but not the routing! In fact,
the routing delays were longer in absolute terms, but I'm not sure this
was a valid comparison as the longest delays were on different nets
between the two parts.
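
To compare routing in absolute terms, the routed share of the worst path can be multiplied by the clock period. A rough sketch only: the post does not say which V4 speed grade the 70% figure came from, so both are shown, with 0.70 taken as a lower bound and 0.5 standing in for "about half" on the Spartan 3. `route_delay_ns` is a hypothetical helper.

```python
def route_delay_ns(fmax_mhz: float, route_fraction: float) -> float:
    """Approximate routing delay on the worst path, in ns.

    Treats the worst-path delay as one full clock period at fmax,
    which ignores clock skew and uncertainty.
    """
    return (1000.0 / fmax_mhz) * route_fraction


if __name__ == "__main__":
    # (label, fmax in MHz from the post, assumed routed share of the worst path)
    cases = [
        ("XC3S400-4", 77.0, 0.50),
        ("xc4vlx25-10", 84.0, 0.70),
        ("xc4vlx25-12", 110.0, 0.70),
    ]
    for label, fmax, share in cases:
        print(f"{label}: about {route_delay_ns(fmax, share):.2f} ns of routing")
```

With these inputs the -10 figure comes out above the Spartan 3 one and the -12 figure slightly below it, so the comparison depends on which run the 70% was taken from and how far over 70% it was.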

I also found a bug in the IDE. When you change parts to evaluate
differences, the Summary Report does not change the Target Device. All
the other info seems to be correct, but the target stayed the same no
matter what I did.

#9
09-23-2006, 03:05 PM
Antti
Re: Old vs. New FPGAs

rickman wrote:

> Here are a couple more data points. I changed the part to an
> xc4vlx25-12 and it exceeded the 100 MHz timing requirement, in fact it
> ran at 110 MHz. But at -10 it failed only reaching 84 MHz. On the
> other hand the XC3S400-5 weighed in at almost 91 MHz. So speed grade
> can make a moderate difference.
>
> The thing that surprised me the most is that in the Spartan 3 parts the
> routing was about half the delay in the worst case paths. But in the
> V4 part routing was over 70% of the delay in the worst case paths! So
> the LUTs got faster between S3 and V4, but not the routing! In fact,
> the routing delays were longer in absolute terms, but I'm not sure this
> was a valid comparison as the longest delays were on different nets
> between the two parts.
>
> I also found a bug in the IDE. When you change parts to evaluate
> differences, the Summary Report does not change the Target Device. All
> the other info seems to be correct, but the target stayed the same no
> matter what I did.


You are correct - the main difference between S3 and V4 is the LUT
delay. As a matter of fact, the LUT delay is really, really small in V4.
When I made measurements to check this delay I could not believe it at
first, but then I looked at the datasheet timings and it all correlated.
I got signals up to 975 MHz within the slowest V4, while in S3 I think
I only got to around 370 MHz.

So the routing really matters!

Antti
