01-04-2006, 02:52 PM
Ray Andraka
Re: Remapping from Virtex-II to Virtex-4

Lars wrote:
> Anyone done any "what-if" remapping of Virtex-II designs to Virtex-4? I
> wanted to do this to see how the new technology performed, mainly to
> see if it was worth the trouble to upgrade some existing designs. We
> did this quite successfully some years back, stepping from Virtex-E to
> Virtex-II. The main obstacle then was the new size Block RAM going from
> 4kbit to 18 kbit apiece. If we left our Unisim and CoreLib components
> untouched we wasted 3/4 of the RAM, but if the number of Block RAMs in
> the chip was sufficient, all we had to do was to update the
> LOC-constraints for pins and DCM's in the .ucf-file. ISE even managed
> to re-target the Virtex-E DLLs to Virtex-II DCMs. Brilliant!


For the most part, a Virtex-II design can be pretty much dropped into a
Virtex-4. You hit on one of the places you will have trouble: the slice
M/slice L thing. The V4 CLB structure is substantially similar to the
V2 structure, except that only even columns have the logic for LUT RAM.
Thus if you have an RPM with SRL16s or RAM16s placed in it, those have
to go in even columns.
There is also a bug in the mapper that causes problems if an RPM macro
with memory elements straddles a BRAM or DSP column: it thinks that any
memory elements to the right of the DSP/BRAM column are in the wrong
type of column even if they aren't. The work-around is to break the RPM
up into smaller sub-RPMs that fit between the BRAM/DSP columns.
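
To make the two placement rules above concrete, here is a rough Python
sketch. It is only an illustration, not anything the tools provide: the
function names, the plain integer column numbering, and the way BRAM/DSP
column positions are passed in are all assumptions made up for this
example. It groups an RPM's slice columns so no group straddles a
BRAM/DSP column, and flags memory elements that sit in odd columns.

```python
from typing import Dict, List


def split_rpm(columns: List[int], hard_columns: List[int]) -> List[List[int]]:
    """Group RPM slice columns so that no group straddles a BRAM/DSP column.

    Assumption for this sketch: a value k in hard_columns means a BRAM/DSP
    column sits between slice columns k-1 and k.
    """
    cuts = sorted(hard_columns)
    groups: Dict[int, List[int]] = {}
    for col in sorted(columns):
        # Region index = number of BRAM/DSP columns to the left of this column.
        region = sum(1 for c in cuts if c <= col)
        groups.setdefault(region, []).append(col)
    return [groups[r] for r in sorted(groups)]


def misplaced_memory(memory_columns: List[int]) -> List[int]:
    """Return columns holding SRL16/RAM16 elements that are not even."""
    return [c for c in memory_columns if c % 2 != 0]


if __name__ == "__main__":
    # Hypothetical 12-column RPM with BRAM/DSP columns before columns 4 and 8.
    print(split_rpm(list(range(12)), [4, 8]))
    print(misplaced_memory([0, 2, 5, 6]))
```

Each returned group would become its own sub-RPM, and any column
reported by misplaced_memory needs its memory elements moved to an even
column.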

The other place you will have difficulty is if you have instantiated
MULT18X18 primitives in the design, as these have to be converted to
DSP48s. With only one register enabled, as in the MULT18X18S, you will
be disappointed with the performance, but it will work with a 1:1
replacement.

OK, so paying attention to these two issues will get your design into a
Virtex-4, but you won't reap the full benefit. You'll find the fabric
carry chains are no faster than those in a V2 of the same speed grade,
and in some cases are actually slower. Also, the clock-to-output times
of the BRAM without the added output register, and of the unpipelined
multiplier, are not any faster. To get the performance promised, you
need to turn on the pipelining in these elements so that the multiplier
has a 3 clock pipeline (input, middle and output registers) and the BRAM
a 2 clock pipeline (there is an added output register).
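
The catch with turning on those registers is latency: results come out
later, so everything around them has to be re-aligned. Here is a small
Python behavioral sketch of that effect. It is a toy model, not vendor
code: the class name and interface are invented for illustration, and it
models only the number of register stages, not where inside the DSP48
the multiply actually happens.

```python
from collections import deque
from typing import Deque, Optional


class PipelinedMultiplier:
    """Toy clocked model of a multiplier with `stages` pipeline registers."""

    def __init__(self, stages: int = 3) -> None:
        # One slot per register stage; None means "no valid data yet".
        self.regs: Deque[Optional[int]] = deque([None] * stages, maxlen=stages)

    def clock(self, a: int, b: int) -> Optional[int]:
        # One clock edge: a new product enters the first register, older
        # values shift one stage along, and the last register is the output.
        self.regs.appendleft(a * b)
        return self.regs[-1]


if __name__ == "__main__":
    mult = PipelinedMultiplier(stages=3)
    for a, b in [(1, 2), (3, 4), (5, 6), (0, 0), (0, 0)]:
        print(mult.clock(a, b))
```

With stages=3 the first product only shows up on the third clock, where
a single-register multiplier (stages=1) would show it on the first. Any
data that has to line up with the multiplier output needs the same two
extra clocks of delay.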

The big gains in V4 for signal processing type stuff are had with the
DSP48 slice's adder, which is quite a bit faster than the fabric carry
chains. Unfortunately, using it is basically a clean-sheet redesign,
because you also need to use the pipeline registers there to get the speed.

So in short, you can put your V2 design into V4 without a lot of effort,
but you will likely be disappointed when it doesn't run any faster. In
order to get the speed advantages, you need to redesign to the architecture.