FPGA Central - World's 1st FPGA / CPLD Portal

FPGA comp.arch.fpga newsgroup (usenet)

#1
12-03-2004, 01:23 PM
colin
making an fpga hot

Guys

We have just laid out a board and want to put the thermal analysis to
bed (it's conduction cooled, so there is not much room for error). If the
Xilinx estimator says we are going to use 25 watts, does anyone know the
best way to code an FPGA so that it will get nice and hot?

The estimator is just that, an estimate. Is there a more accurate way of
writing some code so that a particular clock input will generate a
particular amount of heat? A serial chain of 2000 D-type flip-flops, every
one toggling on every clock and blinking an LED at the end, is obviously
one way, but it doesn't seem very elegant.

We have wired up the internal temp sense diode to take a look at the
result (and yes, we know how noisy and inaccurate they are).

Any experiences?

Colin

#2
12-03-2004, 05:07 PM
Marc Randolph
Re: making an fpga hot

colin wrote:
> Guys
>
> We have just laid out a board and want to put the thermal analysis to
> bed (it's conduction cooled so not much room for error). If the xilinx
> estimator says we are going to use 25 watts does anyone know the best
> way to code an FPGA so that it will get nice and hot.
>
> The estimator is just that, but is there a more accurate way of
> writing some code so that a particular clock input will generate a
> particular amount of heat. A 2000 D type serial chain where every flip
> flop is toggling every clock which blinks an LED is obviously one way
> but doesn't seem very ellegant.
>


If your goal is just to generate heat, configure all the LUTs as SRLs
(shift registers), use all the BRAMs, and drive all the I/Os at a high
current drive strength.

Marc

#3
12-03-2004, 05:12 PM
Austin Lesea
Re: making an fpga hot

Colin,

Just make a huge shift register, or a bank of DFFs that all toggle, and
then vary the clock input (or the shifted-in data pattern, anywhere from
...000001 to ...101010).

That is what we do.
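
A rough way to see why the data pattern works as a heat "dial" is the short Python sketch below. It is not from the original post and it is only a toy model: it assumes the chain has reached steady state and that dynamic power scales with the fraction of flip-flops that change state each clock. The function name `toggle_density` and the period-8 stand-in for "...000001" are illustrative choices.

```python
def toggle_density(pattern):
    """Average fraction of flip-flops that change state per clock when
    `pattern` is shifted repeatedly through a long chain (steady state)."""
    n = len(pattern)
    # Each FF takes over its neighbour's previous bit, so a FF toggles
    # whenever two consecutive bits of the repeating pattern differ.
    changes = sum(pattern[i] != pattern[(i + 1) % n] for i in range(n))
    return changes / n

# "00000001" is an assumed period-8 version of the sparse pattern.
for name, pattern in [("...000001", "00000001"), ("...101010", "10")]:
    print(name, toggle_density(pattern))
```

In this model the alternating pattern toggles every flip-flop on every clock, while the sparse pattern toggles only a quarter of them, so at a fixed clock frequency the pattern alone spans a 4:1 range of switching activity.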

Austin


#4
12-03-2004, 07:36 PM
Symon
Re: making an fpga hot

colin wrote:
> We have wired up the internal temp sense diode to take a look at the
> result (and yes we know how noisy and innacurate they are).
>
> Any experiences?
>

Well, I've found the diode is neither particularly noisy nor especially
inaccurate! It gives repeatable and consistent (between parts) results,
certainly good enough for your application. You have routed its two
connections together and away from big switching currents, I presume?!

I use copper sheet to move heat to where I can get rid of it. Cu conducts
at about 400 W/(m·K), roughly twice as well as aluminium. Don't use copper
alloys. Copper sheet is very useful if you've got boards stacked closely
together, because you can get the heat out from between the boards. I've
never tried heat pipes, but they're meant to be very good indeed.
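
To put rough numbers on the copper versus aluminium comparison, here is a minimal Python sketch of one-dimensional conduction along a uniform strip (dT = P * L / (k * A)). The strip geometry is hypothetical, the 200 W/(m·K) aluminium figure is only inferred from "about twice as good", and spreading and contact resistances are ignored, so treat the result as an order-of-magnitude check rather than a thermal design.

```python
def conduction_delta_t(power_w, length_m, k_w_per_mk, width_m, thickness_m):
    """Temperature drop along a uniform strip: dT = P * L / (k * A)."""
    area_m2 = width_m * thickness_m
    return power_w * length_m / (k_w_per_mk * area_m2)

K_COPPER = 400.0     # W/(m*K), figure quoted in the post
K_ALUMINIUM = 200.0  # W/(m*K), assumed from "about twice as good"

# Hypothetical geometry: 25 W carried 50 mm along a strip 40 mm wide, 1 mm thick.
dt_cu = conduction_delta_t(25.0, 0.050, K_COPPER, 0.040, 0.001)
dt_al = conduction_delta_t(25.0, 0.050, K_ALUMINIUM, 0.040, 0.001)
print(f"copper:    {dt_cu:.1f} K drop")
print(f"aluminium: {dt_al:.1f} K drop")
```

With these made-up dimensions the drop along the copper strip is tens of kelvin, and the aluminium strip doubles it, which is why sheet thickness and path length matter so much on a conduction-cooled board.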

Finally, you'll find that FPGAs work at elevated temperature for a long
time. I recall a thread on CAF (comp.arch.fpga) all about FPGAs down
boreholes, where they were running for weeks at 175C. You might be
enlightened by a quick trawl of CAF in Google Groups. So, what's the
lifetime of your product? How long will you be working for that company?
All part of the engineering compromise!!

Good luck, Syms.



#5
12-03-2004, 10:44 PM
Mikeandmax
Re: making an fpga hot

>So, what's the lifetime of your product? How long will
>you be working for that company? All part of the engineering compromise!!


ROFL !!

thanx for the chuckle -
Mike T

#6
12-06-2004, 09:32 PM
Mark Smith
Re: making an fpga hot

Ahhh, that explains the issues with the ANT then... ;-)

Mark


#7
12-08-2004, 05:44 AM
Paul Leventis (at home)
Re: making an fpga hot

Hi Colin,

Below I try to give some insight into how to make a hot design, though I do
question the motivation for doing so. A simple FF chain comes nowhere close
to achieving a high (or even average) core power.

All of the phenomena I describe below are modeled in the recently released
Quartus II 4.2 software via its PowerPlay Power Analyzer. Target Stratix II
or Max II and you'll get very accurate estimates of how all these factors
affect your power consumption. You can try out the Power Analyzer in the
Quartus II 4.2 Web Edition software available from www.altera.com.

If you're trying to figure out if a given design will work on your board
after it's been made, the best bet is to try the chip out in the lab using
stimulus (vectors) that reflect the worst-case operating conditions for the
chip. I can make you a design that will burn many many Watts of power, but
that doesn't mean your design will. A dynamic power measurement from the
lab is the most accurate estimate possible -- just remember to use the
manufacturer's spec for worst-case static power (at worst-case temperature)
since the unit you have on your board is likely NOT worst-case.

> The estimator is just that, but is there a more accurate way of
> writing some code so that a particular clock input will generate a
> particular amount of heat. A 2000 D type serial chain where every flip
> flop is toggling every clock which blinks an LED is obviously one way
> but doesn't seem very ellegant.


There are many factors that affect overall dynamic power consumption of an
FPGA design. I will highlight a few critical ones below, and make
suggestions along the way to build a design to turn your FPGA into the
hot-plate you desire. It is *not* as simple as making one big
shift-register...

(0) Transition Density. You want to toggle as much as possible every cycle.
Toggle FFs and shift registers achieve this, as do XOR functions (if you want
to utilize the LUT too).

(1) Routing Utilization. The routing buffers, multiplexers, and wiring in
an FPGA can add up to a large amount of switching capacitance and
short-circuit (crowbar) current. To maximize dynamic power, you must use a
lot of routing. A simple FF chain will actually use very little routing,
unless you purposely make the placement very bad by using region constraints
such as LogicLock regions. You could, for example, constrain the even bits
of your chain to one-half the chip and the odd bits to the other half, and
this will greatly increase routing utilization. Or use something other than
FFs to increase the number and fanout of the routed wires. Of course,
you'll need to experiment a little to find the right balance between high
utilization and still being able to route!

(2) LUT Configuration. A LUT configured as an AND gate does not burn nearly
as much power as one configured as an XOR. This difference is due to the
number of internal nodes in the circuit that toggle state when an input
signal toggles. On top of this, the output of an XOR will toggle upon
the toggle of any input -- so chaining together XORs will result in a
cascade of glitching (if there are no pipeline registers), which can further
increase your power. To get the most accurate estimate of LUT power, you
must consider the functionality of the LUT -- Quartus II can do this for
you.

(3) Clock Network. The vast majority of power on a high-fanout clock will
be burned *inside* the LABs (on the LAB-wide clock), not on the global clock
network. If you distribute a clock such that it fans out to one FF (out of
16) in every LAB of the device, this will maximize this internal LAB clock
network power. You can achieve this through location constraints applied to
these FFs. And the more clocks you use, the more you will burn. You can
use the PLLs to step up the clock frequency to help increase the toggle
rate.

(4) RAMs. A RAM can burn significant power if you perform reads & writes
every cycle (keep the clock enable asserted). Just hook up all the RAMs in
the device to be in dual-port mode writing & reading random data every
cycle, and you've got some more power.

(5) I/Os. You can burn an arbitrary amount of power with your I/Os,
depending on external termination resistance, contention, I/O standard,
drive strength, load capacitance, etc. Let's just pretend you don't have
I/Os to make life easier.

Hopefully that gives you some ideas of where to go to burn some power. If
you're using a Xilinx chip, I'm sure similar techniques will apply, though
their tools may not be able to fully predict the results you will see.

Regards,

Paul Leventis
Altera Corp.



#8
12-08-2004, 09:33 AM
Symon
Re: making an fpga hot

Hi Paul,
Comments/Questions below!

Paul Leventis (at home) wrote:
> (2) LUT Configuration. A LUT configured as an AND gate does not burn
> nearly as much power as one configured as an XOR. This difference is due
> to the number of internal nodes in the circuit that toggle states upon
> the toggle of an input signal. On top of this, (blah, blah, XORs
> transition more)


Could you explain that a little more? I thought that the LUT was just a 16x1
RAM. Is the extra power consumed only when two inputs change? E.g. 00 => 11
into the XOR would still have 0 as its output, but it might transition
through the 1 output state. I understand that XOR gates are more likely to
transition, but you seem to be saying there's some additional internal
reason why they consume power.

Cheers, Syms.



#9
12-09-2004, 12:02 AM
Ray Andraka
Re: making an fpga hot

The logic transitions in the routing, and the subsequent differential delays
through the LUT, can make for many more transitions than you would see from a
simple buffer implemented in a LUT. Unless all the LUT inputs are precisely
timed so that their edges change together, you wind up with a walk through
several of the LUT addresses in the process of settling before the next clock.
A paper presented at FPGA a few years ago went as far as to say that as much
as 30-40% of the power in a typical FPGA design is due to glitches propagating
through the logic between flip-flops, and the authors showed that heavily
pipelining the design improved the power consumption dramatically.
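
The "walk through several LUT addresses" can be illustrated with a small Python sketch. It is a toy model, not anything from the paper mentioned above: inputs are assumed to arrive strictly one after another, the LUT is treated as an ideal function, and `output_edges`, `xor4` and `and4` are names made up for the illustration.

```python
def output_edges(func, start, end, arrival_order):
    """Count output transitions while the inputs move from `start` to `end`
    one at a time, in the order given by `arrival_order`."""
    state = list(start)
    out = func(state)
    edges = 0
    for i in arrival_order:
        state[i] = end[i]          # input i settles to its new value
        new_out = func(state)
        edges += new_out != out    # every change of the output is one edge
        out = new_out
    return edges

def xor4(bits):
    return sum(bits) % 2

def and4(bits):
    return int(all(bits))

start, end = (0, 0, 0, 0), (1, 1, 1, 1)
for name, func in [("XOR", xor4), ("AND", and4)]:
    skewed = output_edges(func, start, end, arrival_order=(0, 1, 2, 3))
    ideal = int(func(start) != func(end))   # edges if all inputs moved together
    print(name, "skewed:", skewed, "aligned:", ideal)
```

For the XOR the output ends where it started, so perfectly aligned inputs would cause no output edge at all, yet the skewed arrival produces four. The AND produces only the single edge it needs, which is one way to see why XOR-heavy logic is singled out as glitchy.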


--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759



#10
12-09-2004, 01:58 AM
Paul Leventis (at home)
Re: making an fpga hot

Hi Symon,

> > (2) LUT Configuration. A LUT configured as an AND gate does not burn
> > nearly as much power as one configured as an XOR. This difference is due
> > to the number of internal nodes in the circuit that toggle states upon
> > the toggle of an input signal. On top of this, (blah, blah, XORs
> > transition more)
>
> Could you explain that a little more? I thought that the LUT was just a
> 16x1 RAM. Is the extra power consumed only when two inputs change? e.g.
> 00 => 11 into the XOR would still have 0 as its output but it might
> transition through the 1 output state? I understand that XOR gates are
> more likely to transition, but you seem to be saying there's some
> additional internal reason why they consume power.


While logically a LUT is just a 16x1 ROM, physically it is not built the same
way as a RAM.

A traditional RAM is built with a 2D-array of bits, where a row is selected
by decoding the address, and a pair of differential bit lines per cell is
precharged and then the cell pulls one side down which is amplified by a
sense-amplifier to speed things up (gross simplification). In that
structure, regardless of what you are reading, you burn the same power since
the reads are differential, and you burn power on each read, regardless of
the previously read value, since all that precharge, pull-down and sensing
happens every read.

A LUT however is traditionally built as a multiplexor tree. You have 16
SRAM cells feeding a tree of 2:1 muxes. The 4 inputs of the LUT each
control one level of the tree. There is a diagram below for a 2-LUT.

Let's take a 2-LUT implementing an XOR as an example (see diagram). We have
x = A?1:0 and y = A?0:1, and f = B?y:x. Let's say A switches from 0-->1
(and B = 0). Node x toggles from a 0 to 1. Node y toggles from a 1 to a 0.
And node f toggles from a 0 to a 1 (with x). So you have not only the
output of the LUT toggling, but also the internal stages. If you extend the
example to an N-LUT, you'll see that a toggle on input A results in 2^(N-1)
first stage nodes toggling, 2^(N-2) second stage, etc. or 2^N - 1 nodes
toggling *internal* to the LUT. If you look at an AND instead, you'll see
that only one first stage node toggles state with a change in A.

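(select input shown under each mux; the upper data input is chosen when
the select is 0)

 +-+
 |0|---|\
 +-+   | |--- x ---|\
 +-+   | |         | |
 |1|---|/          | |
 +-+    A          | |--- f
 +-+               | |
 |1|---|\          | |
 +-+   | |--- y ---|/
 +-+   | |          B
 |0|---|/
 +-+    A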

So in conclusion, an XOR not only results in a higher output switching
probability (which should be modeled by your simulation vectors or assumed
toggle rate), but also results in higher *internal* switching activity.
Hence the power of a LUT is not constant across LUT masks. In fact, it also
changes as a function of the "static probabilities" of the inputs, i.e. the
% of the time each input is 1 or 0, since asymmetric LUT masks result in
asymmetric internal states as a function of input values.
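
The node counting above can be checked with a few lines of Python. This is only a sketch of the idealised mux-tree model described in this post (every mux output is one node, and a node "toggles" if its logic value changes); the function names are invented for the illustration, and real LUT circuits will differ in detail.

```python
def node_values(mask, inputs):
    """Values on every mux output of a mux-tree LUT, first stage first.

    mask   : 2**N configuration bits; bit 0 of the index is the first input (A)
    inputs : N input values; inputs[0] = A is the select of the first stage
    """
    level, nodes = list(mask), []
    for sel in inputs:
        # each 2:1 mux passes one bit of its pair, chosen by this stage's select
        level = [level[2 * k + sel] for k in range(len(level) // 2)]
        nodes.extend(level)
    return nodes

def toggled_nodes(mask, before, after):
    """Number of mux outputs whose value differs between two input vectors."""
    return sum(x != y for x, y in zip(node_values(mask, before),
                                      node_values(mask, after)))

N = 4
xor_mask = [bin(i).count("1") & 1 for i in range(2 ** N)]   # N-input XOR
and_mask = [int(i == 2 ** N - 1) for i in range(2 ** N)]    # N-input AND

a_rises = ((0, 0, 0, 0), (1, 0, 0, 0))   # A goes 0 -> 1, other inputs stay 0
print("4-LUT XOR:", toggled_nodes(xor_mask, *a_rises))
print("4-LUT AND:", toggled_nodes(and_mask, *a_rises))
print("2-LUT XOR:", toggled_nodes([0, 1, 1, 0], (0, 0), (1, 0)))
```

Under these assumptions the 4-input XOR toggles all 2^4 - 1 = 15 mux outputs for a single edge on A, the AND toggles one, and the 2-LUT XOR toggles x, y and f, matching the walk-through of the diagram.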

Regards,

Paul Leventis
Altera Corp.

#11
12-09-2004, 02:09 AM
Paul Leventis (at home)
Re: making an fpga hot

Hi Ray et al:

Good point on glitching. On a related note, this glitching also makes power
analysis difficult. Even with good-quality simulation vectors for a design,
the resulting gate-level simulation results will contain glitches. Are the
glitches real? If so, then they should count towards power. But
sufficiently short glitches will never propagate through the routing, or
even through the gate.

This is why we recommend that our users employ glitch filtering on
simulation results. This can be done with the Quartus II 4.2 simulator or
with 3rd party simulators (via the control file emitted by Quartus II). We
find that very glitchy designs do not correlate well unless this glitch
filtering is used. In addition, the resulting VCD files produced by 3rd
party sims need to be further filtered by Quartus in order to improve
accuracy further.

For further information on power analysis, the Quartus II PowerPlay Power
Analyzer and glitch filtering specifically, please see
http://www.altera.com/literature/hb/...s_qii53013.pdf.

And yes, pipelining is an excellent way to reduce glitching and thus dynamic
power. At some point, the pipeline registers and additional clock routing
will add more power than the glitches removed, but for glitch-heavy designs
(anything with XORs, such as adders, multipliers, and parity trees, and
"randomizing" circuits such as encryption) pipelining will help a lot.

Regards,

Paul Leventis
Altera Corp.



#12
12-09-2004, 09:30 AM
Symon
Re: making an fpga hot

Hmm, that's very interesting. I wonder if the FPGA vendors have got their
SLICEs back to front? I.e. the FFs should feed directly into the LUTs within
the SLICEs, instead of the other way round that exists now. If it saved even
20% of the power, it'd be worth it. Instead of using all the FFs for
pipelining, you use them to replicate signals within the SLICEs to prevent
the glitchy power thing. Hmm, interesting indeed! Thanks Ray.
Cheers, Syms.




#13
12-09-2004, 05:29 PM
Symon
Re: making an fpga hot

Hi Paul,
That's interesting too! I think what you're saying is that some inputs to
the LUT are more power thirsty than others. So, in your example, the A input
controls more muxes than the B input. This means that you could reduce power
by taking this into account. If you had a LUT structure with four inputs
A, B, C, D, then A would feed 8 muxes, B would feed 4, C would feed 2, and D
would feed just one. Any two-input function uses only two inputs, so the
P & R tools could prefer the C and D inputs to get the least amount of
internal node switching. Also, the net that changes most frequently should
be on the D input. Correct?
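
A quick way to test this hunch against the idealised mux-tree picture is the Python sketch below. It assumes the same toy model as the 2-LUT example (only changes in mux output values count, select-line loading is ignored), it maps a 2-input XOR onto either pins A/B or pins C/D of a 4-LUT, and all names are made up for the illustration. It says nothing about how any vendor's P & R tool actually assigns pins.

```python
def internal_toggles(mask, before, after):
    """Mux outputs that change value in a 4-LUT mux tree (A = first stage)."""
    def node_values(inputs):
        level, nodes = list(mask), []
        for sel in inputs:   # stage widths 8, 4, 2, 1 for inputs A, B, C, D
            level = [level[2 * k + sel] for k in range(len(level) // 2)]
            nodes.extend(level)
        return nodes
    return sum(x != y for x, y in zip(node_values(before), node_values(after)))

def mask_for(func):
    """16-entry mask; bit 0 of the index is input A, bit 3 is input D."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)
            for i in range(16)]

xor_on_ab = mask_for(lambda a, b, c, d: a ^ b)   # C and D unused
xor_on_cd = mask_for(lambda a, b, c, d: c ^ d)   # A and B unused

idle = (0, 0, 0, 0)
print("toggle A, XOR on A/B:", internal_toggles(xor_on_ab, idle, (1, 0, 0, 0)))
print("toggle C, XOR on C/D:", internal_toggles(xor_on_cd, idle, (0, 0, 1, 0)))
print("toggle D, XOR on C/D:", internal_toggles(xor_on_cd, idle, (0, 0, 0, 1)))
```

In this model an edge on A disturbs 15 nodes, an edge on C disturbs 3, and an edge on D disturbs only the output, which is consistent with putting the busiest net on D. Whether real silicon behaves this way depends on the actual LUT circuit.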
Thanks, Syms.




#14
12-09-2004, 06:56 PM
glen herrmannsfeldt
Re: making an fpga hot



Paul Leventis (at home) wrote:
(snip regarding power, XOR trees, and FPGAs)

> While logically a LUT is just 16x1 ROM, physically it is not built the same
> way as a RAM.


> A traditional RAM is built with a 2D-array of bits, where a row is selected
> by decoding the address, and a pair of differential bit lines per cell is
> precharged and then the cell pulls one side down which is amplified by a
> sense-amplifier to speed things up (gross simplification). In that
> structure, regardless of what you are reading, you burn the same power since
> the reads are differential, and you burn power on each read, regardless of
> the previously read value, since all that precharge, pull-down and sensing
> happens every read.


That sounds more like a DRAM or SDRAM. Traditional SRAMs were
completely combinatorial, such that the output changed the appropriate
propagation delay after the address changed. Wouldn't the precharging
require a clock? I would have thought of a 2D array where a row is
decoded and the outputs from the selected row, either differential or not,
are supplied to a multiplexer to select the appropriate bits to output.
At 16 cells the advantage of 2D decoding might not be worthwhile.

> A LUT however is traditionally built as a multiplexor tree. You have 16
> SRAM cells feeding a tree of 2:1 muxes. The 4 inputs of the LUT each
> control one level of the tree. There is a diagram below for a 2-LUT.


I wonder how 16 bit SRAMs were built? As far as I understand it, the
first semiconductor memory for a commercial computer was the storage
protection keys for the IBM 360/91, built out of 16 bit SRAM chips.

-- glen


#15
12-21-2004, 03:14 AM
Tim
Re: making an fpga hot

As I understand it (!) Stephen Trimberger (Xilinx and much
distinguished previous work) presented a paper on this fairly recently.

Symon wrote:
> Hmm, that's very interesting. I wonder if the FPGA vendors have got their
> SLICEs back to front? I.e. the FFs should feed directly into the LUTs within
> the SLICEs, instead of the other way round that exists now. If it saved even
> 20% of the power, it'd be worth it. Instead of using all the FFs for
> pipelining, you use them to replicate signals within the SLICEs to prevent the
> glitchy power thing. Hmm, interesting indeed! Thanks Ray.
> Cheers, Syms.




#16
12-21-2004, 03:47 AM
glen herrmannsfeldt
Re: making an fpga hot



Symon wrote:

> Hmm, that's very interesting. I wonder if the FPGA vendors have got their
> SLICEs back to front? I.e. the FFs should feed directly into the LUTs within
> the SLICEs, instead of the other way round that exists now. If it saved even
> 20% of the power, it'd be worth it. Instead of using all the FFs for
> pipelining, you use them to replicate signals within the SLICEs to prevent
> the glitchy power thing. Hmm, interesting indeed! Thanks Ray.


You mean put four FF's on the LUT inputs, instead of one on the
output? I suppose that reduces glitching inside the LUT (RAM),
but it still leaves glitches through the routing. Also, four
FF's are likely to take more power than one.

-- glen


#17
12-21-2004, 11:23 AM
Symon
Re: making an fpga hot

Hi Glen,
I'm being dense; why would it leave glitches through the routing? Once you
cascade LUTs without the FFs it could, but the FF fed LUT's output, and
hence the routing it subsequently feeds, should be glitch free. What am I
missing?
Cheers, Syms.

glen herrmannsfeldt wrote:
>
> You mean put four FF's on the LUT inputs, instead of one on the
> output? I suppose that reduces glitching inside the LUT (RAM),
> but it still leaves glitches through the routing. Also, four
> FF's are likely to take more power than one.
>
> -- glen
>




#18
12-21-2004, 11:32 AM
Symon
Re: making an fpga hot

Hi Tim,
Thanks for the heads up. Googling Mr. Trimberger's name, I found this EE
Times article:-

http://www.eet.com/semi/news/showArt...900516&kc=2515
quote>

Xilinx has already taken the first steps to raise the awareness of power
issues by disclosing a study on the hot spots in its latest Virtex 2
architecture. In the paper, the company showed that 60 percent of the power
consumption in the Virtex 2 family is from routing while logic and clocking
account for 16 and 14 percent, respectively. Additionally, Xilinx found that
the cluster of LUTs, flip-flops and other circuitry that make up its
configurable-logic blocks take up 5.9 microwatts per MHz for a typical
design. But this is just for "typical" designs; actual power consumption
within the configurable logic blocks (CLBs) can change wildly depending on
the switching activity. This can occur frequently in synchronous circuits,
where the inputs to the LUTs come in at different times during the same
clock cycle. This "glitching" effect could contribute up to 70 percent of
the power dissipation in a CMOS circuit, whether it's an ASIC or FPGA.

<quote

Cheers, Syms.




#19
12-21-2004, 07:41 PM
glen herrmannsfeldt
Re: making an fpga hot

I wrote:

>>You mean put four FF's on the LUT inputs, instead of one on the
>>output? I suppose that reduces glitching inside the LUT (RAM),
>>but it still leaves glitches through the routing. Also, four
>>FF's are likely to take more power than one.


Symon wrote:

> I'm being dense; why would it leave glitches through the routing? Once you
> cascade LUTs without the FFs it could, but the FF fed LUT's output, and
> hence the routing it subsequently feeds, should be glitch free. What am I
> missing?


I didn't try to figure out all the possibilities, but it would be a rare
design that used a FF on each LUT output, so I would expect some
LUTs without FFs on their inputs. The arrival time will be different
for the different inputs, so there may (depending on the logic) still be
glitches left.

I do agree, though, that for many designs it could greatly reduce
glitches propagating through logic. I have done designs with at most
two LUTs between FFs, highly pipelined for high speed.

I do agree that it could be an interesting addition to FPGA
architecture, and you might want to patent it. (If you do,
it will probably never get into any FPGA's though.)

-- glen


#20
12-22-2004, 05:17 AM
Paul Leventis (at home)
Re: making an fpga hot

Hi Symon,

> > Hmm, that's very interesting. I wonder if the FPGA vendors have got
> > their SLICEs back to front? I.e. the FFs should feed directly into the
> > LUTs within the SLICEs, instead of the other way round that exists now.
> > If it saved even 20% of the power, it'd be worth it. Instead of using all
> > the FFs for pipelining, you use them to replicate signals within the
> > SLICEs to prevent the glitchy power thing. Hmm, interesting indeed!
> > Thanks Ray.


You'd have to consider the cost of having 4 flops (if I understand
correctly) vs. 1. How often will 4 flops be used? What if you instead
spent that same silicon area on other things (other power reduction
circuitry, etc.)? How much more wiring cap will there be due to the increased
size of an LE? How much more power are you burning by replicating clocks and
other signals?

One thing I should point out is that you *can* put a FF in front of the LUT in
Stratix/Cyclone/Max II/Cyclone II/Stratix II. There is only one FF, but it
can directly feed the LUT instead of the other way around.

Regards,

Paul Leventis
Altera Corp.



#21
12-23-2004, 12:45 PM
Symon
Re: making an fpga hot

Hi Paul,
Firstly, I apologise if my nomenclature is slightly Xilinx biased, I don't
get much chance to use your equally excellent Altera parts in my current
work.
So, I think I had it in my head originally that _all_ the 8 FFs in a slice
could be chosen from to drive _any_ of the LUTs in that slice, such that the
delays from FF to LUT were evenly matched. At present this relies on routing
outside the slice, and so the delays would be badly characterised. Then I
thought it might be possible to up the number of FFs to (say) 16 in a slice
to make this more viable. This gives you a 2:1 FF to LUT ratio. But, you
need a big switching thingy to get the FFs to the LUTs. Some kind of subset
might be fine though. Also, maybe you only need 2 registered inputs per LUT
to get a big saving in glitch energy.
In thinking this, I assumed that the LUT takes up much more silicon area
than the FF, after all the LUT has 16 bits, plus all that address muxing.
(Indeed, it's the switching of all that LUT silicon that we want to reduce.)
Is that a valid assumption? So, it doesn't make that big a difference to the
LE area (see, I know a bit of Alteraese!).
In the end we're trading switching a load extra FFs, against saving the
glitches in the LUTs.
Finally, what 'other power reduction circuitry' are you thinking of? Or is
it secret? ;-)
Thanks, and Cheers, Syms.
p.s. Do you have any comment on my post on the 9th Dec about whether certain
LUT inputs are more thirsty than others?




#22
12-23-2004, 05:26 PM
Paul Leventis
Re: making an fpga hot

> So, I think I had it in my head originally that _all_ the 8 FFs in a
> slice could be chosen from to drive _any_ of the LUTs in that slice, such
> that the delays from FF to LUT were evenly matched. At present this relies
> on routing outside the slice, and so the delays would be badly
> characterised. Then I thought it might be possible to up the number of FFs
> to (say) 16 in a slice to make this more viable. This gives you a 2:1 FF
> to LUT ratio. But, you need a big switching thingy to get the FFs to the
> LUTs. Some kind of subset might be fine though. Also, maybe you only need
> 2 registered inputs per LUT to get a big saving in glitch energy.


I *think* you mean a CLB everywhere you say a slice (a slice is 2 LUTs
+ 2 FFs + Goo, I believe). I am not too familiar with Xilinx's
interslice routing. However, in our products you can get from a FF of
one Logic Element (FF + LUT pair) to any other LE/LUT in the same Logic
Array Block or LAB (a set of 8/10/16 LEs). The delay from any flop to
any LUT in the same LAB is very similar -- for the purposes of power &
glitching you could consider these paths to be matched.
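
A rough way to see why matched FF-to-LUT delays matter for glitching is a toy model of a single 4-input LUT. The sketch below is only an illustration and not anything taken from a vendor tool: the function name `lut_output_transitions`, the choice of a 4-input XOR, and the arrival times are all assumptions, and the model ignores real gate and routing delays. It counts how often the LUT output toggles while the inputs move from one value to another, once with all inputs arriving together (registered, matched paths) and once with skewed arrivals.

```python
def lut_output_transitions(truth_table, old_inputs, new_inputs, arrival_times):
    """Count output toggles of a LUT while its inputs change.

    Input i changes from old_inputs[i] to new_inputs[i] at arrival_times[i].
    Inputs sharing an arrival time switch together as one event.
    This is an idealised model: the LUT itself has zero delay.
    """
    def lookup(bits):
        # Input 0 is the least significant address bit of the LUT.
        index = sum(bit << i for i, bit in enumerate(bits))
        return truth_table[index]

    current = list(old_inputs)
    output = lookup(current)
    transitions = 0
    for t in sorted(set(arrival_times)):
        for i, arrival in enumerate(arrival_times):
            if arrival == t:
                current[i] = new_inputs[i]
        new_output = lookup(current)
        if new_output != output:
            transitions += 1
            output = new_output
    return transitions


# Hypothetical example: a 4-input XOR (parity) function in a 16-bit LUT.
XOR4 = [bin(i).count("1") & 1 for i in range(16)]

old = (0, 0, 0, 0)
new = (1, 1, 1, 1)

# Matched paths: every input arrives at the same time.
matched = lut_output_transitions(XOR4, old, new, [0, 0, 0, 0])
# Skewed paths: each input arrives one time step after the previous one.
skewed = lut_output_transitions(XOR4, old, new, [0, 1, 2, 3])

print(matched, skewed)  # prints "0 4"
```

In this toy case the final output is the same as the starting output, so every toggle in the skewed run is a glitch that burns power without doing useful work. XOR is close to a worst case; other functions would glitch less.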

Now adding additional FFs... FFs are area hungry (and power hungry...),
and it is rare for a design to use all (or even half) the FFs that are
already in our parts (with a 1:1 LUT/FF ratio). So these additional
FFs would be wasteful of area, and you'd have to ask whether that area
was better spent in other ways, or not at all (thus reducing Si cost).


> In thinking this, I assumed that the LUT takes up much more silicon area
> than the FF, after all the LUT has 16 bits, plus all that address muxing.
> (Indeed, it's the switching of all that LUT silicon that we want to reduce.)
> Is that a valid assumption? So, it doesn't make that big a difference to the
> LE area (see, I know a bit of Alteraese!).


Once you add in all the goo that comes with a FF (sync clear, async
clear, clock selection, sync load, etc.) they become surprisingly
large. But the second (or any further) FFs would not need to be fully
featured, and so I'll grant you that they wouldn't be huge. But you'd
be surprised at the lengths we go to in order to cut even 1% area out of the LE.

> Finally, what 'other power reduction circuitry' are you thinking of? Or is
> it secret? ;-)


Which ones am I thinking of? That's secret :-) But you can do a
literature search on low-power design in FPGAs and ASICs and you'll see
there are oodles of ideas out there, some or all of which cause some
area bloat in exchange for better power.

> p.s. Do you have any comment on my post on the 9th Dec about whether certain
> LUT inputs are more thirsty than others?


Sorry, my news server has been really flaky this month. I've had to
resort to using Google groups now. I'll take a look when I get a
chance.

Paul Leventis
Altera Corp.

  #23 (permalink)  
Old 12-23-2004, 06:40 PM
Symon
Guest
 
Posts: n/a
Default Re: making an fpga hot

"Paul Leventis" <[email protected]> wrote in message
>
> I *think* you mean a CLB everywhere you say a slice (a slice is 2 LUTs
> + 2 FFs + Goo, I believe). I am not too familiar with Xilinx's
> interslice routing. However, in our products you can get from a FF of
> one Logic Element (FF + LUT pair) to any other LE/LUT in the same Logic
> Array Block or LAB (a set of 8/10/16 LEs). The delay from any flop to
> any LUT in the same LAB is very similar -- for the purposes of power &
> glitching you could consider these paths to be matched.

<snipped interesting stuff>
Yes, I did mean CLB, thanks for working that out!
Thanks for your comments, Syms.

