FPGA comp.arch.fpga newsgroup (usenet)
#1 - 11-07-2003, 07:17 PM
Fernando
FPGAs and DRAM bandwidth

How fast can you really get data in and out of an FPGA?
With current pin layouts it is possible to hook four (or maybe even
five) DDR memory DIMM modules to a single chip.

Let's say you can create memory controllers that run at 200 MHz (as
claimed in an Xcell article), for a total bandwidth of

5 (modules/FPGA) * 64 (bits/word) * 200e6 (cycles/sec) * 2 (words/cycle) * (1 byte / 8 bits)
  = 5 * 3.2 GB/s = 16 GB/s
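For anyone who wants to play with the numbers, here is the same arithmetic as
a few lines of Python (the 5 DIMMs, 64-bit bus and 200 MHz controller are just
the assumptions above, nothing measured):

# Back-of-the-envelope DDR DIMM bandwidth (assumptions from the post above).
modules = 5        # DIMMs hanging off one FPGA
width   = 64       # bits per word (one DIMM data bus)
f_ctrl  = 200e6    # controller clock, cycles/sec
ddr     = 2        # words per cycle (double data rate)

per_dimm = width * f_ctrl * ddr / 8     # bytes/sec per DIMM
total    = modules * per_dimm           # bytes/sec for all DIMMs
print(per_dimm / 1e9, total / 1e9)      # 3.2 GB/s per DIMM, 16.0 GB/s total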

Assuming an application that needs more BW than this, does anyone know
a way around this bottleneck? Is this a physical limit with current
memory technology?

Fernando
#2 - 11-08-2003, 05:39 AM
john jakson
Re: FPGAs and DRAM bandwidth


OTOH

If you want more bandwidth than DDR DRAM, you could go for RamBus,
RLDRAM or the other one, NetRam or whatever it's called. The RLDRAM devices
separate the I/Os for pure bandwidth, with none of the turning-the-bus-or-
clock-around nonsense, and reduce latency from the 60-80 ns range down to
20 ns or so, that is the true RAS cycle.

Micron & Infineon do the RLDRAM, another group does the NetRam (Hynix,
Samsung maybe).

The RLDRAM can run the bus up to 400 MHz, double pumped to an 800 MHz data
rate, and can use most every cycle to move data 2x and receive control 1x.
It is 8-way banked, so every 2.5 ns another true random access can start,
with each bank cycling once every 20 ns. The architecture supports 8-, 16-
and 32/36-bit-wide I/Os, IIRC. Sizes are 256 Mbit now. I was quoted a price
of about $20 something, cheap for the speed, but far steeper than PC RAM.
Data can come out in 1, 2 or 4 words per address. Think I got all that right.
Details are on Micron.com. I was told there are Xilinx interfaces for them;
I got the docs from Xilinx but haven't digested them yet. They also have
interfaces for the RamBus & NetRam. AVNET (??) also has a dev board with a
couple of RLDRAM parts on it connected to a Virtex2 part, but I think these
are the 1st-gen RLDRAM parts, which are 250 MHz / 25 ns cycle, so the
interface must work.
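If I have the figures above right, the raw per-device numbers come out roughly
like this (a sketch only, using the 400 MHz double-pumped bus and the widest
36-bit I/O option; the Micron datasheet has the real timing):

# Rough RLDRAM-II per-device numbers from the figures quoted above.
f_bus     = 400e6        # bus clock
data_rate = 2 * f_bus    # double pumped -> 800 Mtransfers/s per data pin
width     = 36           # bits, widest I/O option mentioned
t_bank    = 20e-9        # true random-access cycle per bank, seconds
banks     = 8

peak_bw  = data_rate * width / 8     # bytes/sec with the data pins fully busy
accesses = banks / t_bank            # a new random access every 2.5 ns
print(peak_bw / 1e9, accesses / 1e6) # ~3.6 GB/s and 400 M random accesses/s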

Anyway, I only wish my PC could use them; I'd willingly pay mucho $ for a
mobo that would use them, but that will never happen. I quite fancy using
one for an FPGA CPU, only I could probably keep 8 nested CPUs busy, 1 bank
each, since the CPUs will be far closer to a 20 ns cycle than 2.5 ns. The
interface would then be a mux-demux box on my side. The total BW would far
surpass any old P4, but the latency is the most important thing for me.

Hope that helps

johnjakson_usa_com
#3 - 11-08-2003, 06:16 PM
Fernando
Re: FPGAs and DRAM bandwidth

Lots of good points in your reply; here is why I think these
technologies don't apply to a problem that requires large and fast
memory.

RLDRAM: very promising, but the densities do not seem to increase
significantly over time (500 Mbit now, ~64 MB). To the best of my
knowledge, nobody is making DIMMs with these chips, so they're stuck
as cache or network memory.

RDRAM (RAMBUS): as you said, only the slowest parts can be used with
FPGAs because of the very high frequency of the serial protocol. The
current slowest RDRAMs run at 800 MHz, a forbidden range for FPGAs.
(Xilinx guys, please jump in and correct me if I'm wrong.)

Am I missing something? Are there any ASICs out there that interface
memory DIMMs to FPGAs? Is there any way to use the RocketIO transceivers
to communicate with memory chips? Or maybe a completely different
solution to the memory bottleneck not mentioned here?


#4 - 11-08-2003, 06:46 PM
Phil Hays
Re: FPGAs and DRAM bandwidth

Fernando wrote:

> With current pin layouts it is possible to hook four (or maybe even
> five) DDR memory DIMM modules to a single chip.

Probably can get a little better. With a 2V8000 in a FF1517 package,
there are 1,108 IOs. (!) If we shared address and control lines between
banks (timing is easier on these lines), it looks to me like 11 DIMMs
could be supported.

Data pins            64
DQS pins              8
CS, CAS, RAS, addr   12   (with sharing)
                   ====
                     92

1108/92 = 11, with 100 pins left over for VTH, VRP, VRN, clock, reset, ...

Of course, the communication to the outside world would also need to go
somewhere...
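The same budget is easy to play with in a few lines (the 92-pins-per-DIMM
figure is just the total from the table above; nothing else is assumed):

# Pin budget: 1,108 user I/Os on a 2V8000/FF1517, ~92 pins per DIMM with
# address/control shared between banks.
total_io      = 1108
pins_per_dimm = 92

for dimms in range(9, 13):
    left = total_io - dimms * pins_per_dimm
    print(dimms, "DIMMs ->", left, "pins left for VTH/VRP/VRN, clocks, reset")
# 11 DIMMs leaves ~96 pins spare; 12 would leave only 4, so 11 is the ceiling.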


--
Phil Hays
#5 - 11-09-2003, 10:56 AM
Fernando
Re: FPGAs and DRAM bandwidth

Sharing the control pins is a good idea; the only thing that concerns
me is the PCB layout. This is not my area of expertise, but it seems to
me that it would be pretty challenging to put (let's say) 10 DRAM
DIMMs and a big FPGA on a single board.

It can get even uglier if symmetric traces are required to each memory
module sharing the control lines... (not sure whether this is required)

Anyway, I'll start looking into it

Thanks

#6 - 11-09-2003, 03:08 PM
Marc Randolph
Re: FPGAs and DRAM bandwidth

Phil Hays wrote:

> Probably can get a little better. With a 2V8000 in a FF1517 package,
> there are 1,108 IOs. (!) If we shared address and control lines between
> banks (timing is easier on these lines), it looks to me like 11 DIMMs
> could be supported.

Of course, the 2V8000 is REALLY expensive. I'm sure there is a pricing
sweet spot where it makes sense to break it up into multiple smaller
parts, providing both more pins and lower cost (something like two
2VP30's or 2VP40's [between the two: 1288 to 1608 I/Os, depending on
package]). They could be interconnected using the internal SERDES.
The SERDES could also be used for communicating with the outside world.

Fernando wrote:

> Sharing the control pins is a good idea; the only thing that concerns
> me is the PCB layout. This is not my area of expertise, but it seems to
> me that it would be pretty challenging to put (let's say) 10 DRAM
> DIMMs and a big FPGA on a single board.


It may be challenging, but that is what you encounter when trying to
push the envelope, as it appears you are trying to do. This sometimes
entails accepting a bit less design margin to fulfill the requirements
in the allotted space or budget. Knowing what you can safely give up,
and where you can give it up, requires expertise (and if you don't
have that expertise, you'll need to find someone who does).

If you are really set on meeting the memory requirements, you may need
to be open to something besides DIMMs (or perhaps make your own custom
DIMMs). A possible alternative: it looks like Toshiba is in the
process of releasing their 512 Mbit FCRAM. It supposedly provides 400
Mbps per data bit (using 200 MHz DDR... not a problem for modern FPGAs).

> It can get even uglier if symmetric traces are required to each memory
> sharing the control lines...(not sure if this is required)


I don't know what tools/budget you have available to you. Cadence
allows you to put a bus property on as many nets as you want. You can
then constrain all nets that form that bus to be within X% of each other
(in terms of length).

Good luck,

Marc

#7 - 11-09-2003, 03:49 PM
Nicholas C. Weaver
Re: FPGAs and DRAM bandwidth

Fernando wrote:
>Sharing the control pins is a good idea; the only thing that concerns
>me is the PCB layout. This is not my area of expertise, but seems to
>me that it would be pretty challenging to put (let's say) 10 DRAM
>DIMMs and a big FPGA on a single board.


Simple. Use external registers for the control lines, and drive 4
registers which then drive 4 DIMMs each. Adds a cycle of latency, but
so what?
--
Nicholas C. Weaver [email protected]
#8 - 11-10-2003, 12:26 AM
Jeff Cunningham
Re: FPGAs and DRAM bandwidth

Fernando wrote:
> Sharing the control pins is a good idea; the only thing that concerns
> me is the PCB layout. This is not my area of expertise, but seems to
> me that it would be pretty challenging to put (let's say) 10 DRAM
> DIMMs and a big FPGA on a single board.


Don't forget simultaneous switching considerations. Driving 640 pins at
200 MHz would probably require a bit of cleverness. Maybe you could run
different banks on different phases of the clock. Hopefully your app
does not need to write all DIMMs at once.
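To put a number on it (the 640 pins are from above, 10 DIMMs x 64 data bits;
the 4-phase stagger is just an example):

# Simultaneously switching outputs with and without phase staggering.
dimms, data_pins = 10, 64
total  = dimms * data_pins      # 640 pins toggling
phases = 4                      # e.g. banks launched at 0/90/180/270 degrees
print(total, "pins on one edge vs", total // phases, "per edge when staggered")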

Jeff

#9 - 11-10-2003, 02:29 AM
Robert Sefton
Re: FPGAs and DRAM bandwidth

Fernando -

Your instincts are right on with respect to the difficulty of fitting
that many DIMMs on a board and interfacing to them from a single FPGA.
Forget about it. The bottom line is that there's a trade-off between
memory size and speed, and memory is almost always the limiting factor
in system throughput. If you need lots of memory then DRAM is probably
your best/only option, and the max reasonable throughput is about what
you calculated, but even the 5-DIMM 320-bit-wide data bus in your
example would be a very tough PCB layout.

If you can partition your memory into smaller fast-path memory and
slower bulk memory, then on-chip memory is the fastest you'll find and
you can use SDRAM for the bulk. Another option, if you can tolerate
latency, is to spread the memory out to multiple PCBs/daughtercards,
each with a dedicated memory controller, and use multiple lanes of
extremely fast serial I/O between the master and slave memory
controllers.

A hierarchy of smaller/faster and larger/slower memories is a common
approach, e.g., on-chip core-rate L1 cache, off-chip fast L2 cache, and
slower bulk SDRAM in the case of microprocessors. If you tossed out some
specific system requirements here you'd probably get some good feedback
because this is a common dilemma.

Robert

"Fernando" <[email protected]> wrote in message
news:[email protected] om...
> Sharing the control pins is a good idea; the only thing that concerns
> me is the PCB layout. This is not my area of expertise, but seems to
> me that it would be pretty challenging to put (let's say) 10 DRAM
> DIMMs and a big FPGA on a single board.
>
> It can get even uglier if symmetric traces are required to each memory
> sharing the control lines...(not sure if this is required)
>
> Anyway, I'll start looking into it
>
> Thanks
>
> Phil Hays <[email protected]> wrote in message

news:<[email protected]>...
> > Fernando wrote:
> >
> > > How fast can you really get data in and out of an FPGA?
> > > With current pin layouts it is possible to hook four (or maybe

even
> > > five) DDR memory DIMM modules to a single chip.
> > >
> > > Let's say you can create memory controllers that run at 200MHz (as
> > > claimed in an Xcell article), for a total bandwidth of
> > > 5(modules/FPGA) * 64(bits/word) * 200e6(cycles/sec) *

(2words/cycle) *
> > > (1byte/8bits)=
> > > 5*3.2GB/s=16GB/s
> > >
> > > Assuming an application that needs more BW than this, does anyone

know
> > > a way around this bottleneck? Is this a physical limit with

current
> > > memory technology?

> >
> > Probably can get a little better. With a 2V8000 in a FF1517

package,
> > there are 1,108 IOs. (!) If we shared address and control lines

between
> > banks (timing is easier on these lines), it looks to me like 11

DIMMs
> > could be supported.
> >
> > Data pins 64
> > DQS pins 8
> > CS,CAS,
> > RAS,addr 12 (with sharing)
> > ====
> > 92
> >
> > 1108/92 = 11 with 100 pins left over for VTH, VRP, VRN, clock,

reset, ...
> >
> > Of course, the communication to the outside world would also need go
> > somewhere...



Reply With Quote
#10 - 11-10-2003, 05:55 AM
Martin Euredjian
Re: FPGAs and DRAM bandwidth

It would seem to me that the idea of using custom "serial DIMMs" combined
with Virtex-II Pro high-speed serial I/O capabilities might be the best way
to get a boost in data-moving capabilities. This would avoid having to
drive hundreds of pins (and related issues) and would definitely simplify
board layout.

I haven't done the numbers. I'm just thinking out loud.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin Euredjian

To send private email:
[email protected]
where
"0_0_0_0_" = "martineu"



"Robert Sefton" <[email protected]> wrote in message
news:[email protected]
> Fernando -
>
> Your instincts are right on with respect to the difficulty of fitting
> that many DIMMs on a board and interfacing to them from a single FPGA.
> Forget about it. The bottom line is that there's a trade-off between
> memory size and speed, and memory is almost always the limiting factor
> in system throughput. If you need lots of memory then DRAM is probably
> your best/only option, and the max reasonable throughput is about what
> you calculated, but even the 5-DIMM 320-bit-wide data bus in your
> example would be a very tough PCB layout.
>
> If you can partition your memory into smaller fast-path memory and
> slower bulk memory, then on-chip memory is the fastest you'll find and
> you can use SDRAM for the bulk. Another option, if you can tolerate
> latency, is to spread the memory out to multiple PCBs/daughtercards,
> each with a dedicated memory controller, and use multiple lanes of
> extremely fast serial I/O between the master and slave memory
> controllers.
>
> A hierarchy of smaller/faster and larger/slower memories is a common
> approach, e.g., on-chip core-rate L1 cache, off-chip fast L2 cache, and
> slower bulk SDRAM in the case of microprocessors. If you tossed out some
> specific system requirements here you'd probably get some good feedback
> because this is a common dilemma.
>
> Robert
>
> "Fernando" <[email protected]> wrote in message
> news:[email protected] om...
> > Sharing the control pins is a good idea; the only thing that concerns
> > me is the PCB layout. This is not my area of expertise, but seems to
> > me that it would be pretty challenging to put (let's say) 10 DRAM
> > DIMMs and a big FPGA on a single board.
> >
> > It can get even uglier if symmetric traces are required to each memory
> > sharing the control lines...(not sure if this is required)
> >
> > Anyway, I'll start looking into it
> >
> > Thanks
> >
> > Phil Hays <[email protected]> wrote in message

> news:<[email protected]>...
> > > Fernando wrote:
> > >
> > > > How fast can you really get data in and out of an FPGA?
> > > > With current pin layouts it is possible to hook four (or maybe

> even
> > > > five) DDR memory DIMM modules to a single chip.
> > > >
> > > > Let's say you can create memory controllers that run at 200MHz (as
> > > > claimed in an Xcell article), for a total bandwidth of
> > > > 5(modules/FPGA) * 64(bits/word) * 200e6(cycles/sec) *

> (2words/cycle) *
> > > > (1byte/8bits)=
> > > > 5*3.2GB/s=16GB/s
> > > >
> > > > Assuming an application that needs more BW than this, does anyone

> know
> > > > a way around this bottleneck? Is this a physical limit with

> current
> > > > memory technology?
> > >
> > > Probably can get a little better. With a 2V8000 in a FF1517

> package,
> > > there are 1,108 IOs. (!) If we shared address and control lines

> between
> > > banks (timing is easier on these lines), it looks to me like 11

> DIMMs
> > > could be supported.
> > >
> > > Data pins 64
> > > DQS pins 8
> > > CS,CAS,
> > > RAS,addr 12 (with sharing)
> > > ====
> > > 92
> > >
> > > 1108/92 = 11 with 100 pins left over for VTH, VRP, VRN, clock,

> reset, ...
> > >
> > > Of course, the communication to the outside world would also need go
> > > somewhere...

>
>



Reply With Quote
#11 - 11-10-2003, 04:53 PM
Fernando
Re: FPGAs and DRAM bandwidth

Thanks all for the good pointers.

As mentioned, the simultaneous switching will definitely be an issue.
I had never heard of this technique of switching different DRAM chips
on different phases of the clock. Is it commonly used?

There was also a brief mention of "serial DIMMs". Has anyone seen
anything like that, or would I need to start from scratch?

The problem I'm working on operates on data sets of ~16 GB. Computation
takes about 0.5 sec, compared with ~2 sec required to download the data
(@100 MHz). BRAM is used as a cache for bandwidth.

The main memory bandwidth range I'm interested in would be ~50 GB/s, so
that the computation and memory-access times are comparable. That's why I'm
asking the experts: is this currently attainable with FPGAs?
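Putting numbers on that (assuming the 100 MHz figure is the same 5-DIMM,
64-bit DDR arrangement from my first post):

# Where the ~2 sec and ~50 GB/s figures come from.
dataset   = 16e9                       # bytes
bw_now    = 5 * 64 * 2 * 100e6 / 8     # 8 GB/s with the controllers at 100 MHz
bw_target = 50e9                       # bytes/sec
compute   = 0.5                        # seconds

print(dataset / bw_now)      # ~2 sec to stream the data set today
print(dataset / bw_target)   # ~0.32 sec at 50 GB/s, comparable to the compute time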

Thanks again,

Fernando
#12 - 11-10-2003, 05:03 PM
Nicholas C. Weaver
Re: FPGAs and DRAM bandwidth

Fernando wrote:
>As mentioned, the simultaneous switching will definitely be an issue.
>I never heard about this technique of switching different DRAM chips
>on different phases of the clock. Is it commonly used?


I don't see why not; it's a simple and elegant solution.

>The main memory bandwidth range I'm interested in would be ~50GB/s, so
>the computation and memory-access times are comparable. That's why I'm
>asking the experts, is this currently attainable with FPGAs?


It's pin bandwidth that is going to be needed, and lots of pins.

So let's take 200 MHz DDR signaling; that's 400 Mbps/pin peak. Thus
50 GB/s would take, at MINIMUM, 1000 pins!

1000 signaling pins, at 400 MHz, is not very happy. Probably barely
doable on the largest part, but not happy. Then there are all the
control pins.
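Spelled out (same assumptions as above):

# 200 MHz DDR signaling = 400 Mb/s per pin, so 50 GB/s needs >= 1000 data pins.
target  = 50e9 * 8      # bits/sec
per_pin = 2 * 200e6     # 400 Mb/s per pin
print(target / per_pin) # 1000.0 data pins, before any address/control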


Question: do you REALLY need all that memory bandwidth? Do you really
need all that speed? Or could you just make things take 10x longer,
require only 2 banks of DDR, and use a smaller piece of FPGA logic?
--
Nicholas C. Weaver [email protected]
#13 - 11-10-2003, 06:20 PM
John_H
Re: FPGAs and DRAM bandwidth

I asked a DRAM vendor about the possibility of serial DRAM. The comment was
that they weren't considering that direction because of poorer latency. I'm
not certain the argument was valid, though, because the DDR design I was
trying to put together had to register in the IOBs and in the logic fabric;
if the buffering were reduced in the FPGA internals using the serial links,
the overall latency might be the same (or better) at a huge reduction in
pins.

The bandwidth is, however, still an issue. 64 bits wide, 400M/s -> 25.6
Gb/s. Oops!
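To put that "oops" in perspective, a rough lane count, assuming RocketIO-class
serial links at a 3.125 Gb/s line rate with 8b/10b coding (so ~2.5 Gb/s of
payload each; check the data sheet for the exact numbers):

import math

# Serial lanes needed to carry one 64-bit, 400 MT/s DDR data bus.
bus_bw  = 64 * 400e6           # 25.6 Gb/s of raw data
payload = 3.125e9 * 8 / 10     # ~2.5 Gb/s per lane after 8b/10b
print(math.ceil(bus_bw / payload))   # 11 lanes for a single DIMM-width bus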


"Martin Euredjian" <[email protected]> wrote in message
news:[email protected] m...
> It would seem to me that the idea of using custom "serial dimms" combined
> with Virtex II Pro high speed serial I/O capabilities might be the best

way
> to get a boost in data moving capabilities. This would avoid having to
> drive hundreds of pins (and related issues) and would definetly simplify
> board layout.
>
> I haven't done the numbers. I'm just thinking out loud.
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Martin Euredjian
>
> To send private email:
> [email protected]
> where
> "0_0_0_0_" = "martineu"
>
>
>
> "Robert Sefton" <[email protected]> wrote in message
> news:[email protected]
> > Fernando -
> >
> > Your instincts are right on with respect to the difficulty of fitting
> > that many DIMMs on a board and interfacing to them from a single FPGA.
> > Forget about it. The bottom line is that there's a trade-off between
> > memory size and speed, and memory is almost always the limiting factor
> > in system throughput. If you need lots of memory then DRAM is probably
> > your best/only option, and the max reasonable throughput is about what
> > you calculated, but even the 5-DIMM 320-bit-wide data bus in your
> > example would be a very tough PCB layout.
> >
> > If you can partition your memory into smaller fast-path memory and
> > slower bulk memory, then on-chip memory is the fastest you'll find and
> > you can use SDRAM for the bulk. Another option, if you can tolerate
> > latency, is to spread the memory out to multiple PCBs/daughtercards,
> > each with a dedicated memory controller, and use multiple lanes of
> > extremely fast serial I/O between the master and slave memory
> > controllers.
> >
> > A hierarchy of smaller/faster and larger/slower memories is a common
> > approach, e.g., on-chip core-rate L1 cache, off-chip fast L2 cache, and
> > slower bulk SDRAM in the case of microprocessors. If you tossed out some
> > specific system requirements here you'd probably get some good feedback
> > because this is a common dilemma.
> >
> > Robert
> >
> > "Fernando" <[email protected]> wrote in message
> > news:[email protected] om...
> > > Sharing the control pins is a good idea; the only thing that concerns
> > > me is the PCB layout. This is not my area of expertise, but seems to
> > > me that it would be pretty challenging to put (let's say) 10 DRAM
> > > DIMMs and a big FPGA on a single board.
> > >
> > > It can get even uglier if symmetric traces are required to each memory
> > > sharing the control lines...(not sure if this is required)
> > >
> > > Anyway, I'll start looking into it
> > >
> > > Thanks
> > >
> > > Phil Hays <[email protected]> wrote in message

> > news:<[email protected]>...
> > > > Fernando wrote:
> > > >
> > > > > How fast can you really get data in and out of an FPGA?
> > > > > With current pin layouts it is possible to hook four (or maybe

> > even
> > > > > five) DDR memory DIMM modules to a single chip.
> > > > >
> > > > > Let's say you can create memory controllers that run at 200MHz (as
> > > > > claimed in an Xcell article), for a total bandwidth of
> > > > > 5(modules/FPGA) * 64(bits/word) * 200e6(cycles/sec) *

> > (2words/cycle) *
> > > > > (1byte/8bits)=
> > > > > 5*3.2GB/s=16GB/s
> > > > >
> > > > > Assuming an application that needs more BW than this, does anyone

> > know
> > > > > a way around this bottleneck? Is this a physical limit with

> > current
> > > > > memory technology?
> > > >
> > > > Probably can get a little better. With a 2V8000 in a FF1517

> > package,
> > > > there are 1,108 IOs. (!) If we shared address and control lines

> > between
> > > > banks (timing is easier on these lines), it looks to me like 11

> > DIMMs
> > > > could be supported.
> > > >
> > > > Data pins 64
> > > > DQS pins 8
> > > > CS,CAS,
> > > > RAS,addr 12 (with sharing)
> > > > ====
> > > > 92
> > > >
> > > > 1108/92 = 11 with 100 pins left over for VTH, VRP, VRN, clock,

> > reset, ...
> > > >
> > > > Of course, the communication to the outside world would also need go
> > > > somewhere...

> >
> >

>
>



Reply With Quote
#14 - 11-10-2003, 11:02 PM
Erez Birenzwig
Re: FPGAs and DRAM bandwidth

"Nicholas C. Weaver" <[email protected]> wrote in message
news:[email protected]
> In article <[email protected] >,
> Fernando <[email protected]> wrote:
> >As mentioned, the simultaneous switching will definitely be an issue.
> >I never heard about this technique of switching different DRAM chips
> >on different phases of the clock. Is it commonly used?

>
> I don't see why not, its a simple and elegant solution.


Don't forget you're dealing with DDR DRAM; it already uses all the phases of
the clock.
From my experience, running ~160 pins at 200 MHz causes the FPGA to get very
hot (~85C without a heat sink).

Erez.


#15 - 11-10-2003, 11:13 PM
Nicholas C. Weaver
Re: FPGAs and DRAM bandwidth

Erez Birenzwig wrote:

>Don't forget you're dealing with DDR DRAM, it already uses all the phases of
>the clock.


It's using both phases of a single clock, but that clock can be out of
phase with other DRAM clocks, which seems to be the idea.

>From my experience running ~160 pins at 200MHz causes the FPGA to be
>very hot (~85C without a heat sink).


Not surprising.

But slap an Itanic heatsink on it! (IA64 heatsink can cool 130W).
--
Nicholas C. Weaver [email protected]
#16 - 11-11-2003, 09:11 PM
Fernando
Re: FPGAs and DRAM bandwidth

> Question: do you REALLY need all that memory bandwidth? Do you really
> need all that speed? Or could you just make things take 10x longer,
> require only 2 banks of DDR, and use a smaller piece of FPGA logic?


I *could* take 10x longer. I could use a Pentium too.

-----------

For those interested in this thread, see

http://micron.com/news/product/2003-...ronDDR400.html

"Altera and Micron Announce Industry's First DDR400 SDRAM DIMM
Interface for FPGAs"

I don't see how that's "the first", but it's a good thing to have
multiple vendors to choose from.
#17 - 11-11-2003, 11:29 PM
Erik Widding
Re: FPGAs and DRAM bandwidth

"Fernando" <[email protected]> wrote...
> The problem I'm working on operates on data sets of ~16 GB. Computation
> takes about 0.5 sec, compared with ~2 sec required to download the data
> (@100 MHz). BRAM is used as a cache for bandwidth.


Have you considered compressing one or all of your data sets? If you can
compress the data sets, you will reduce both the bandwidth to the memory and
the size of the memory buffer.

We have seen vision applications where the data sets were binary templates,
and simple compression schemes offered a 10x improvement in memory
size/bandwidth, and only required a small decompressor in the FPGA. The
engineering cost of developing the compression algorithm and associated FPGA
logic was MUCH lower than the cost to lay out and model a multi-DIMM PCB.

Whether compression will help, all depends on what the data looks like.
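As a flavor of how small such a decompressor can be, here is a toy run-length
decoder in Python (purely illustrative; it is not the scheme we used, and the
real thing would of course be a few lines of HDL, not software):

# Toy run-length decoder for binary template data: input is (bit, run) pairs.
def rle_decode(pairs):
    out = []
    for bit, run in pairs:
        out.extend([bit] * run)    # replay 'run' copies of 'bit'
    return out

# A mostly-empty template row: 50 zeros, 3 ones, 47 zeros -> 3 pairs vs 100 bits.
row = rle_decode([(0, 50), (1, 3), (0, 47)])
print(len(row), sum(row))          # 100 bits reconstructed, 3 of them set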

Also, if you are passing over the same data multiple times, as you would if
you were using the same data to compute multiple results, it might benefit
you to compute multiple sets of results per pass of the data. Depending on
your problem, it might make sense to look for a solution with the goal of
minimizing the dataflow, rather than simplifying the computation.

We have an engineering mantra here: "the pins are the most expensive part of
the FPGA". This is not only because the number of pins on a die is
proportional to the square root of the number of gates in a given device
family, but also because lots of traces on a board add to the cost of
layout, verification, and fabrication.


Regards,
Erik Widding.

---
Birger Engineering, Inc. -------------------------------- 617.695.9233
100 Boylston St #1070; Boston, MA 02116 -------- http://www.birger.com


#18 - 11-12-2003, 10:53 AM
Fernando
Re: FPGAs and DRAM bandwidth

Hi, thanks for your reply.

> Have you considered compressing one or all of your data sets?


I will use compression for the output stream (results).

For main memory I'm using floating point, so I'd need lossless
compression. I've never implemented any compression modules myself,
but I don't think it can be done in real time for the amount of data I
need. I will gladly accept opinions from the group on this.

> Also, if you are passing over the same data multiple times, as you would if
> you were using the same data to compute multiple results, it might benefit
> you to compute multiple sets of results per pass of the data.


I am already doing something like that, limited of course by the
amount of RAM inside the chip. There is also a fundamental time
dependence in my algorithm, but we got around that by parallelizing
each step in between.

> We have an engineering mantra here, "the pins are the most expensive part of
> the fpga". This is not only due to the fact that the number of pins on a
> die are proportional to the square root of the number of gates in a given
> device family, but also due to the fact that lots of traces on a board add
> to the cost of layout, verification, and fabrication.


I completely agree; that's why I'm a little inclined towards serial
solutions, but it seems like those are not ready yet... at least for
memory.