FPGA Central - World's 1st FPGA / CPLD Portal


FPGA comp.arch.fpga newsgroup (usenet)

#1  04-10-2006, 03:25 PM
mikel
ROM resource sharing

Hello

How can I share on-chip ROM resources in an FPGA? I
implemented a discrete cosine transform core using the parallel distributed
arithmetic approach, in which hardware multipliers are replaced by
precomputed MAC results stored in LUT/ROM. A single ROM instance is 64x14
bits. The problem is that the ROM must be replicated many times to sustain
high throughput (9 times for the first DCT stage and 11 times for the
2nd stage, after transposition). The core ends up with more
than 25 kbits of ROM, which is pretty big. I know
there are dual-port memories that allow two simultaneous reads, but that
would 'only' halve the resources needed. Any better ideas?
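
For readers unfamiliar with the technique, here is a minimal Python sketch of distributed arithmetic, assuming six inputs so that the table has 64 entries like the ROM described above. The coefficients, sample values, and function names are hypothetical and are not taken from the poster's DCT core. Each pass of the loop over bit positions is one ROM lookup; a fully parallel design would presumably perform those lookups in the same cycle, which is one plausible reason the ROM ends up replicated.

```python
def build_da_rom(coeffs):
    """Precompute every partial sum of the fixed coefficients (one entry per address)."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_dot(rom, xs, width):
    """Dot product of the coefficients and xs using only lookups, shifts and adds.

    Every x must fit in `width`-bit two's complement.
    """
    acc = 0
    for b in range(width):
        # Address = bit b of every input word, one address bit per input.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        term = rom[addr] << b
        # The sign bit carries negative weight in two's complement.
        acc += -term if b == width - 1 else term
    return acc


COEFFS = [3, -5, 7, 2, -1, 4]        # hypothetical, not real DCT constants
ROM = build_da_rom(COEFFS)           # 2**6 = 64 entries
XS = [10, -3, 0, 127, -128, 5]       # hypothetical 8-bit samples

# Both numbers printed here should agree.
print(da_dot(ROM, XS, 8), sum(c * x for c, x in zip(COEFFS, XS)))
```

The multiplier-free result matches the ordinary dot product because each ROM entry already holds the sum of the coefficients selected by its address bits.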

Michal

#2  04-10-2006, 04:46 PM
John_H
Re: ROM resource sharing

You have 20 different addresses for the 20 replications, correct?
Which FPGA family are you using?

#3  04-10-2006, 09:51 PM
mikel
Re: ROM resource sharing

John,

Actually, the LUT/ROM is replicated twice as many times as I said before (18
times in the 1st stage, 22 times in the 2nd stage). The synthesis tool was
smart enough to reduce the ROM storage from 35840 bits to 25600 bits (there
are a few identical values inside every ROM, so the tool added
address-decoding logic on the input to shrink the memory). But
this is still too much.
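
The figures above can be checked with a short Python sketch. The `dedup` helper and the toy ROM contents are hypothetical; they only illustrate the general idea of storing each distinct value once behind a decode table, and are not a description of what the synthesis tool actually does.

```python
DEPTH, WIDTH = 64, 14              # one ROM instance, from the post
INSTANCES = 18 + 22                # 1st stage + 2nd stage

bits_per_rom = DEPTH * WIDTH                  # 896 bits
total_bits = INSTANCES * bits_per_rom         # 35840 bits before optimisation
reported_after_synthesis = 25600              # figure quoted in the post
saving = 1 - reported_after_synthesis / total_bits


def dedup(rom):
    """Split a ROM into its unique values plus an address-to-index decode table."""
    unique = sorted(set(rom))
    index_of = {value: i for i, value in enumerate(unique)}
    return unique, [index_of[value] for value in rom]


toy_rom = [0, 5, 5, 10, 7, 12, 12, 17]        # hypothetical contents with repeats
unique, decode = dedup(toy_rom)

print(total_bits, f"{saving:.1%}", len(unique), "unique of", len(toy_rom))
```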

> You have 20 different addresses for the 20 replications, correct?

Yes, every ROM access has a different address, and I need to
access all ROMs in the same clock cycle for performance.

> Which FPGA family are you using?

I want the design to be generic, though I ordered a Virtex-II Pro board from
Digilent, so that will be my target.

Michal K

#4  04-10-2006, 10:22 PM
John_H
Re: ROM resource sharing

If you have 40 different 6-bit addresses for 40 different 64x14 ROMs, I
don't see how you can do better than 40 instances * 4 LUTs * 14 bits = 2240
LUTs (or 280 CLBs in your current architecture). Implementing each ROM with
fewer than 4 LUTs per bit would be possible for some 6-input, 1-output
functions.
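
The LUT estimate can be reproduced with a small Python sketch. It assumes 4-input LUTs holding 16 bits each and 8 LUTs per CLB (4 slices of 2 LUTs, as in Virtex-II Pro); the function name is hypothetical, and the count ignores any simplification the synthesis tool might find.

```python
import math

LUT_BITS = 16        # a 4-input LUT stores 16 x 1 bit
LUTS_PER_CLB = 8     # assumed: Virtex-II Pro CLB = 4 slices x 2 LUTs


def distributed_rom_luts(instances, depth, width):
    """Worst-case LUT count for `instances` independent depth x width ROMs."""
    luts_per_output_bit = math.ceil(depth / LUT_BITS)
    return instances * luts_per_output_bit * width


luts = distributed_rom_luts(instances=40, depth=64, width=14)
clbs = luts // LUTS_PER_CLB
print(luts, "LUTs,", clbs, "CLBs")
```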

Each ALM in the Stratix II series (roughly equivalent to a Xilinx slice, but
with twice the LUT capacity) can provide a 64x1 ROM.

You could use a BlockRAM to provide 2 ports of 14 bits each (up to 36 bits
per port are available), with each port displacing 56 LUTs. The 4.5 kbit
Altera M4K blocks would be more "efficient" since only 64 entries are needed
in your application, and there are typically many more M4K blocks than
BlockRAMs in equivalent Altera vs. Xilinx devices.
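
As a rough illustration of the block-memory option, the Python sketch below counts dual-port blocks for 40 simultaneous lookups and shows how much of each block the 64x14 table would occupy. The block capacities (18432 bits for a Virtex-II Pro BlockRAM, 4608 bits for an M4K) are assumptions taken from general knowledge of those families rather than from the thread.

```python
import math

ROM_BITS = 64 * 14           # contents of one table
LOOKUPS = 40                 # simultaneous reads needed
PORTS_PER_BLOCK = 2          # dual-port: two independent reads per block

blocks_needed = math.ceil(LOOKUPS / PORTS_PER_BLOCK)
luts_displaced_per_port = 4 * 14     # 4 LUTs per output bit, 14 bits

# Assumed block capacities in bits (parity bits included).
BLOCK_BITS = {"Virtex-II Pro BlockRAM": 18 * 1024, "Altera M4K": 4608}
utilisation = {name: ROM_BITS / bits for name, bits in BLOCK_BITS.items()}

print(blocks_needed, "blocks,", luts_displaced_per_port, "LUTs displaced per port")
for name, used in utilisation.items():
    print(f"{name}: {used:.1%} of the block holds useful data")
```

Both ports read the same 896-bit table, so the smaller M4K wastes less capacity per lookup, which is the sense in which it is more "efficient" here.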

It's quite possible you could time-multiplex your 14-bit lookups at 2x, 3x,
or even 4x your main design clock, since a ROM lookup implemented in
distributed CLB SelectRAM is one LUT plus MUXF5 plus MUXF6, which is under
about 2 levels of logic in a pipelined implementation.
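
A quick Python sketch of what time-multiplexing would buy, assuming the lookups can be spread evenly over the fast clock cycles. The function name is hypothetical, and the count ignores the address multiplexers and holding registers that such a scheme would add.

```python
import math

LOOKUPS = 40     # unique 14-bit values needed per main clock cycle


def physical_roms(lookups, clock_ratio, ports=1):
    """ROM copies needed when each copy serves `ports` reads per fast cycle."""
    return math.ceil(lookups / (clock_ratio * ports))


for ratio in (1, 2, 3, 4):
    single = physical_roms(LOOKUPS, ratio)
    dual = physical_roms(LOOKUPS, ratio, ports=2)
    print(f"{ratio}x clock: {single} single-port ROMs or {dual} dual-port blocks")
```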

The bottom line is that you have to pull out 40 unique 14-bit values. If
there is no convenient way to reduce the uniqueness, the replication has to
be there.

What does help is that each LUT or LE can give you 16 bits of ROM. Each ALM
can give you 64 bits of ROM.
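
To make the "16 bits per LUT" point concrete, here is a small Python model of one 64x1 ROM column built from four 16-bit LUTs, with the upper two address bits steering MUXF5 and MUXF6 as described earlier in this post. The function names and the test pattern are hypothetical; this is a behavioural sketch, not vendor primitive code.

```python
def split_into_lut_inits(column):
    """Split 64 one-bit entries into four 16-bit LUT contents."""
    assert len(column) == 64
    inits = []
    for quarter in range(4):
        word = 0
        for i in range(16):
            word |= (column[16 * quarter + i] & 1) << i
        inits.append(word)
    return inits


def lut4(init, addr4):
    """A 4-input LUT: the address selects one of 16 stored bits."""
    return (init >> (addr4 & 0xF)) & 1


def rom64x1(inits, addr):
    """64x1 ROM = four LUT4s, two MUXF5 stages (addr bit 4), one MUXF6 (addr bit 5)."""
    low = addr & 0xF
    sel5 = (addr >> 4) & 1
    sel6 = (addr >> 5) & 1
    f5_low = lut4(inits[1], low) if sel5 else lut4(inits[0], low)
    f5_high = lut4(inits[3], low) if sel5 else lut4(inits[2], low)
    return f5_high if sel6 else f5_low


# Hypothetical bit pattern standing in for one output bit of the 64x14 ROM.
column = [((a * 37 + 11) >> 3) & 1 for a in range(64)]
inits = split_into_lut_inits(column)
print(all(rom64x1(inits, a) == column[a] for a in range(64)))
```

Fourteen such columns make one 64x14 ROM, which is where the 4 LUTs * 14 bits per instance comes from.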

