FPGA Central - World's 1st FPGA / CPLD Portal

FPGA comp.arch.fpga newsgroup (usenet)

#1
04-24-2006, 08:55 PM
Guest
Max and Argmax across 1,000 unsigned 10-bit numbers

Hello,

I recently requested advice as to performing bit-matrix multiplication
on bit matrices (bitwise AND followed by a population count), one
matrix stored onboard an fpga in block rams, the other one (first
operand) streaming in row by row.

It was thought that one of the latest PCIe cards would be able to
provide dot-product throughput limited by the PCIe input speed of 16
Gbps for PCIe x8. The adders would be 80 bits wide (80 bits arrive per
cycle at 200 MHz over PCIe x8) and each column of the onboard matrix
would be stored in 80 block RAMs.
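
A minimal software sketch of the operation described above, assuming the input row arrives in 80-bit words and each word is ANDed with the matching slice of a stored column, with the population counts accumulated across cycles. The names `WORD_BITS`, `words`, `streamed_dot` and `cycles_per_row` are hypothetical, and Python integers stand in for the bit vectors; this models the arithmetic only, not the block RAM layout.

```python
WORD_BITS = 80  # bits arriving per cycle, taken from the post (assumption)


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def words(bits: int, length: int, word_bits: int = WORD_BITS):
    """Yield successive word_bits-wide slices of a length-bit vector, LSB first."""
    mask = (1 << word_bits) - 1
    for lo in range(0, length, word_bits):
        yield (bits >> lo) & mask


def streamed_dot(row_bits: int, col_bits: int, length: int) -> int:
    """Bit-vector dot product: AND each word pair, popcount, accumulate."""
    acc = 0
    for r, c in zip(words(row_bits, length), words(col_bits, length)):
        acc += popcount(r & c)  # one accumulate step per arriving word
    return acc


def cycles_per_row(length: int, word_bits: int = WORD_BITS) -> int:
    """Words needed to deliver one row (ceiling division)."""
    return -(-length // word_bits)


print(streamed_dot(0b1011, 0b1111, 4))  # 3
print(cycles_per_row(1000))             # 13
```

With 80 bits per cycle, a 1000-bit row needs 13 words, which is in line with the "about 12 cycles" figure in the question.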

My question is: How much more difficult is the problem if I must find
the maximum dot product, and which column produced it, for each
input vector? This operation must be performed for each input row,
yielding 1 max and argmax for every 1000 input bits. The input vector
is 1000 bits long, of course, and finishes arriving over PCIe after
about 12 PCIe cycles. Is there a fast enough way to take the argmax of
1000 10-bit numbers (each representing one column's dot product)? What
would the cost of the argmax operation be in FPGA space compared to
the column adders (which are probably 80 bits wide for each column)?
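
As a reference for what the hardware would have to produce, here is a small software model of the requested result, assuming a 1000-bit row and 1000 stored columns held as Python integers. The function name `reference_max_argmax` is hypothetical, and returning the first column on ties is an assumption; the post does not say how ties should be resolved.

```python
import random

N = 1000  # vector length and number of columns, from the post


def reference_max_argmax(row_bits, columns):
    """Return (max dot product, index of the column that produced it)."""
    products = [bin(row_bits & col).count("1") for col in columns]
    # max() returns the first maximal element, so ties go to the lowest index.
    best = max(range(len(products)), key=products.__getitem__)
    return products[best], best


rng = random.Random(0)  # fixed seed so the run is repeatable
columns = [rng.getrandbits(N) for _ in range(N)]
row = rng.getrandbits(N)

value, index = reference_max_argmax(row, columns)
# A dot product of two 1000-bit vectors lies in 0..1000, so 10 bits suffice.
assert 0 <= value <= N < 2 ** 10
print(value, index)
```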

Thanks for your help. I want to make sure that the max and argmax
functions will not be a limiting factor in the design of the bit-matrix
multiplier.

Also, thanks for so many helpful comments that have gotten me to this
level of understanding of the problem.

- AndrewF

#2
04-25-2006, 12:06 PM
Aurelian Lazarut
Re: Max and Argmax across 1,000 unsigned 10-bit numbers

[email protected] wrote:
> Hello,
>
> I recently requested advice as to performing bit-matrix multiplication
> on bit matrices (bitwise AND followed by a population count), one
> matrix stored onboard an fpga in block rams, the other one (first
> operand) streaming in row by row.
>
> It was thought that one of the latest PCIe cards would be able to
> provide dot-product throughput limited by the PCIe input speed of 16
> Gbps for PCIe x8. The adders would be 80 bits wide (80 bits arrive per
> cycle at 200 MHz over PCIe x8) and each column of the onboard matrix
> would be stored in 80 block RAMs.

I'm not sure where you got the PCI Express numbers. I was under
the impression that a PCIe lane runs at 2.5 Gbit/s, so if you divide
by 10 (8b/10b encoding) you'll have 250 MHz for a byte word
(and x8 lanes will be 20 Gbit/s, not 16 Gbit/s).
Aurash
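
The two figures in this exchange can be checked with a little arithmetic, assuming first-generation PCIe signalling (2.5 Gbit/s raw per lane, 8b/10b encoding) and ignoring packet and protocol overhead. Under those assumptions, 20 Gbit/s is the raw line rate of an x8 link and 16 Gbit/s is what is left for data bytes after 8b/10b. The constant names below are hypothetical.

```python
LANE_RATE_MBPS = 2500   # raw line rate per lane in Mbit/s (PCIe 1.x assumption)
LANES = 8
SYMBOL_BITS = 10        # 8b/10b: every data byte is sent as a 10-bit symbol
ROW_BITS = 1000         # input vector length, from the thread

raw_mbps = LANE_RATE_MBPS * LANES                 # raw line rate of the link
payload_mbps = raw_mbps * 8 // SYMBOL_BITS        # data bits after 8b/10b
byte_clock_mhz = LANE_RATE_MBPS // SYMBOL_BITS    # one byte per lane per tick
bits_per_tick = 8 * LANES                         # data bits per byte-clock tick
ticks_per_row = -(-ROW_BITS // bits_per_tick)     # ceiling division

print(raw_mbps)        # 20000
print(payload_mbps)    # 16000
print(byte_clock_mhz)  # 250
print(bits_per_tick)   # 64
print(ticks_per_row)   # 16
```

So a 64-bit datapath at 250 MHz and an 80-bit datapath at 200 MHz both correspond to the same 16 Gbit/s of data; real throughput would be lower once packet headers and flow control are counted.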

#3
04-25-2006, 03:21 PM
Guest
Re: Max and Argmax across 1,000 unsigned 10-bit numbers

OK, 20 Gbit/s, thanks. Any input on whether max and argmax across
1,000 10-bit numbers will take up much space on the fpga in comparison
to the space required for 1,000 100-bit adders? (I guess with 20
Gbit/sec they would be 100-bit and not 80-bit)

Thanks,
AndrewF

#4
04-25-2006, 04:08 PM
Kolja Sulimma
Re: Max and Argmax across 1,000 unsigned 10-bit numbers

[email protected] wrote:
> OK, 20 Gbit/s, thanks. Any input on whether max and argmax across
> 1,000 10-bit numbers will take up much space on the fpga in comparison
> to the space required for 1,000 100-bit adders? (I guess with 20
> Gbit/sec they would be 100-bit and not 80-bit)

As you produce less than one 10-bit number per clock cycle on average, a
single 10-bit maximum gate is sufficient. That's 10 LUTs.
However, it depends on your algorithm whether you can skew the data
processing in a way that the intermediate results are produced one
after another. If you get them all at once, you also need 10,000 bits of storage.
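
The single-comparator case described above can be modelled in software as follows, assuming the dot products really do arrive one per clock cycle. The function name `streaming_max_argmax` and the choice to keep the earliest index on ties are assumptions made for the sketch.

```python
def streaming_max_argmax(values):
    """Running max with one comparator and two registers (value, index)."""
    best_val, best_idx = 0, 0            # registers, cleared at the start of a row
    for idx, val in enumerate(values):   # one value per clock cycle (assumption)
        if val > best_val:               # the single 10-bit comparison
            best_val, best_idx = val, idx
    return best_val, best_idx


# If all 1000 results appear at once, they must be buffered before the scan:
BUFFER_BITS = 1000 * 10

print(streaming_max_argmax([3, 9, 9, 2]))  # (9, 1)
print(BUFFER_BITS)                         # 10000
```

Note that the index register and the logic that updates it are extra cost on top of the comparator itself.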

Kolja Sulimma

#5
04-25-2006, 06:46 PM
Guest
Re: Max and Argmax across 1,000 unsigned 10-bit numbers

Ahh, I see. Thanks very much. So it takes 10 LUTs to take the
maximum of two 10-bit numbers. If pipelined, can it do this at a
throughput of 1 per clock cycle? So ideally, performing 1,000 max
operations / 10 cycles = 100 max operations per cycle = 1,000 LUTs.

Thanks,
AndrewF
