FPGA Central - World's 1st FPGA / CPLD Portal

FPGA Central

World's 1st FPGA Portal

 

Go Back   FPGA Groups > NewsGroup > FPGA

FPGA comp.arch.fpga newsgroup (usenet)

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 06-20-2007, 07:41 AM
[email protected]
Guest
 
Posts: n/a
Default Interesting problems about high performance computing

There is a software application which demonds huge computation. My aim
is to port the software program into hardware. So first of all, I need
to evaluate the required volumn of computation.

The situation is that the software program will need up to 22 hours on
personal computer (P4 3.06G, 2GB RAM) to finish the computation, while
the operations mostly are float-point multiplications and additions. I
need to finish the computation by hardware in serveral minites. Is it
possible?

Another problem is that which kind of hardware platforms is
perferable. I have checked that the production provided by
Nallatech(www.nallatech.com) seems suitable for me. I am still
wondering whether I should choose some boards with multiple DSP
processors on it? I think DSP processors are better at floating point
operations.

Reply With Quote
  #2 (permalink)  
Old 06-20-2007, 08:30 AM
Totally_Lost
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On Jun 19, 11:41 pm, "h...@hit.edu.cn" <h...@hit.edu.cn> wrote:
> There is a software application which demonds huge computation. My aim
> is to port the software program into hardware. So first of all, I need
> to evaluate the required volumn of computation.


put some counters in the software version to total the required number
of operations on each variable in the calculation, and you can get
that answer fairly accurately.

Next take a good look at the algorithm, for concurrency and necessary
serial operations.

Any place you have a different data value used in each iteration a
loop, you have the chance to do calculate two or more values per loop
iterations (unrolling the loop) to gain parallelism. If however, there
are serializations which offer no restructuring to parallelize, then
the application might well be done best with a fast cpu/cache rather
than a slower FGPA. Look at Amdahl's law regarding possible speedup
between parallel and serial sections of a program.

In some cases, even a serial operation that is lengthy, may see some
gains by pipelining, as another form of parallelization. So be
constructive in looking for parallelism in the algorithm. In some
other cases, using a completely different algorithm which completes
the task equally, may be the only means for gaining the required
parallelism. This sometimes requires some thought, like replacing
recursion with linear code, to "unroll" the implied loops. or
reordering the sequence over operations ... such as loop combining
where one or more loops process the same data points.

> The situation is that the software program will need up to 22 hours on
> personal computer (P4 3.06G, 2GB RAM) to finish the computation, while
> the operations mostly are float-point multiplications and additions. I
> need to finish the computation by hardware in serveral minites. Is it
> possible?


In general, most FPGA implementations see speed improvements between
2X to 200X, sometimes worse, rarely much better, unless the main
software loop is very small, and the algorithm has exceptional
parallelism. The reason is that FPGA cycle times are relatively slow
compared to GHz multi-issue pipelined CPUs. It doesn't really mater
much if the data is fixed or floating point, what matters is the
degree of parallelism you can introduce and exploit in the FPGA. You
are looking for speed up's on the order of 22*60/3 = 440X which
rather implies the calculations of the algorithm must have a very
small kernel, and extremely easy to parallelize with little or no
serial component (such as memory accesses) to meet your goals in an
FPGA ... certainly possible with some careful work, but not
necessarily for all algorithms or problems.


> Another problem is that which kind of hardware platforms is
> perferable. I have checked that the production provided by
> Nallatech(www.nallatech.com) seems suitable for me. I am still
> wondering whether I should choose some boards with multiple DSP
> processors on it? I think DSP processors are better at floating point
> operations.


Beside looking at FPGAs, you might also consider GPUs, Cell
processors, and a different CPU architecture which has better multi-
issue floating point, cache, or memory performance ... including
tuning the application for exactly that CPU/Memory configuration.

Have fun!
John

Reply With Quote
  #3 (permalink)  
Old 06-20-2007, 11:03 AM
Christian Kirschenlohr
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

Hello
> Another problem is that which kind of hardware platforms is
> perferable. I have checked that the production provided by
> Nallatech(www.nallatech.com) seems suitable for me. I am still
> wondering whether I should choose some boards with multiple DSP
> processors on it? I think DSP processors are better at floating point
> operations.
>

perhaps, this one could suit your needs:
http://www.mathstar.com/products.html

Regards

Christian
Reply With Quote
  #4 (permalink)  
Old 06-20-2007, 03:45 PM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

Most engineers still don't realize the processing power of the FPGA. True,
today's processors operate at high clock frequencies. People don't truly
understand how much of that raw speed is lost to processor overhead.
I created a FPGA process for processing real time 1280pixel
32bit camera scans to identifiy the leading edge of incoming documents,
determine
the skew angle and rotate the image in realtime while the image was still
being scanned. This process originally required 8 High speed DSP's with
significant
propagation delay and large amounts of memory.
I have also converted many software processes to the FPGA enviroment.
Each have had dramatic improvements in speed that no processor could
ever come close to matching.

Here is a short list of problems your software program may be
experiencing.

1. Almost 25% to 50% your speed is lost to opcode and memory access.
This number varies depending on CPU cache efficiency.
2. Depending on the software program size and memory access. You
could be forcing excessive cache dumps and reload which can reduce
speed another 10 to 50%.
3. You must then content with program efficiency. Is the program
optimized
for speed.
4. Last, the efficiency of your software compiler. Is it using extensive
use of libraries or inline code.

When a program is converted to hardware you eliminate items 1, 2, and 4.
You
are left with program efficiency, how well it is translated to hardware.

The first thing to do is reorder the statements in the program. Section the
program
into stages and identify the loop/repitition structure. Each stage has a
dependency on the previous computation. This
will usually lead to each stage having multiple computations. These can be
executed in parallel in the FPGA and many equations can be executed in just
one
FPGA clock cycle. After reordering the statements in the
proper order for hardware conversion, recompile the program and run to
insure in still functions correctly. This is now your basic template for
conversion.

If you use large amounts of data in the program beyond the capacity of the
FPGA,
you will need to create a multi channel DMA controller w/cache.
This controller will provide access to each stage needing external memory.

Second, when using decimal calculation, determine the maximum decimal error.
Floating point offers many advantages but slows down computation, consume
large number of resources and add to the overall complexity of the hardware.
You
should use fix point computation if possible.

Fix point decimal accuracy (decimal portion not including integer size)

1 byte = 2 decimal places
2 byte = 4 decimal places
3 byte = 7 decimal places
4 byte = 9 decimal places
5 byte = 12 decimal places
6 byte = 14 decimal places

Xilinx devices have built in multipliers (18x18) which are fast and save
hardware.
Division is possible but uses more resources.

A spartan3 can do the work. The Virtex 2, and 4 are faster and offer more
memory and multipliers with the addition of CPU's to help with other tasks.
A DSP does provide avantages over a standard processor
but does not compare to the raw power of the FPGA.

[email protected]
Reply With Quote
  #5 (permalink)  
Old 06-20-2007, 06:12 PM
John Adair
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

The type of thing you are doing we have a number of clients doing
similar things with our development hardware in quite a few
application areas.

>From DSP performance it is hard for a DSP processor to beat a well

designed and implemented FPGA design. Generally you should see a
massive increase in performance in FPGA DSP engine compared to the
software running on a generic X86 processor.

The second part of your problem is getting data to and from the DSP
Processor or FPGA. We see a lot of these applications because a lot of
our boards are PCI or PCI-E based and offer a high bandwidth route to
the PC but you may want to consider USB2 as an alternative. It is not
as performant as PCI usually. Often the USB controller itself
effectively sits on a PCI bus so is limited by that if not it's own
protocol handing. If you do have a high data bandwidth requirement do
be careful of what PCI and PCI-E can deliver in reality. 20-50% of the
bus/link speed is a reason first level guide of what can be achieved
and even then there are lots of factors.

John Adair
Enterpoint Ltd.
www.enterpoint.co.uk



On 20 Jun, 06:41, "h...@hit.edu.cn" <h...@hit.edu.cn> wrote:
> There is a software application which demonds huge computation. My aim
> is to port the software program into hardware. So first of all, I need
> to evaluate the required volumn of computation.
>
> The situation is that the software program will need up to 22 hours on
> personal computer (P4 3.06G, 2GB RAM) to finish the computation, while
> the operations mostly are float-point multiplications and additions. I
> need to finish the computation by hardware in serveral minites. Is it
> possible?
>
> Another problem is that which kind of hardware platforms is
> perferable. I have checked that the production provided by
> Nallatech(www.nallatech.com) seems suitable for me. I am still
> wondering whether I should choose some boards with multiple DSP
> processors on it? I think DSP processors are better at floating point
> operations.



Reply With Quote
  #6 (permalink)  
Old 06-20-2007, 10:00 PM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

[email protected] wrote:

> There is a software application which demonds huge computation. My aim
> is to port the software program into hardware. So first of all, I need
> to evaluate the required volumn of computation.


> The situation is that the software program will need up to 22 hours on
> personal computer (P4 3.06G, 2GB RAM) to finish the computation, while
> the operations mostly are float-point multiplications and additions. I
> need to finish the computation by hardware in serveral minites. Is it
> possible?


If you give some hints as to what type of computation you are
doing you will get better answers. Floating point, especially addition,
is relatively expensive in an FPGA. You might consider the possibility
of using fixed point, even if it is significantly wider. Sometimes
block floating point works well, where one exponent applies to many
(like a whole vector of) values. (The pre/post normalization needed
for floating point is expensive.)

The actual implementation of the algorithm is usually very different
in an FPGA. I recommend the systolic array architecture, but there
are other possibilities.

> Another problem is that which kind of hardware platforms is
> perferable. I have checked that the production provided by
> Nallatech(www.nallatech.com) seems suitable for me. I am still
> wondering whether I should choose some boards with multiple DSP
> processors on it? I think DSP processors are better at floating point
> operations.


There are some systems convenient for implementing such designs.
I have seen PCMCIA cards with an FPGA, though that might not be
big enough for you.

How many times do you need to run this problem, and how big
is your budget?

You might also look at: http://www.drccomputer.com/

-- glen


Reply With Quote
  #7 (permalink)  
Old 06-21-2007, 10:54 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

Hello John,

Thank you for you quick reply!

To be specific, my program is related to 3D image reconstruction. The
input data is float point numbers in the form of 3D array. I represent
it using symbol: g(x1,y1,z1), while the number of x1,y1,z1 are quite
large. The output data is another 3D float point array. I represent it
using symbol: F(x2,y2,z2), similarily the number of x2,y2,z2 are also
quite large.

The estimated memory usage is 2GB or so, while the calculations
required is listed below:

integer addition 2442 Giga operations per second
float add 814 Giga operations per second
float substrac 2424 Giga operations per second
float muliply 1610 Giga operations per second
float divide 809 Giga operations per second

I want these calculations done in one minite, so we can divide the
operations by 60.
As a matter of fact, if the calculations could be done in 5 minites,
it is OK for me. So for minimum requirment, divide the above numbers
by 300(=60*5).

And your suggestion?

Reply With Quote
  #8 (permalink)  
Old 06-21-2007, 10:59 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

Hello glen:

Please refer to my reply to John. There is no problem about the
budget. And I prefer to get the whole calculations done in 10 minites
as I have said in my reply to John.

Reply With Quote
  #9 (permalink)  
Old 06-21-2007, 11:58 PM
Marc Battyani
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing


<[email protected]> wrote
>
> To be specific, my program is related to 3D image reconstruction. The
> input data is float point numbers in the form of 3D array. I represent
> it using symbol: g(x1,y1,z1), while the number of x1,y1,z1 are quite
> large. The output data is another 3D float point array. I represent it
> using symbol: F(x2,y2,z2), similarily the number of x2,y2,z2 are also
> quite large.
>
> The estimated memory usage is 2GB or so, while the calculations
> required is listed below:
>
> integer addition 2442 Giga operations per second
> float add 814 Giga operations per second
> float substrac 2424 Giga operations per second
> float muliply 1610 Giga operations per second
> float divide 809 Giga operations per second
>
> I want these calculations done in one minite, so we can divide the
> operations by 60.
> As a matter of fact, if the calculations could be done in 5 minites,
> it is OK for me. So for minimum requirment, divide the above numbers
> by 300(=60*5).


Is it single or double precision floats?

Divided by 300 the numbers are not very high:
10.7G fadd/fsub
5.3G fmul
2.7G fdiv

IMO your limiting factor will probably be the memory bandwidth. If you have
enough memory bandwidth then it's not a problem for a large FPGA. You can
even go faster than that.

Marc


Reply With Quote
  #10 (permalink)  
Old 06-22-2007, 04:40 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

It is double precision. And can you please explain it for me why it
matters?
The another question is that as you mentioned memory bandwidth could
be an issue, can you evaluate the approximately required memory
bandwidth?

Reply With Quote
  #11 (permalink)  
Old 06-22-2007, 04:48 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

It is double precision. And can you please explain it for me why it
matters?
The another question is that as you mentioned memory bandwidth could
be an issue, can you evaluate the approximately required memory
bandwidth?

Reply With Quote
  #12 (permalink)  
Old 06-22-2007, 05:05 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On 6 22 , 5 58 , "Marc Battyani" <Marc.Batty...@fractalconcept.com>
wrote:
> <h...@hit.edu.cn> wrote
>
>
>
>
>
>
>
> > To be specific, my program is related to 3D image reconstruction. The
> > input data is float point numbers in the form of 3D array. I represent
> > it using symbol: g(x1,y1,z1), while the number of x1,y1,z1 are quite
> > large. The output data is another 3D float point array. I represent it
> > using symbol: F(x2,y2,z2), similarily the number of x2,y2,z2 are also
> > quite large.

>
> > The estimated memory usage is 2GB or so, while the calculations
> > required is listed below:

>
> > integer addition 2442 Giga operations per second
> > float add 814 Giga operations per second
> > float substrac 2424 Giga operations per second
> > float muliply 1610 Giga operations per second
> > float divide 809 Giga operations per second

>
> > I want these calculations done in one minite, so we can divide the
> > operations by 60.
> > As a matter of fact, if the calculations could be done in 5 minites,
> > it is OK for me. So for minimum requirment, divide the above numbers
> > by 300(=60*5).

>
> Is it single or double precision floats?
>
> Divided by 300 the numbers are not very high:
> 10.7G fadd/fsub
> 5.3G fmul
> 2.7G fdiv
>
> IMO your limiting factor will probably be the memory bandwidth. If you have
> enough memory bandwidth then it's not a problem for a large FPGA. You can
> even go faster than that.
>
> Marc- -
>
> - -


It is double precision. And can you please explain it for me why it
matters?
The another question is that as you mentioned memory bandwidth could
be an issue, can you evaluate the approximately required memory
bandwidth?

Reply With Quote
  #13 (permalink)  
Old 06-22-2007, 10:14 AM
Colin Paul Gloster
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

In news:[email protected] ups.com
timestamped Thu, 21 Jun 2007 20:05:20 -0700, "[email protected]"
<[email protected]> posted:
"On 6 22 , 5 58 , "Marc Battyani" <Marc.Batty...@fractalconcept.com>
wrote:
> <h...@hit.edu.cn> wrote

[..]
>
> > The estimated memory usage is 2GB or so, while the calculations
> > required is listed below:

>
> > integer addition 2442 Giga operations per second

[ ^^^^^^^^^^
N.B. "per second".]
> > float add 814 Giga operations per second
> > float substrac 2424 Giga operations per second
> > float muliply 1610 Giga operations per second
> > float divide 809 Giga operations per second

>
> > I want these calculations done in one minite, so we can divide the
> > operations by 60.

[ ^^^^^^^^^^
N.B. one minute.]
[ ^^^^^ ^^^^^^
N.B. the person of Harbin Institute of Technology of China typed
"divide" but clearly conveyed and understood the concept of needing to
multiply in order to speed up an implementation to perform five
minutes' worth of calculations of the original implementation's in one
second and simply mixed up the English word "divide" with
"multiply". I am reminded of a language lesson in which I mixed up a
word for "close" with a word for "open" (or maybe it was the other way
round) so the teacher mocked me by trying to indicate that such
miscommunication could result in disaster if I was giving an order to
control a dam instead of a door or window.]
> > As a matter of fact, if the calculations could be done in 5 minites,
> > it is OK for me. So for minimum requirment, divide the above numbers
> > by 300(=60*5).

>
> Is it single or double precision floats?
>
> Divided by 300 the numbers are not very high:
> 10.7G fadd/fsub
> 5.3G fmul
> 2.7G fdiv"


Hello,

In case you missed them, please see my notes above in square brackets
(and to see what the ^ characters are being used to point at, view in
a monospace font such as Courier).

The numbers should be divided by the reciprocal of 300 (or, for
efficient calculations, multiplied by 300) so instead of 10.7Gflops;
5.3Gflops; and 2.7Gflops the results should be approximately...
244200 Gigafloating point operations per second for additions;
727200 Gigafloating point operations per second for subtractions;
483000 Gigafloating point operations per second for multiplications;
and
242700 Gigafloating point operations per second for divisions.


"[..]

It is double precision. And can you please explain it for me why it
matters?"

Slower and more memory (but less inaccurate).

"The another question is that as you mentioned memory bandwidth could
be an issue, can you evaluate the approximately required memory
bandwidth?"

The more bigger values which will be flowing through your gates, the
more bits per second you will want to be able to handle. I am
surprised that you did not notice the inappropriate calculation of
1610 Gflops / 300 = 5.3Gflops so if you plan on spending a lot of
money on your new implementation, please for your own sake,
double-check everything.

Regards,
Colin Paul Gloster
Reply With Quote
  #14 (permalink)  
Old 06-23-2007, 01:04 AM
Totally_Lost
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On Jun 21, 2:54 am, "h...@hit.edu.cn" <h...@hit.edu.cn> wrote:
I tried to email offline to discuss some of these issues more, but
your email bounces

John

This is an automatically generated Delivery Status Notification.

Delivery to the following recipients failed.

[email protected]

Reply With Quote
  #15 (permalink)  
Old 06-23-2007, 02:26 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On Jun 23, 7:04 am, Totally_Lost <air_b...@yahoo.com> wrote:
> On Jun 21, 2:54 am, "h...@hit.edu.cn" <h...@hit.edu.cn> wrote:
> I tried to email offline to discuss some of these issues more, but
> your email bounces
>
> John
>
> This is an automatically generated Delivery Status Notification.
>
> Delivery to the following recipients failed.
>
> h...@hit.edu.cn


I am sorry to hear that. I think perhaps my school mail system is
under maintenance.
Could you please send the mail to [email protected]
This is my private mailbox.
Thanks a lot.

Reply With Quote
  #16 (permalink)  
Old 06-25-2007, 04:41 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On Jun 22, 4:14 pm, Colin Paul Gloster <Colin_Paul_Glos...@ACM.org>
wrote:
> Innews:[email protected] roups.com
> timestamped Thu, 21 Jun 2007 20:05:20 -0700, "h...@hit.edu.cn"
> <h...@hit.edu.cn> posted:
> "On 6 22 , 5 58 , "Marc Battyani" <Marc.Batty...@fractalconcept.com>
> wrote:
> > <h...@hit.edu.cn> wrote

> [..]
> >
> > > The estimated memory usage is 2GB or so, while the calculations
> > > required is listed below:

> >
> > > integer addition 2442 Giga operations per second

> [ ^^^^^^^^^^
> N.B. "per second".]
> > > float add 814 Giga operations per second
> > > float substrac 2424 Giga operations per second
> > > float muliply 1610 Giga operations per second
> > > float divide 809 Giga operations per second

> >
> > > I want these calculations done in one minite, so we can divide the
> > > operations by 60.

> [ ^^^^^^^^^^
> N.B. one minute.]
> [ ^^^^^ ^^^^^^
> N.B. the person of Harbin Institute of Technology of China typed
> "divide" but clearly conveyed and understood the concept of needing to
> multiply in order to speed up an implementation to perform five
> minutes' worth of calculations of the original implementation's in one
> second and simply mixed up the English word "divide" with
> "multiply". I am reminded of a language lesson in which I mixed up a
> word for "close" with a word for "open" (or maybe it was the other way
> round) so the teacher mocked me by trying to indicate that such
> miscommunication could result in disaster if I was giving an order to
> control a dam instead of a door or window.]
> > > As a matter of fact, if the calculations could be done in 5 minites,
> > > it is OK for me. So for minimum requirment, divide the above numbers
> > > by 300(=60*5).

> >
> > Is it single or double precision floats?
> >
> > Divided by 300 the numbers are not very high:
> > 10.7G fadd/fsub
> > 5.3G fmul
> > 2.7G fdiv"

>
> Hello,
>
> In case you missed them, please see my notes above in square brackets
> (and to see what the ^ characters are being used to point at, view in
> a monospace font such as Courier).
>
> The numbers should be divided by the reciprocal of 300 (or, for
> efficient calculations, multiplied by 300) so instead of 10.7Gflops;
> 5.3Gflops; and 2.7Gflops the results should be approximately...
> 244200 Gigafloating point operations per second for additions;
> 727200 Gigafloating point operations per second for subtractions;
> 483000 Gigafloating point operations per second for multiplications;
> and
> 242700 Gigafloating point operations per second for divisions.
>
> "[..]
>
> It is double precision. And can you please explain it for me why it
> matters?"
>
> Slower and more memory (but less inaccurate).
>
> "The another question is that as you mentioned memory bandwidth could
> be an issue, can you evaluate the approximately required memory
> bandwidth?"
>
> The more bigger values which will be flowing through your gates, the
> more bits per second you will want to be able to handle. I am
> surprised that you did not notice the inappropriate calculation of
> 1610 Gflops / 300 = 5.3Gflops so if you plan on spending a lot of
> money on your new implementation, please for your own sake,
> double-check everything.
>
> Regards,
> Colin Paul Gloster


I got what you mean, but the fact is what I wrote may make you
confused.

The total computation is described below:
integer add 2442 Giga operations per second
float add 814 Giga operations per second
float substrac 2424 Giga operations per second
float muliply 1610 Giga operations per second
float divide 809 Giga operations per second

If I need to get the operations done in one second, I need to finish
these number of calculations.
While if I need to get the operations done in one minute, I need to
finish ( these number of calculations / 60 )

So it is to divide the time needed, not to multiply.

Reply With Quote
  #17 (permalink)  
Old 07-09-2007, 11:40 PM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

[email protected] wrote:

> To be specific, my program is related to 3D image reconstruction. The
> input data is float point numbers in the form of 3D array. I represent
> it using symbol: g(x1,y1,z1), while the number of x1,y1,z1 are quite
> large. The output data is another 3D float point array. I represent it
> using symbol: F(x2,y2,z2), similarily the number of x2,y2,z2 are also
> quite large.


> The estimated memory usage is 2GB or so, while the calculations
> required is listed below:


> integer addition 2442 Giga operations per second
> float add 814 Giga operations per second
> float substrac 2424 Giga operations per second
> float muliply 1610 Giga operations per second
> float divide 809 Giga operations per second


> I want these calculations done in one minite, so we can divide the
> operations by 60.
> As a matter of fact, if the calculations could be done in 5 minites,
> it is OK for me. So for minimum requirment, divide the above numbers
> by 300(=60*5).


As well as I know image reconstruction, you should be able
to do it in fixed point. It is commonly done in floating point
for software implementations, but it still might be better to
use fixed point for a hardware implementation. Also,
for multiply and divide it makes a big difference if it is
multiplied or divided by a constant or variable.

You might prescale the floating point data in software before
sending it through the FPGA based hardware. I would probably
consider 24 bits intermediate data for a 16 bit result.

Look for books or papers on systolic array implementations
of matrix operations.

About how much can you spend on hardware? hundreds, thousands,
or millions of dollars?

-- glen

Reply With Quote
  #18 (permalink)  
Old 07-10-2007, 12:14 AM
PFC
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On Mon, 09 Jul 2007 23:40:06 +0200, glen herrmannsfeldt
<[email protected]> wrote:

> [email protected] wrote:
>
>> To be specific, my program is related to 3D image reconstruction. The
>> input data is float point numbers in the form of 3D array. I represent
>> it using symbol: g(x1,y1,z1), while the number of x1,y1,z1 are quite
>> large. The output data is another 3D float point array. I represent it
>> using symbol: F(x2,y2,z2), similarily the number of x2,y2,z2 are also
>> quite large.

>
>> The estimated memory usage is 2GB or so, while the calculations
>> required is listed below:

>
>> integer addition 2442 Giga operations per second
>> float add 814 Giga operations per second
>> float substrac 2424 Giga operations per second
>> float muliply 1610 Giga operations per second
>> float divide 809 Giga operations per second

>


You could hack this onto a big GeForce GPU in a standard PC... check out
GPGPU.
Reply With Quote
  #19 (permalink)  
Old 07-15-2007, 10:05 AM
[email protected]
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

On Jul 10, 5:40 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> hi...@hit.edu.cn wrote:
> > To be specific, my program is related to 3D image reconstruction. The
> > input data is float point numbers in the form of 3D array. I represent
> > it using symbol: g(x1,y1,z1), while the number of x1,y1,z1 are quite
> > large. The output data is another 3D float point array. I represent it
> > using symbol: F(x2,y2,z2), similarily the number of x2,y2,z2 are also
> > quite large.
> > The estimated memory usage is 2GB or so, while the calculations
> > required is listed below:
> > integer addition 2442 Giga operations per second
> > float add 814 Giga operations per second
> > float substrac 2424 Giga operations per second
> > float muliply 1610 Giga operations per second
> > float divide 809 Giga operations per second
> > I want these calculations done in one minite, so we can divide the
> > operations by 60.
> > As a matter of fact, if the calculations could be done in 5 minites,
> > it is OK for me. So for minimum requirment, divide the above numbers
> > by 300(=60*5).

>
> As well as I know image reconstruction, you should be able
> to do it in fixed point. It is commonly done in floating point
> for software implementations, but it still might be better to
> use fixed point for a hardware implementation. Also,
> for multiply and divide it makes a big difference if it is
> multiplied or divided by a constant or variable.
>
> You might prescale the floating point data in software before
> sending it through the FPGA based hardware. I would probably
> consider 24 bits intermediate data for a 16 bit result.
>
> Look for books or papers on systolic array implementations
> of matrix operations.
>
> About how much can you spend on hardware? hundreds, thousands,
> or millions of dollars?
>
> -- glen- Hide quoted text -
>
> - Show quoted text -


Can you recommend some books about this field?
The budget should be tens of thousands of dollars.

Reply With Quote
  #20 (permalink)  
Old 08-02-2007, 05:13 AM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: Interesting problems about high performance computing

[email protected] wrote:

(I wrote)

>>You might prescale the floating point data in software before
>>sending it through the FPGA based hardware. I would probably
>>consider 24 bits intermediate data for a 16 bit result.


>>Look for books or papers on systolic array implementations
>>of matrix operations.


>>About how much can you spend on hardware? hundreds, thousands,
>>or millions of dollars?


> Can you recommend some books about this field?
> The budget should be tens of thousands of dollars.


http://www.google.com/search?hl=en&q...reconstruction

Should get you started.

Tens of thousands of $ might mean that you should use an existing
board with FPGA(s) on it. One large one might be about right.

-- glen

Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
International Journal of High Performance Systems Architecture (IJHPSA) [email protected] FPGA 0 07-14-2006 07:45 PM
High Performance Register File mr_camel Verilog 3 04-29-2004 03:01 AM
Interesting article about FPGAs jakab tanko FPGA 18 10-07-2003 01:57 PM
High-performance workstation DK FPGA 4 10-03-2003 12:26 AM


All times are GMT +1. The time now is 04:15 AM.


Powered by vBulletin® Version 3.8.0
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0
Copyright 2008 @ FPGA Central. All rights reserved