FPGA Central - comp.arch.fpga newsgroup (usenet)
#1 | 01-15-2004, 08:38 PM | Brannon King
DMA w/ Xilinx PCIX core: speed results and question

Params:
Xilinx PCIX core for PCI64/PCI-X at 66 MHz
2v4000-4 running the controller core with 40 FIFOs (10 targets, 2 channels, r/w) and a busmaster wrapper
Tyan 2721 motherboard w/ Xeon 2.6 GHz, 4 GB RAM
Win2k Server SP4
No scatter/gather support in the driver
Exact same software and hardware for both reads and writes
Bus commands 1110 and 1111

Results:
Max host write speed: 70 MB/s
Max host read speed: 230 MB/s
Development time: six months w/ two engineers for both driver and core wrapper

The timer does not include the memory allocations. Any ideas why the write
speed is so much slower? Would it be the latency parameters in the core? An
OS issue?
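For scale: a 64-bit bus at 66 MHz tops out at 8 bytes x 66 MHz = 528 MB/s, so these figures are roughly 13% and 44% of the theoretical peak. A quick back-of-the-envelope check (the helper below is purely illustrative; only the measured rates come from the post above):

```c
#include <stdio.h>

/* Rough bus-efficiency check for the figures quoted above.
 * Peak for a 64-bit, 66 MHz bus: 8 bytes/cycle * 66e6 cycles/s = 528 MB/s. */
int main(void) {
    const double peak_mb_s  = 8.0 * 66.0; /* 528 MB/s theoretical maximum */
    const double write_mb_s = 70.0;       /* measured max host write speed */
    const double read_mb_s  = 230.0;      /* measured max host read speed  */

    printf("write efficiency: %.0f%%\n", 100.0 * write_mb_s / peak_mb_s); /* ~13% */
    printf("read efficiency:  %.0f%%\n", 100.0 * read_mb_s  / peak_mb_s); /* ~44% */
    return 0;
}
```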


#2 | 01-15-2004, 09:58 PM | Eric Crabill
Re: DMA w/ Xilinx PCIX core: speed results and question


Hi,

> Results:
> Max host write speed: 70MB/s
> Max host read speed: 230MB/s
>
> The timer does not include the memory allocations.
> Any ideas why the write speed is so much slower?
> Would it be the latency parameters in the core? An
> OS issue?


When you say "write speed" do you refer to your device
becoming bus master and doing memory writes to the
system RAM behind the host bridge? Likewise, by the
term "read speed" do you refer to your device becoming
bus master and doing memory reads of the system RAM
behind the host bridge?

I just want to make sure I didn't misinterpret your
question before I try to answer it. Or did I get it
backwards?

Eric
#3 | 01-15-2004, 10:01 PM | Brannon King
Re: DMA w/ Xilinx PCIX core: speed results and question

To clarify one issue: host write refers to a DMA busmaster read (the busmaster
is on my device and is actually reading the data in from the host).

"Brannon King" <[email protected]> wrote in message
news:[email protected]
> Params:
> Xilinx's PCIX core for PCI64/PCIX at 66MHz
> 2v4000-4 running the controller core with 40 Fifos (10 targets, 2

channels,
> r/w) and a busmaster wrapper
> Tyan 2721 MB w/Xeon 2.6GHz w/ 4GB RAM
> Win2k Server sp4
> No scatter/gather support in driver
> Exact same software and hardware for both reads and writes
> Bus commands 1110 and 1111
>
> Results:
> Max host write speed: 70MB/s
> Max host read speed: 230MB/s
> Development time: six months w/ two engineers for both driver and core
> wrapper
>
>
> The timer does not include the memory allocations. Any ideas why the write
> speed is so much slower? Would it be the latency parameters in the core?

An
> OS issue?
>
>



#4 | 01-15-2004, 10:11 PM | Mark Schellhorn
Re: DMA w/ Xilinx PCIX core: speed results and question

Is the bus operating in PCI or PCI-X mode? If it's in PCI mode then you are
seeing the disadvantage of not being able to post read requests. Your device is
being told to retry while the chipset fetches the read data.

If it's in PCI-X mode then you should make sure that your DMA engine is issuing
as many posted read requests as possible, of as large a size as possible.

Mark



#5 | 01-15-2004, 10:56 PM | Eric Crabill
Re: DMA w/ Xilinx PCIX core: speed results and question


Hello,

Brannon King wrote:

> "Host write" refers to busmaster read.
> Max host write speed: 70MB/s
> Max host read speed: 230MB/s


I think Mark described it well in his post. If
this is PCI mode, it isn't entirely surprising.
If this is in PCI-X mode, and you are using split
transactions (supporting multiple outstanding is
best) then you may need to do some hunting.

The best tool for this is a bus analyzer, if you
have one (or maybe you can borrow one from a vendor
to "evaluate" it?). There could be all manner of
secondary issues that cause problems:

* bus traffic from other agents
* you are behind a bridge
* your byte counts are small

Sorry I don't have a more specific answer for you.
Eric
#6 | 01-16-2004, 12:29 AM | Brannon King
Re: DMA w/ Xilinx PCIX core: speed results and question

For those speed tests the device was in PCI mode. I was assuming it would be
the same speed as PCI-X (at the same bus speed) because the timing diagrams
all looked compatible between the two. Please explain what you mean by "post
read requests". Is there some workaround to make PCI mode handle this better?


"Mark Schellhorn" <[email protected]> wrote in message
news:[email protected]
> Is the bus operating in PCI or PCIX mode? If it's in PCI mode then you are
> seeing the disadvantage of not being able to post read requests. Your

device is
> getting told to retry while the chipset fetches the read data.
>
> If it's in PCIX mode then you should make sure that your DMA engine is

issuing
> as many posted read requests as possible of as large a size as possible.
>
> Mark
>
>
> Brannon King wrote:
> > To clarify one issue, host write refers to DMA busmaster read (the

busmaster
> > is on my device and is actually reading the data in from the host.)
> >
> > "Brannon King" <[email protected]> wrote in message
> > news:[email protected]
> >
> >>Params:
> >>Xilinx's PCIX core for PCI64/PCIX at 66MHz
> >>2v4000-4 running the controller core with 40 Fifos (10 targets, 2

> >
> > channels,
> >
> >>r/w) and a busmaster wrapper
> >>Tyan 2721 MB w/Xeon 2.6GHz w/ 4GB RAM
> >>Win2k Server sp4
> >>No scatter/gather support in driver
> >>Exact same software and hardware for both reads and writes
> >>Bus commands 1110 and 1111
> >>
> >>Results:
> >>Max host write speed: 70MB/s
> >>Max host read speed: 230MB/s
> >>Development time: six months w/ two engineers for both driver and core
> >>wrapper
> >>
> >>
> >>The timer does not include the memory allocations. Any ideas why the

write
> >>speed is so much slower? Would it be the latency parameters in the core?

> >
> > An
> >
> >>OS issue?
> >>
> >>

> >
> >
> >

>



#7 | 01-16-2004, 01:46 AM | Andy Peters
Re: DMA w/ Xilinx PCIX core: speed results and question

"Brannon King" <[email protected]> wrote in message news:<[email protected]>...
> Params:
> Xilinx's PCIX core for PCI64/PCIX at 66MHz
> 2v4000-4 running the controller core with 40 Fifos (10 targets, 2 channels,
> r/w) and a busmaster wrapper
> Tyan 2721 MB w/Xeon 2.6GHz w/ 4GB RAM
> Win2k Server sp4
> No scatter/gather support in driver
> Exact same software and hardware for both reads and writes
> Bus commands 1110 and 1111
>
> Results:
> Max host write speed: 70MB/s
> Max host read speed: 230MB/s
> Development time: six months w/ two engineers for both driver and core
> wrapper
>
>
> The timer does not include the memory allocations. Any ideas why the write
> speed is so much slower? Would it be the latency parameters in the core? An
> OS issue?


Have you used a PCI bus analyzer to see the bus traffic?

Is the write data sourced from cache, or is it being fetched from main memory?

--a
#8 | 01-16-2004, 03:58 PM | Mark Schellhorn
Re: DMA w/ Xilinx PCIX core: speed results and question

Actually I shouldn't have called them "posted reads". Posting a transaction
means that the initiator never gets an explicit acknowledgement that the
transaction reached its destination (like posting a letter in the mail). PCI
writes are posted. A PCI read by definition is non-posted because the initiator
must receive an acknowledgement (the read data).

What I should have said was that the PCI-X protocol allows the initiator to
pipeline reads. If you have a copy, the PCI-X spec explains it pretty well.
Here's the short version:

In PCI-X, the target of a transaction can terminate the transaction with a split
response, which tells the initiator that the target will get back to him later
with a completion transaction (data if it's a read). The request is tagged with
a 5-bit number that will come back with the completion so that the initiator can
match completions to outstanding requests. The initiator is allowed to have up
to 32 split requests outstanding in the pipeline at any one time. Each read
request can be for up to 4 KB of data. The throughput of a system that takes full
advantage of split transactions is highest when the amount of data being
transferred is large and the latency is small enough that 32 tags can keep the
pipeline full.
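To make the bookkeeping concrete, here is a minimal C sketch of what the initiator side has to track: a pool of 32 tags, each owning at most one outstanding split read. The structure and function names are hypothetical, not anything from the Xilinx core or the PCI-X spec:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of initiator-side split-read bookkeeping in PCI-X:
 * up to 32 outstanding requests, each identified by a 5-bit tag that the
 * completer echoes back in its split completion. */
#define NUM_TAGS 32

struct split_request {
    bool     in_flight;   /* tag currently owns an outstanding read */
    uint64_t address;     /* starting address of the read */
    uint32_t byte_count;  /* requested length, up to 4096 bytes */
};

static struct split_request pipeline[NUM_TAGS];

/* Claim a free tag for a new read request; returns -1 if all 32 are busy. */
int issue_split_read(uint64_t address, uint32_t byte_count)
{
    for (int tag = 0; tag < NUM_TAGS; tag++) {
        if (!pipeline[tag].in_flight) {
            pipeline[tag] = (struct split_request){ true, address, byte_count };
            /* ...drive the read request onto the bus with this tag... */
            return tag;
        }
    }
    return -1; /* pipeline full: throughput is now latency-bound */
}

/* Called when a split completion arrives; the echoed tag matches it
 * to the request it answers. */
void complete_split_read(int tag)
{
    pipeline[tag].in_flight = false;
    /* ...hand the returned data to the consuming FIFO... */
}
```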

In PCI, the target of a read transaction must either respond with data
immediately, or repeatedly terminate the read attempts with retry while he goes
off and fetches the data. Once he's fetched it, he will be able to respond
immediately to the initiator on the initiator's next attempt. This is very
inefficient because there is only one transaction in the pipeline at a time. If
the latency is large (the initiator has to retry many times), the throughput is
much lower than when pipelined reads are used.
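The difference shows up in a rough Little's-law estimate: sustained throughput is roughly bytes-in-flight divided by round-trip latency. The latency and transfer size below are assumed, illustrative values, not measurements:

```c
#include <stdio.h>

static double min(double a, double b) { return a < b ? a : b; }

int main(void) {
    const double latency_us     = 10.0;  /* assumed round-trip read latency */
    const double bytes_per_read = 512.0; /* assumed transfer size */
    const double bus_peak_mb_s  = 528.0; /* 64-bit bus @ 66 MHz */

    /* One transaction in flight (PCI-style delayed reads): */
    printf("1 outstanding:  %.0f MB/s\n",
           min(bus_peak_mb_s, bytes_per_read / latency_us));        /* ~51  */

    /* Up to 32 split reads in flight (PCI-X): */
    printf("32 outstanding: %.0f MB/s\n",
           min(bus_peak_mb_s, 32.0 * bytes_per_read / latency_us)); /* 528 */
    return 0;
}
```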

If PCI-X mode is available, use it. Or, there may be chipset settings that you
can use to improve PCI mode performance. The chipset may be able to do
pre-fetching of data in anticipation of you reading it. There may also be burst
length settings that allow you to increase the amount of data transferred in a
single transaction. You need to read the specs for the chipset you are using and
figure out what can be tweaked.

Mark



#9 | 01-16-2004, 04:49 PM | Brannon King
Re: DMA w/ Xilinx PCIX core: speed results and question

As this seems a valuable response, here is Eric's answer:

Hi,

In PCI mode, when you try to "read" the host, most hosts
will immediately issue retry. However, they have gleaned
some valuable information -- the starting address. That is
called a "delayed read request".

Then, the host goes off and prefetches data from that
starting address. How much it prefetches is up to the
person that designed the host device. Probably 64 bytes
or something small like that.

While it is prefetching, if your device retries the read,
you'll keep getting retry termination. Time is passing.
Eventually, when the host is finished prefetching however
much it is going to prefetch, and you return to retry the
transaction (for the millionth attempt), it will this time
NOT retry you but will give you some data (from one DWORD
up to however much it prefetched...) That is called a
"delayed read completion".

If that satisfied your device, the "transaction" is over.
If you actually wanted more data (the host has no idea how
much data you wanted, since there are no attributes in PCI
mode), your device will get disconnected. Then, your device
will start a new "transaction" with a new starting address,
and this horrible process repeats. It is terribly inefficient
(but supposedly better than having the host insert thousands
of wait states, which keeps the bus locked up so everyone
else is not getting a turn...)

This is replaced by something called split transactions in
PCI-X mode, which is more efficient. It is a bit more
complicated to explain, though. If you want me to give that
a stab, write back and I'll give it a shot tomorrow.

Eric
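A toy model of the target-side behavior Eric describes may help. The state machine below is a hypothetical sketch of a host bridge handling one delayed read; the names and the prefetch size are assumptions, not real chipset code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a PCI host bridge handling a delayed read, as described
 * above. All names and the prefetch size are illustrative assumptions. */
enum pci_response { RETRY, DATA };

struct delayed_read {
    bool     pending;    /* a delayed read request has been latched */
    bool     data_ready; /* prefetch from system RAM has finished */
    uint64_t address;    /* starting address gleaned from the first attempt */
};

static struct delayed_read dr;

/* Called for each read attempt by the bus master at `address`. */
enum pci_response host_read_attempt(uint64_t address)
{
    if (!dr.pending) {
        /* First attempt: latch the address, start prefetching
         * (say, 64 bytes), and terminate with retry. */
        dr = (struct delayed_read){ true, false, address };
        return RETRY;
    }
    if (dr.address == address && dr.data_ready) {
        /* Prefetch done: deliver up to the prefetched amount, then
         * disconnect; a longer transfer must start a new transaction. */
        dr.pending = false;
        return DATA;
    }
    return RETRY; /* still prefetching: time is passing */
}

/* Called by the memory controller when the prefetch completes. */
void prefetch_complete(void) { dr.data_ready = true; }
```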


"Eric Crabill" <[email protected]> wrote in message
news:[email protected]
>
> Hi,
>
> > Results:
> > Max host write speed: 70MB/s
> > Max host read speed: 230MB/s
> >
> > The timer does not include the memory allocations.
> > Any ideas why the write speed is so much slower?
> > Would it be the latency parameters in the core? An
> > OS issue?

>
> When you say "write speed" do you refer to your device
> becoming bus master and doing memory writes to the
> system RAM behind the host bridge? Likewise, by the
> term "read speed" do you refer to your device becoming
> bus master and doing memory reads of the system RAM
> behind the host bridge?
>
> I just want to make sure I didn't mis-interpret your
> question before I try to answer it. Or did I get it
> backwards?
>
> Eric


