Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
Hi,
I have no experience with this and would like to ask:
Is it possible that two system blocks, one clocked at half the
frequency of the other and both derived from the same clock source,
do not need asynchronous input/output buffers in a good circuit and
logic design?
For example, in Intel and AMD CPU chips the cache I runs at half the
CPU clock frequency and gets almost 1/2 the data rate, as the documents
show.
What is special about their designs? I know Xilinx chips have divided
clock outputs in addition to the main clock output, but I have never
used that technique.
I need guidance and direction on the subject. A book or paper
reference is preferred.
Thank you.
Weng
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
Hi Weng,
I am not sure about your design platform, but from an FPGA perspective:
If the two clocks are locked (frequency and phase), then you can consider
them synchronised, assuming we trust the generating source. In this case
there is no need to make extra efforts to cross domains. The issue you
should be aware of is that they might not always be in phase as assumed, and
in this case any phase-sensitive logic may occasionally fail. For example, a
pulse generated in the fast domain may fail to be seen by the edge of the slow
clock; this commonly leads to power-up problems. If the clocks are not in
phase "by design, for some reason", then your compiler should tell you if
there are any setup or hold violations. If there is a violation I will
consider them asynchronous.
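The missed-pulse case above can be modelled in a few lines. This is only a behavioural sketch in Python, not HDL: the function name `slow_domain_sees` and the `phase` parameter (which fast cycle the divide-by-2 clock happens to rise on after power-up) are invented for illustration.

```python
def slow_domain_sees(pulse_start, pulse_width, phase, n_cycles=16):
    """Return True if a flop on the half-rate clock captures the pulse.

    Time is counted in fast-clock cycles. The signal is high during fast
    cycles pulse_start .. pulse_start + pulse_width - 1. The slow clock
    samples the signal in every fast cycle t with t % 2 == phase
    (phase is 0 or 1; a hypothetical power-up state of the divider).
    """
    for t in range(n_cycles):
        is_slow_edge = (t % 2 == phase)
        signal_high = pulse_start <= t < pulse_start + pulse_width
        if is_slow_edge and signal_high:
            return True
    return False
```

In this model a one-cycle pulse is seen for one divider phase and lost for the other, while a pulse stretched to two fast cycles is seen either way.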
For asynchronous clocks inside FPGAs, I normally use dual-clock FIFOs for
the main crossing areas. Alternatively, you can base your crossing plans
on double-register synchronisation and correct data transfer.
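For readers who have not met double-register synchronisation, here is a rough behavioural model in Python. The class name `TwoFlopSynchronizer` is made up for this sketch, and metastability itself is not modelled; it only shows the two-stage structure and the latency it adds.

```python
class TwoFlopSynchronizer:
    """Behavioural model of a double-register synchroniser (single bit)."""

    def __init__(self):
        self.ff1 = 0  # first stage, the one that may go metastable in hardware
        self.ff2 = 0  # second stage, the output the destination logic uses

    def clock(self, async_in):
        """Apply one destination-clock edge and return the synchronised output."""
        # Both flops update on the same edge, so ff2 takes the OLD ff1 value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2
```

A change on the input reaches the output on the second destination-clock edge after it is applied. This structure is only suitable for single bits (or Gray-coded values); a multi-bit bus still needs a FIFO or handshake so that all bits are transferred consistently.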
If your clocks are external (between chips) - as I understand from your
description - then this is a different matter. Board delay differences are
inevitable. All I can say is that they are asynchronous. So you had better
cross domains with care or lock them together (e.g. inside an FPGA, but this
requires costly loop design).
Remember that a phase-locked loop uses phase difference to lock two frequencies,
but this doesn't usually mean they are locked with respect to absolute
phase unless extra design effort is added.
Kadhiem
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
On Oct 4, 10:25 am, "kadhiem_ayob" <kadhiem_a...@yahoo.co.uk> wrote:
Hi Kadhiem,
Thank you for your response.
I am studying the Intel 82496/82491 cache II controller chip and cache II
SRAM chip running at 66 MHz. The book was published in 1994.
I want to learn how they designed 3 chips, including the Pentium processor,
to run with the same cycles from the same clock source on the board.
The book doesn't mention that asynchronous input/output FIFOs are
used.
Now the cache II controller and cache II SRAM are included in new
multiprocessor chips.
My question is how they design the multiprocessor chip:
According to Intel documents, 4 processors run at 2 GHz or so, and all their
cache I controllers and cache I SRAMs run at half rate (1 data item per
cycle for the core and 1 data item per 2 cycles for cache I).
If I were the designer, there might be two choices:
1. The cache I controller and cache I SRAM run on a clock which is the main
clock source divided by 2, with input/output asynchronous FIFOs in the
interface;
2. The cache I controller and cache I SRAM run on a clock which is the main
clock source NOT divided by 2, withOUT input/output asynchronous FIFOs
in the interface, and with an enable signal to run them at half rate.
Option 1 is reliable, but has a performance penalty. Based on my
experience, it seems one cannot get the full data rate with input/output
asynchronous FIFOs in the interface.
Option 2 is reliable too, but it uses more energy, because its
clock runs at double the rate of option 1. But it guarantees one data item
per 2 cycles for cache I.
It seems to me that they must use option 2 instead of option 1. I
would like the experts' opinion, even if it is just talk.
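As a minimal sketch of what option 2 means, assuming a simple toggling enable: the register below is clocked on every fast edge but only loads when the enable is high. This is a behavioural Python model, not a description of how Intel actually does it; `run_with_enable` and its variables are hypothetical names.

```python
def run_with_enable(data_in):
    """Clock a 'cache-side' register on every fast edge, gated by an enable.

    data_in holds one value per fast-clock cycle. The enable toggles every
    cycle, so the register loads on every second fast edge only.
    Returns the register value after each fast edge.
    """
    enable = True   # hypothetical divide-by-2 enable, high on even cycles
    reg = None      # register contents before the first load
    trace = []
    for value in data_in:
        if enable:
            reg = value      # load only when enabled
        enable = not enable  # toggle flop: the enable itself is synchronous
        trace.append(reg)
    return trace
```

Feeding it `[10, 11, 12, 13, 14, 15]` gives `[10, 10, 12, 12, 14, 14]`: one new value per two fast cycles, with everything still in a single clock domain. In an FPGA this maps onto the flip-flop clock-enable input, and paths between enabled registers can usually be given a multicycle timing constraint.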
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
On Oct 4, 9:09 am, Weng Tianxiang <wtx...@gmail.com> wrote:
Weng, let me explain the basics:
You want to drive a system with two clocks, one of which has half the
frequency of the other.
The important question now is: what is the phase relationship between
the two clocks? Or, in simpler terms, assuming you use rising-edge
triggering of the flip-flops and registers: what is the timing delay
between the rising edges of both clocks?
If you are sure that there is no delay (which I would never really
believe), then there is no problem.
If, however, there is a short systematic delay, where the rising edge
of f2 is always a few ns later than the rising edge of f1, then any
data transfer from f1-based to f2-based logic might be unreliable, because
the f2 clock might pick up either the old data or the new data that
had just been changed by f1. That would be a race condition, or a
hold-time violation. In the opposite direction there is no problem,
provided you still have enough set-up time available after you have lost
some due to the phase difference.
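The race in the f1-to-f2 direction comes down to simple arithmetic, sketched here in Python. The function name and the four timing parameters are assumptions for illustration, and any numbers fed to it are invented rather than taken from a data sheet.

```python
def hold_slack_ns(skew_ns, clk_to_q_ns, path_delay_ns, t_hold_ns):
    """Hold slack at the f2 flop for data launched by the coincident f1 edge.

    The f1 edge is at t = 0 and the f2 edge arrives skew_ns later. The old
    data must stay valid until skew_ns + t_hold_ns; the new data shows up at
    clk_to_q_ns + path_delay_ns. Negative slack means a race (hold violation).
    """
    data_changes_at = clk_to_q_ns + path_delay_ns
    must_hold_until = skew_ns + t_hold_ns
    return data_changes_at - must_hold_until
```

With no skew the slack is positive as long as the data path is slower than the hold time; once f2 lags f1 by more than the clock-to-output plus routing delay, the slack goes negative and the f2 flop may capture the new data instead of the old.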
This all assumes that the phase relationship is known and stable. If
it isn't, then you should treat the phase relationship as unknown and
use asynchronous FIFOs or some handshaking.
If your system is slow, you can deliberately offset the rising edges
by half a period of the faster clock, which would give you a well-
defined timing relationship and clock margin (but you give up half the
potential speed).
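To put rough numbers on the half-period offset, here is a small Python sketch under the assumption that the f2 rising edge sits exactly half a fast period after the f1 rising edge. The function name and parameters are hypothetical.

```python
def offset_margins_ns(fast_period_ns, clk_to_q_ns, path_delay_ns,
                      t_setup_ns, t_hold_ns):
    """Setup and hold slack for an f1 -> f2 transfer when the f2 rising edge
    is placed half a fast period after the f1 rising edge."""
    offset = fast_period_ns / 2.0
    data_valid_at = clk_to_q_ns + path_delay_ns
    # Data launched at t = 0 must be stable t_setup before the f2 edge.
    setup_slack = offset - t_setup_ns - data_valid_at
    # The next f1 launch is at t = fast_period; its data must not arrive
    # earlier than t_hold after the f2 edge at t = offset.
    hold_slack = (fast_period_ns + data_valid_at) - (offset + t_hold_ns)
    return setup_slack, hold_slack
```

With a 10 ns fast period, 0.5 ns clock-to-output, 1.5 ns of routing, 0.5 ns setup and 0.1 ns hold, this gives 2.5 ns of setup slack and 6.9 ns of hold slack. The hold margin is comfortable, but the setup budget is half a period instead of a full one, which is the speed being given up.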
Peter Alfke, still there, lurking on weekends...
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
Hi Weng,
When I was struggling with metastability, Philip Freidin was good
enough to point out where I was going wrong. I think his explanation
was very clear and helpful - it's at http://tinyurl.com/473w92 if you
want to have a look...
Cheers,
Simon (just giving back, and feeling good about it)
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
On Oct 5, 6:48 pm, Peter Alfke <al...@sbcglobal.net> wrote:
Hi Peter,
Glad to receive your advice again.
"This all assumes that the phase relationship is known and stable. If
it isn't, then you should treat the phase relationship as unknown and
use asynchronous FIFOs or some handshaking."
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
On Oct 5, 8:54 pm, Simon <goo...@gornall.net> wrote:
Hi Simon,
I read the recommended comments; they describe the standard asynchronous
input/output handshaking.
Anyone who uses it must sacrifice 4 clocks: 2 for the outgoing control
signals and 2 for the returning control signals. Peter has a very nice
paper describing the situation.
But I don't think the Intel 2 GHz chip uses the handshaking method, since
every signal's delay would be at least 4 clocks, while their documents give 2
clocks for the cache I part. So I think Intel is using an enable signal
to control the cache I part, making the full core and cache I one
synchronous system without sacrificing any data delay.
I have experience designing a successful system with 3 clock rates in a
Xilinx chip, using the global clock as the only clock source and auxiliary
enable signals to control the slow parts of the design.
For a high-performance processor chip, 4 clock delays are unacceptable.
Do you agree? Peter too?
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
Just speculating, but I'd bet large processors use derived clocks,
with known phase relationships, to avoid the typical asynchronous
clock boundary crossing logic wherever possible. Running large cache
structures at twice the clock frequency needed is too power hungry.
They might have been able to get away with it in the past, when caches
were smaller and power consumption was less important, but not any
longer.
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
On Oct 8, 10:04 am, Andy <jonesa...@comcast.net> wrote:
Hi Andy,
Glad to hear from you again. We are always on opposite sides of any coin.
Peter misses the point and your response hits the point:
"I'd bet large processors use derived clocks,
with known phase relationships, to avoid the typical asynchronous
clock boundary crossing logic wherever possible."
I disagree with you. How can they manage the huge range of
temperatures that causes clock circuit shifting?
Peter, what is your opinion? From a Xilinx FPGA designer's point of
view, there is a large uncertainty range for derived clocks. I remember
it may be at least 300 ps over a range of temperatures.
Re: Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?
On Wed, 8 Oct 2008 18:38:04 -0700 (PDT), Weng Tianxiang
<wtx...@gmail.com> wrote:
>On Oct 8, 10:04 am, Andy <jonesa...@comcast.net> wrote:
>Peter misses the point and your response hits the point:
>"I'd bet large processors use derived clocks,
>with known phase relationships, to avoid the typical asynchronous
>clock boundary crossing logic wherever possible."
>
>I disagree with you. How can they manage the huge range of
>temperatures that causes clock circuit shifting?
There is probably a lot less logic involved in continuously
auto-calibrating a clock DLL to eliminate temperature/voltage drift,
than there is in boundary crossing logic for a 256-bit or wider internal
bus.