FPGA Central

#1 08-18-2008, 03:31 AM

Hi gang,

I had some interesting results over the last couple of weeks that got me
thinking there may be even more ways to attack this problem. Picture one
signal coming in (one bit for now, but maybe more later). I want copies of
this signal at various delays from "now". I have a related question
regarding the (less used) if/generate syntax: can this not be in a process?
I tried to have a for...generate around a process with one little localized
if/generate to handle the "first buffer" case in the second approach.
Modelsim (5.8d PE) didn't like it, so I coded the first buffer separately.

The first delay line method creates one buffer, one input address, and an
array of output addresses. Each output is then read from the respective
output address, and all of them get incremented on each read/write cycle
(Clock Enable high). I noticed it synthesized fairly large, so I tried a
second way, reasoning that, since the synthesis tool couldn't assume anything
about the arrangement of the addresses, it had to register the data out
the wazoo to keep all the parallel reads happy.

The second way was a lot more code. I created multiple buffers, each with
its own input and output address. I made them shorter, but still the sum of
them was larger than the first buffer. The output of the first feeds the
second, and so on. The only disadvantage is that there is now a tighter
constraint on the relative spacing of the delays, but I can live with that.

Any ideas on even cuter ways to code this up?

Here's the bit of code regarding if/generate that caused problems; it also
illustrates my second way of doing it:
gen_delay: for N in 0 to NUM_DELAYS-1 generate
  delay_process: process(CLK)
  begin
    if rising_edge(CLK) then
      ....
      if CE = '1' then -- valid data coming in
        G1: if N = 0 generate
          DBuffer(N)(InAddress(N)) <= X_IN; -- first buffer takes input signal
        end generate G1;
        GN: if N /= 0 generate
          DBuffer(N)(InAddress(N)) <= DBuffer(N-1)(OutAddress(N-1)); -- others take output of previous
        end generate GN;
        InAddress(N) <= InAddress(N) + 1;
        OutAddress(N) <= OutAddress(N) + 1;
        X_OUT(N) <= DBuffer(N)(OutAddress(N));
      end if;
    end if;
  end process delay_process;
end generate gen_delay;

#2 08-18-2008, 05:25 AM

On Aug 18, 7:31 am, "Marty Ryba" <[email protected]> wrote:

if-generate is a concurrent statement and you can't use it inside a
process.

Regards,
JK
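
A minimal sketch of what this implies: the if/generate has to wrap whole processes (or other concurrent statements) instead of sitting inside one. To keep it short, each buffer is reduced to a single register here, so this is not the original poster's circular-buffer code. The entity name delay_chain, the signal stage, and the generic default are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_chain is  -- hypothetical name
  generic (NUM_DELAYS : positive := 4);  -- assumed default
  port (
    CLK   : in  std_logic;
    CE    : in  std_logic;
    X_IN  : in  std_logic;
    X_OUT : out std_logic_vector(NUM_DELAYS-1 downto 0)
  );
end entity delay_chain;

architecture rtl of delay_chain is
  -- one register per stage (simplification of the per-stage buffers)
  signal stage : std_logic_vector(NUM_DELAYS-1 downto 0) := (others => '0');
begin
  gen_delay : for N in 0 to NUM_DELAYS-1 generate

    -- if/generate is legal here: this is a concurrent region,
    -- outside any process.
    G1 : if N = 0 generate
      first_stage : process (CLK)
      begin
        if rising_edge(CLK) then
          if CE = '1' then
            stage(N) <= X_IN;        -- first stage takes the input signal
          end if;
        end if;
      end process first_stage;
    end generate G1;

    GN : if N /= 0 generate
      other_stage : process (CLK)
      begin
        if rising_edge(CLK) then
          if CE = '1' then
            stage(N) <= stage(N-1);  -- others take the previous stage
          end if;
        end if;
      end process other_stage;
    end generate GN;

  end generate gen_delay;

  X_OUT <= stage;
end architecture rtl;
```

The cost is a duplicated process body, which is presumably why coding the first buffer separately felt clumsy.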

#3 08-18-2008, 09:05 AM

On Aug 18, 12:25 pm, JK <[email protected]> wrote:
> if-generate is a concurrent statement and you can't use it inside a
> process.
>
> Regards,
> JK

Hi,
I really did not understand the use of the input and output addresses.
For me, a plain shift register would do:

DBuffer(0) <= X_IN;
gen_delay: for N in 1 to NUM_DELAYS-1 generate
  delay_process: process(CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then -- valid data coming in
        DBuffer(N) <= DBuffer(N-1);
      end if;
    end if;
  end process delay_process;
end generate gen_delay;
X_OUT <= DBuffer(NUM_DELAYS-1); -- last stage; the generate loop stops at NUM_DELAYS-1

Also, I think there is no need to use if/generate (as JK said, it is not
legal inside a process). We can simply write ordinary if statements and
expect the synthesis tool to optimize them:

if CE = '1' then -- valid data coming in
  if N = 0 then
    DBuffer(N)(InAddress(N)) <= X_IN; -- first buffer takes input signal
  end if;
  if N /= 0 then
    DBuffer(N)(InAddress(N)) <= DBuffer(N-1)(OutAddress(N-1)); -- others take output of previous
  end if;

regards

#4 08-18-2008, 09:16 AM

You don't need to put a generate inside a process; just use normal
if-then-else statements. When a synthesizer sees that a conditional
statement depends only on a constant (N in this case), it will
generate logic only for the valid branch, and the unreachable branches
are ignored.

if/for-generate is mostly used for the easy auto-generation of
repeatable bits of logic. It is processed at elaboration, before the
simulation starts, not during simulation. So generate conditions can
only depend on static values such as constants and generics.
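
A minimal sketch of that suggestion, assuming each buffer is reduced to a single register for brevity (not the original poster's circular buffers). The entity name delay_chain_if, the signal stage, and the generic default are illustrative assumptions. For N = 0 the else branch is never executed, so the stage(N-1) index is never evaluated in simulation; some tools may still warn about it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_chain_if is  -- hypothetical name
  generic (NUM_DELAYS : positive := 4);  -- assumed default
  port (
    CLK   : in  std_logic;
    CE    : in  std_logic;
    X_IN  : in  std_logic;
    X_OUT : out std_logic_vector(NUM_DELAYS-1 downto 0)
  );
end entity delay_chain_if;

architecture rtl of delay_chain_if is
  -- one register per stage (simplification of the per-stage buffers)
  signal stage : std_logic_vector(NUM_DELAYS-1 downto 0) := (others => '0');
begin
  gen_delay : for N in 0 to NUM_DELAYS-1 generate
    delay_process : process (CLK)
    begin
      if rising_edge(CLK) then
        if CE = '1' then             -- valid data coming in
          -- N is fixed for each generated process, so only one
          -- of these branches can ever be taken in that process.
          if N = 0 then
            stage(N) <= X_IN;        -- first stage takes the input signal
          else
            stage(N) <= stage(N-1);  -- others take the previous stage
          end if;
        end if;
      end if;
    end process delay_process;
  end generate gen_delay;

  X_OUT <= stage;
end architecture rtl;
```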

#5 08-18-2008, 10:01 AM

As for the example:

Why not use a dual port ram:

have the write side address incrementing by 1 every clock cycle. And
then set the read address to (wr_addr - delay) with delay being the
variable delay you want?

Otherwise, using the usual shift register/taps is going to eat up a
lot of logic, and with a large mux could possibly cause horrible
timing problems.
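
A rough sketch of the dual-port RAM idea, written so that a synthesis tool has a chance of inferring a RAM (whether it does depends on the tool and device). The entity name var_delay, the port DELAY, and the generic ADDR_BITS are assumptions for illustration. It provides a single output tap only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity var_delay is  -- hypothetical name
  generic (ADDR_BITS : positive := 8);  -- RAM depth is 2**ADDR_BITS (assumed)
  port (
    CLK   : in  std_logic;
    CE    : in  std_logic;
    DELAY : in  unsigned(ADDR_BITS-1 downto 0);  -- requested delay in samples
    X_IN  : in  std_logic;
    X_OUT : out std_logic
  );
end entity var_delay;

architecture rtl of var_delay is
  type ram_t is array (0 to 2**ADDR_BITS-1) of std_logic;
  signal ram     : ram_t := (others => '0');
  signal wr_addr : unsigned(ADDR_BITS-1 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        -- write port: store the new sample at the write address
        ram(to_integer(wr_addr)) <= X_IN;
        -- read port: registered read at (wr_addr - delay);
        -- unsigned subtraction wraps modulo 2**ADDR_BITS
        X_OUT <= ram(to_integer(wr_addr - DELAY));
        -- advance the write address; it wraps at the RAM depth
        wr_addr <= wr_addr + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the read sees the old RAM contents, DELAY = 0 returns the oldest stored sample instead of the current input, so useful delays start at 1. The exact latency also depends on the output register and on the read-during-write behaviour of the target RAM.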

#6 08-19-2008, 12:53 AM

On Aug 17, 10:31 pm, "Marty Ryba" <[email protected]>
wrote:

So to make sure I understand, your delay module has N outputs, and
from the input to any output number k there's a delay of Dk samples.
The first approach wrote to one big RAM, then read out from addresses
offset from the input by D0, then D1, then D2... through DN. The
second approach built a bunch of cascaded delays, where the first
implemented (D2-D1) samples, the next implemented (D3-D2) samples
etc. This might be what you need in, for example, a FIR filter with
very sparse coefficients.

What you can achieve depends on what the data rate is with respect to
the clock. If the data can come fast (i.e. N clock enables back to
back), then you need to physically instantiate at least N output
registers, and the second approach is probably best. If there are
more than N clocks between clock enables, the first approach is
probably best, where you read out one 'tap' per clock cycle and then
store it into its own register. If this holds, you might even be able
to "serialize" the operation you do with the taps, and avoid the
intermediate registers altogether.
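
A sketch of the "one tap per clock" readout, assuming clock enables arrive at least NUM_TAPS+1 clocks apart. The entity name serial_taps, the generics, and the tap spacing of 16 samples in init_delays are all made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity serial_taps is  -- hypothetical name
  generic (
    ADDR_BITS : positive := 8;  -- RAM depth is 2**ADDR_BITS (assumed)
    NUM_TAPS  : positive := 4   -- number of delayed outputs (assumed)
  );
  port (
    CLK   : in  std_logic;
    CE    : in  std_logic;  -- assumed at least NUM_TAPS+1 clocks apart
    X_IN  : in  std_logic;
    X_OUT : out std_logic_vector(NUM_TAPS-1 downto 0)
  );
end entity serial_taps;

architecture rtl of serial_taps is
  type ram_t         is array (0 to 2**ADDR_BITS-1) of std_logic;
  type delay_array_t is array (0 to NUM_TAPS-1) of natural;

  -- Illustrative tap delays: 16, 32, 48, ... samples (an assumption).
  function init_delays return delay_array_t is
    variable d : delay_array_t;
  begin
    for k in d'range loop
      d(k) := 16 * (k + 1);
    end loop;
    return d;
  end function init_delays;

  constant TAP_DELAY : delay_array_t := init_delays;

  signal ram     : ram_t := (others => '0');
  signal wr_addr : unsigned(ADDR_BITS-1 downto 0) := (others => '0');
  -- index of the next tap to read; NUM_TAPS means "idle"
  signal tap_idx : natural range 0 to NUM_TAPS := NUM_TAPS;
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        ram(to_integer(wr_addr)) <= X_IN;  -- store the new sample
        wr_addr <= wr_addr + 1;
        tap_idx <= 0;                      -- start a new readout sweep
      elsif tap_idx < NUM_TAPS then
        -- single read port: one tap per clock, each into its own register
        X_OUT(tap_idx) <= ram(to_integer(wr_addr - TAP_DELAY(tap_idx)));
        tap_idx <= tap_idx + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

During the sweep wr_addr already points one past the newest sample, so a tap delay of 1 would return the sample just written. Only one read port is needed, at the price of the taps updating on successive clocks instead of simultaneously.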

- Kenn

#7 08-19-2008, 04:13 AM

"Tricky" <[email protected]> wrote in message
news:[email protected]...
> As for the example:
> Why not use a dual port ram:
>
> have the write side address incrementing by 1 every clock cycle. And
> then set the read address to (wr_addr - delay) with delay being the
> variable delay you want?
>
> Otherwise, using the usual shift register/taps is going to eat up a
> lot of logic, and with a large mux could possibly cause horrible
> timing problems.

That's what I did (sorry I didn't post the whole thing). The problem is that
with one input and *more* than one simultaneous output, it is no longer a
DPRAM but an x-port RAM, for which there is no quick synthesis. That's why I
ended up (so far) with N smaller DPRAMs. As another poster pointed out, if
the input rate were slower, I could use some tricks to serialize things.
In this case, new data can come in on every 66 MHz clock cycle.

Thanks for the tip regarding the optimization of if statements with
constants ("doh!" moment when I read that).
