On Aug 17, 10:31 pm, "Marty Ryba" <[email protected]> wrote:
> Hi gang,
>
> I had some interesting results over the last couple of weeks that got me
> thinking there may be even more ways to attack this problem. Picture one
> signal coming in (one bit for now, but maybe more later). I want copies of
> this signal at various delays from "now". I have a related question
> regarding the (less used) if/generate syntax: can this not be in a process?
> I tried to have a for...generate around a process with one little localized
> if/generate to handle the "first buffer" case in the second approach.
> Modelsim (5.8d PE) didn't like it, so I coded the first buffer separately.
>
> The first delay line method creates one buffer, one input address, and an
> array of output addresses. Each output is then read from the respective
> output address, and all of them get incremented on each read/write cycle
> (Clock Enable high). I noticed it synthesized fairly large, so I tried a
> second way, reasoning that since the synthesis tool couldn't assume anything
> about the arrangement of the addresses that it had to register the data out
> the wazoo to keep all the parallel reads happy.
>
> The second way was a lot more code. I created multiple buffers, each with
> its own input and output address. I made them shorter, but still the sum of
> them was larger than the first buffer. The output of the first feeds the
> second, and so on. The only disadvantage is that there is now a tighter
> constraint on the relative spacing of the delays, but I can live with that.
>
So to make sure I understand: your delay module has N outputs, and
from the input to output number k there's a delay of Dk samples.
The first approach writes to one big RAM, then reads out from addresses
offset from the write address by D1, then D2, ... through DN. The
second approach builds a chain of cascaded delays, where the first
implements D1 samples, the next implements (D2-D1) samples, the one
after that (D3-D2) samples, etc. This might be what you need in, for
example, a FIR filter with very sparse coefficients.
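
To check that I have the two structures right, here is a rough
behavioural model in Python (not VHDL, and not your code). The function
names, the zero-initialised memories, and the example delays are all my
own assumptions. It only models what comes out of each tap, not how
either version synthesizes.

```python
from collections import deque


def multi_tap_ram(samples, delays):
    """Approach 1 (assumed form): one circular buffer, one write address,
    and one read address per tap, offset back from the write address."""
    depth = max(delays) + 1
    ram = [0] * depth            # assumption: memory starts out zeroed
    wr = 0
    out = []
    for x in samples:            # one iteration = one clock-enable cycle
        ram[wr] = x
        out.append([ram[(wr - d) % depth] for d in delays])
        wr = (wr + 1) % depth
    return out


def cascaded_delays(samples, delays):
    """Approach 2 (assumed form): a chain of short delays, where stage k
    holds delays[k] - delays[k-1] samples and feeds the next stage."""
    steps = [d - p for d, p in zip(delays, [0] + list(delays[:-1]))]
    # The cascade needs every stage to be at least one sample long,
    # which is the tighter constraint on tap spacing.
    assert all(s >= 1 for s in steps), "delays must be strictly increasing"
    stages = [deque([0] * s, maxlen=s) for s in steps]
    out = []
    for x in samples:
        taps = []
        v = x
        for stage in stages:
            oldest = stage[0]    # value that entered this stage s cycles ago
            stage.append(v)      # maxlen drops the oldest entry
            v = oldest           # this stage's output feeds the next stage
            taps.append(v)
        out.append(taps)
    return out
```

With delays of, say, [2, 5, 9], both functions return the same taps for
every input sample, so the choice between them is about resources and
timing rather than behaviour.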

What you can achieve depends on the data rate relative to the clock.
If the data can come fast (i.e. N clock enables back to back), then
you need to physically instantiate at least N output registers, and
the second approach is probably best. If there are more than N clocks
between clock enables, the first approach is probably best: you read
out one 'tap' per clock cycle and then store it into its own register.
In that case, you might even be able to "serialize" the operation you
do with the taps, and avoid the intermediate registers altogether.
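
As a sketch of what I mean by serializing, here is another small
Python model. It assumes the operation on the taps is a sparse FIR sum,
that there are enough clocks between enables to visit every tap, and
that the memory starts out zeroed. The function name and the
coefficients are made up for illustration.

```python
def serial_sparse_fir(samples, delays, coeffs):
    """One RAM, one read per clock: each tap is fetched in turn and folded
    straight into an accumulator, so no per-tap register is needed."""
    depth = max(delays) + 1
    ram = [0] * depth                 # assumption: memory starts out zeroed
    wr = 0
    results = []
    read_clocks = 0
    for x in samples:                 # clock-enable cycle: write new sample
        ram[wr] = x
        acc = 0
        for d, c in zip(delays, coeffs):
            acc += c * ram[(wr - d) % depth]   # one tap read per clock
            read_clocks += 1
        results.append(acc)
        wr = (wr + 1) % depth
    return results, read_clocks
```

The cost is len(delays) read clocks per input sample, which is why this
only works when the clock enables are spaced far enough apart.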
- Kenn