
View Full Version : Beginner's question on undersampling and aliasing


John Monro
06-06-2007, 04:03 AM
prad wrote:
> Hi all,
>
> I am a newbie to DSP. I have a huge number of data samples that I want
> to perform FFT on. But due to the enormous amount of the data samples, I'd
> like to use only 1M samples and perform FFT. I know that this is severe
> undersampling and that it does not satisfy the Nyquist rate. But all I
> want is to prevent aliasing so that I can get correct information about
> the low frequency content in the data. I cannot use an analog anti-alias
> filter as the data is inherently digitized. Is there any time-domain
> method to help with my problem? I read about a half-band digital filter
> that eliminates the frequencies higher than half the sampling frequency.
> Can this filter help my problem as I read a lot of articles that the high
> frequencies have to be eliminated before digitization? If so, can someone
> point me to some link that discusses the design of such a filter?
>
>
> Thanks,
> Pradeep.

Pradeep,

The half-band filter you mention attenuates components above a quarter
of the sampling frequency, but its only advantage over other similar
(FIR) filters is that it is twice as fast. Otherwise it is not worth
the trouble as it is more complicated to code.

The filter needs to attenuate frequency components that are above one
half your NEW sample rate. (See http://www.iowegian.com/ for an
evaluation copy of some excellent FIR filter design software.)

The NEW sample rate is the rate you establish, in effect, by selecting
every Nth output sample, N being calculated so that you end up with 1M
output samples.

You can speed up the whole process by only calculating the output
samples that you are going to use, but this is at a cost of a little
more complication in the code.

Regards,
John
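
[A minimal sketch of the approach John describes, written in C since that is what the
thread later uses. The function name, the lowpass taps h[] (whose cutoff must sit below
half the NEW sample rate) and the decimation factor D are illustrative assumptions, not
anything specified in the thread.]

#include <stddef.h>

/* Compute only the FIR outputs that will be kept: one output per D input
 * samples, so the work scales with the 1M kept samples rather than with
 * the full record.  x: input, h: lowpass taps, y: decimated output. */
void fir_decimate(const double *x, size_t nx,
                  const double *h, size_t ntaps,
                  size_t D, double *y, size_t ny)
{
    for (size_t m = 0; m < ny; m++) {
        size_t n = m * D;                    /* position in the original stream */
        double acc = 0.0;
        for (size_t k = 0; k < ntaps; k++)   /* convolve only at this position */
            if (n >= k && n - k < nx)
                acc += h[k] * x[n - k];
        y[m] = acc;                          /* sample rate is now old rate / D */
    }
}

Each kept output costs one ntaps-long dot product, which is the saving John points to;
the filter design itself (for example with the Iowegian software he links) still has to
suppress everything above the new Nyquist frequency.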

prad
06-07-2007, 01:59 AM
Hi all,

I am a newbie to DSP. I have a huge number of data samples that I want
to perform FFT on. But due to the enormous amount of the data samples, I'd
like to use only 1M samples and perform FFT. I know that this is severe
undersampling and that it does not satisfy the Nyquist rate. But all I
want is to prevent aliasing so that I can get correct information about
the low frequency content in the data. I cannot use an analog anti-alias
filter as the data is inherently digitized. Is there any time-domain
method to help with my problem? I read about a half-band digital filter
that eliminates the frequencies higher than half the sampling frequency.
Can this filter help my problem as I read a lot of articles that the high
frequencies have to be eliminated before digitization? If so, can someone
point me to some link that discusses the design of such a filter?


Thanks,
Pradeep.

Fred Marshall
06-07-2007, 02:58 AM
"prad" <[email protected]> wrote in message
news:[email protected] ...
> Hi all,
>
> I am a newbie to DSP. I have a huge number of data samples that I want
> to perform FFT on. But due to the enormous amount of the data samples, I'd
> like to use only 1M samples and perform FFT. I know that this is severe
> undersampling and that it does not satisfy the Nyquist rate. But all I
> want is to prevent aliasing so that I can get correct information about
> the low frequency content in the data. I cannot use an analog anti-alias
> filter as the data is inherently digitized. Is there any time-domain
> method to help with my problem? I read about a half-band digital filter
> that eliminates the frequencies higher than half the sampling frequency.
> Can this filter help my problem as I read a lot of articles that the high
> frequencies have to be eliminated before digitization? If so, can someone
> point me to some link that discusses the design of such a filter?

It's not clear what you mean. The sample rate and the number of samples are
two different things. You could use any sequence of samples to FFT and just
keep the sample rate the same. Have you tried that?

Fred

Jerry Avins
06-07-2007, 05:09 AM
prad wrote:
> Hi all,
>
> I am a newbie to DSP. I have a huge number of data samples that I want
> to perform FFT on. But due to the enormous amount of the data samples, I'd
> like to use only 1M samples and perform FFT. I know that this is severe
> undersampling and that it does not satisfy the Nyquist rate. But all I
> want is to prevent aliasing so that I can get correct information about
> the low frequency content in the data. I cannot use an analog anti-alias
> filter as the data is inherently digitized. Is there any time-domain
> method to help with my problem? I read about a half-band digital filter
> that eliminates the frequencies higher than half the sampling frequency.
> Can this filter help my problem as I read a lot of articles that the high
> frequencies have to be eliminated before digitization? If so, can someone
> point me to some link that discusses the design of such a filter?

You don't say what samples you expect to remove to pare your data down
to a mere million. If they are contiguous, the paring will not create
aliases. If the bandwidth of the original signal exceeded half the
sample rate, the aliases are already there. If the original sampled data
are clean, you can extract every Nth sample by a process called decimation;
filtering before decimating is required. Ask more after googling for
"decimation".

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Rune Allnor
06-07-2007, 12:02 PM
On 7 Jun, 02:59, "prad" <[email protected]> wrote:
> Hi all,
>
> I am a newbie to DSP. I have a huge number of data samples that I want
> to perform FFT on.

Why do you want to compute the DFT? What do you hope
to achieve?

> But due to the enormous amount of the data samples, I'd
> like to use only 1M samples and perform FFT. I know that this is severe
> undersampling and that it does not satisfy the Nyquist rate.

Then you are in deep trouble. If the data are aliased, you
face a question of the form

"Given x+y = 1, find x and y."

There exists one true answer, but there is no way you can
find it, let alone tell it from all the other combinations
of x and y which satisfy x+y=1.

> But all I
> want is to prevent aliasing so that I can get correct information about
> the low frequency content in the data. I cannot use an analog anti-alias
> filter as the data is inherently digitized.

What do you mean? If you sample an analog process, then you
have no choice but to use an analog anti-alias filter.
If you have data from some discrete-time process, aliasing
is not an issue.

> Is there any time-domain
> method to help with my problem?

As far as filtering the data is concerned, only the analog
anti-alias filter. Provided, of course, you are actually
sampling an analog process.

> I read about a half-band digital filter
> that eliminates the frequencies higher than half the sampling frequency.
> Can this filter help my problem as I read a lot of articles that the high
> frequencies have to be eliminated before digitization?

Doesn't apply here. Analog anti-alias filtering is the way
to go.

> If so, can someone
> point me to some link that discusses the design of such a filter?

I am sure someone can. However, it won't help you.
Did I mention that analog anti-alias filters are useful?

Rune

prad
06-07-2007, 07:57 PM
>On 7 Jun, 02:59, "prad" <[email protected]> wrote:
>> Hi all,
>>
>> I am a newbie to DSP. I have a huge number of data samples that I want
>> to perform FFT on.
>
>Why do you want to compute the DFT? What do you hope
>to achieve?
>
>> But due to the enormous amount of the data samples, I'd
>> like to use only 1M samples and perform FFT. I know that this is severe
>> undersampling and that it does not satisfy the Nyquist rate.
>
>Then you are in deep trouble. If the data are aliased, you
>face a question on the form
>
>"Given x+y = 1, find x and y."
>
>There exists one true answer, but there is no way you can
>find it, let lone tell it from all the other combinations
>of x and y which satisfy x+y=1.
>
>> But all I
>> want is to prevent aliasing so that I can get correct information about
>> the low frequency content in the data. I cannot use an analog anti-alias
>> filter as the data is inherently digitized.
>
>What do you mean? If you sample an analog process, then you
>have no choise but to use an analog anti-alias filter.
>If you have data from some discrete-time process, aliasing
>is not an issue.
>
>> Is there any time-domain
>> method to help with my problem?
>
>As far as filtering the data is concered, only the analog
>anti-alias filter. Provided, of course, you are actually
>sampling an analog process.
>
>> I read about a half-band digital filter
>> that eliminates the frequencies higher than half the sampling frequency.
>> Can this filter help my problem as I read a lot of articles that the high
>> frequencies have to be eliminated before digitization?
>
>Doesn't apply here. Analog anti-alias filtering is the way
>to go.
>
>> If so, can someone
>> point me to some link that discusses the design of such a filter?
>
>I am sure someone can. However, it won't help you.
>Did I mention that analog anti-alias filters are useful?
>
>Rune
>
>


Dear all,
Thanks for all your replies but I am still a bit confused. Let me try
rephrasing my question so that everyone gets a better picture of what I am
trying to do. I have a large number (order of 10^13) of discretized data
samples. I have to obtain the frequency spectrum of this data so that I
can predict minima in the discretized data. I am using the FFTW C library
to obtain the Frequency spectrum. But the maximum size of my data array is
limited to 2^32 in C. So I am picking a smaller subset of 1M samples from
the original set of 10^13 data points.

Jerry is right about the fact that what I am looking for is subsampling
or decimation without running into aliasing problems. But due to the large
amount of data, I cannot perform decimation on the entire data set. Also,
I'd like to point out that my data is inherently discretized and so using
an analog anti-aliasing filter to remove all the high frequency components
that I am not interested in is NOT an option for me. So some scheme that
resembles what John mentioned is of great interest to me. But I am not
sure if that will work. An ideal scheme for me would be like the
following:

1. Collect N=1M equidistant samples out of the n=10^13 data points.
2. Filter out the high frequency components present in the samples
using some kind of LPF, i.e.
For each sample "s_i" in the 1M samples
i. Collect some constant number (say 10) samples before and
after the sample "s_i"
ii. Perform a Digital Low Pass Filtering operation on these
21 samples.
iii. Pick one of these 21 samples (say the median of the
21 samples) to replace "s_i"

Will the above procedure eliminate or at least drastically reduce
aliasing (assuming of course that the original 10^13 data points were
sampled without aliasing)? Can I somehow tell which is the lowest
frequency that was significantly filtered out (so that I know which of the
lower frequencies contain aliases)?

Some other basic questions regarding interpretation of the N-pt FFT
results :

1. The frequency bin "i" corresponds to frequency fi = (i*fs/N),
right?
So bin N/2 contains frequency fs/2 while bin 1 contains
frequency fs/N .
2. Assuming that the phase angles lie in [0, 2pi], if the phase angle

of a frequency is 0, does that mean that the first positive peak
of that frequency is at t=0?
(Is this the case for FFTW's FFT results?)



Thanks,
Pradeep.
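
[On question 1 above: yes, bin i of an N-point transform sits at i*fs/N, with bin N/2
at fs/2. Below is a small self-contained check using FFTW3's real-to-complex interface;
the block size and the 2 GHz rate mentioned later in the thread are only placeholders.
Note that atan2 reports phase in [-pi, pi] rather than [0, 2*pi], and a bin with phase 0
corresponds to a cosine whose positive peak lands on the first sample of the transformed
block.]

#include <stdio.h>
#include <math.h>
#include <fftw3.h>

int main(void)
{
    const int N = 1024;                /* illustrative block size */
    const double fs = 2.0e9;           /* placeholder sample rate */
    const double pi = 3.14159265358979323846;

    double *in = fftw_malloc(sizeof(double) * N);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * (N / 2 + 1));
    fftw_plan p = fftw_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE);

    for (int n = 0; n < N; n++)        /* test tone placed exactly in bin 3 */
        in[n] = cos(2.0 * pi * 3.0 * n / N);
    fftw_execute(p);

    for (int i = 0; i <= N / 2; i++) {
        double re = out[i][0], im = out[i][1];
        if (hypot(re, im) > 1e-6 * N)  /* skip numerically empty bins */
            printf("bin %d -> %.3f Hz, phase %.3f rad\n",
                   i, i * fs / N, atan2(im, re));
    }
    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}

Built with something like gcc demo.c -lfftw3 -lm, this should report only bin 3,
at 3*fs/N, with phase essentially zero.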

Ron N.
06-07-2007, 08:19 PM
On Jun 6, 5:59 pm, "prad" <[email protected]> wrote:
> Hi all,
>
> I am a newbie to DSP. I have a huge number of data samples that I want
> to perform FFT on. But due to the enormous amount of the data samples, I'd
> like to use only 1M samples and perform FFT. I know that this is severe
> undersampling and that it does not satisfy the Nyquist rate. But all I
> want is to prevent aliasing so that I can get correct information about
> the low frequency content in the data. I cannot use an analog anti-alias
> filter as the data is inherently digitized. Is there any time-domain
> method to help with my problem? I read about a half-band digital filter
> that eliminates the frequencies higher than half the sampling frequency.
> Can this filter help my problem as I read a lot of articles that the high
> frequencies have to be eliminated before digitization? If so, can someone
> point me to some link that discusses the design of such a filter?

Sounds like you simply need to low pass filter your samples
so that there is no spectral content above your required noise
or accuracy level left at or above 0.5M cycles per total
data set length. Then you can decimate or interpolate M samples
out of your original data set. Any sufficiently good digital low
pass filter will do, depending on your needs and constraints
(performance limits, maintaining relative phase information,
etc.)

This assumes that any aliasing in your original enormous
sample set was below your required noise or accuracy floor.


IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M

prad
06-07-2007, 08:39 PM
Hi

Are you implying that I first pick 1M samples, then LPF the samples and
use them? Will that work? Or are you saying that I have to LPF the entire
digitized data and then do decimation? The second method is not feasible
for me as the data set is too large and will require an enormous amount of
time. Please advise.

Thanks
Pradeep.




>On Jun 6, 5:59 pm, "prad" <[email protected]> wrote:
>> Hi all,
>>
>> I am a newbie to DSP. I have a huge number of data samples that I want
>> to perform FFT on. But due to the enormous amount of the data samples, I'd
>> like to use only 1M samples and perform FFT. I know that this is severe
>> undersampling and that it does not satisfy the Nyquist rate. But all I
>> want is to prevent aliasing so that I can get correct information about
>> the low frequency content in the data. I cannot use an analog anti-alias
>> filter as the data is inherently digitized. Is there any time-domain
>> method to help with my problem? I read about a half-band digital filter
>> that eliminates the frequencies higher than half the sampling frequency.
>> Can this filter help my problem as I read a lot of articles that the high
>> frequencies have to be eliminated before digitization? If so, can someone
>> point me to some link that discusses the design of such a filter?
>
>Sounds like you simply need to low pass filter your samples
>so that there is no spectral content above your required noise
>or accuracy level left at or above 0.5M cycles per total
>data set length. Then you can decimate or interpolate M samples
>out of your original data set. Any sufficiently good digital low
>pass filter will do, depending on your needs and constraints
>(performance limits, maintaining relative phase information,
>etc.)
>
>This assumes that any aliasing in your original enormous
>sample set was below your required noise or accuracy floor.
>
>
>IMHO. YMMV.
>--
>rhn A.T nicholson d.0.t C-o-M
>
>

Randy Yates
06-07-2007, 08:55 PM
"prad" <[email protected]> writes:

> Hi
>
> Are you implying that I first pick 1M samples, then LPF the samples and
> use them? Will that work? Or are you saying that I have to LPF the entire
> digitized data and then do decimation? The second method is not feasible
> for me as the data set is too large and will require an enormous amount of
> time. Please advise.
>
> Thanks
> Pradeep.

Pradeep,

I ran some rough computations and I have a method that could perform
this decimation in around two or three days using a modern PC. So yes
it would take a lot of time, but it's certainly possible.

Email me directly if you're interested.
--
% Randy Yates % "I met someone who looks alot like you,
%% Fuquay-Varina, NC % she does the things you do,
%%% 919-577-9882 % but she is an IBM."
%%%% <[email protected]> % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr

prad
06-07-2007, 09:47 PM
>prad wrote:
>
>> Are you implying that I first pick 1M samples, then LPF the samples and
>> use them? Will that work? Or are you saying that I have to LPF the entire
>> digitized data and then do decimation? The second method is not feasible
>> for me as the data set is too large and will require an enormous amount of
>> time. Please advise.
>
>The exact answer depends much on the ratio of sampling rate to the
>frequency you are interested in. Most likely it is possible to come
>up with a relatively fast filter to get what you want.
>
>Consider the moving average filter. You will find that DSP people
>don't like it very much because its properties are not very good,
>but it is easy to describe and fast to implement.
>
>You might, for example, do a 1000 point moving average where the
>first output value is the mean of the first 1000 input samples,
>the second is the mean of 2 through 1001, etc. This can be done
>very fast with a circular buffer of length 1000 to update the
>sum appropriately. It might be that a moving average filter
>followed by a decimation by 10, (output only every 10th point)
>would be a good initial filter, and reduce your problem by
>a factor of ten. Then another filter could finish off the job.
>
>I have not done the calculations to show that the above is
>actually a good way to proceed, but only offer it to show
>that there are fast ways to filter large amounts of data,
>and that in some cases they might be useful. Also, that
>cascaded filters might be faster than doing the whole
>thing in one step.
>
>-- glen
>
>

Hi Glen,

One of my major problems is that the data sets I have are huge, in fact
in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
data sets I have. So I'd like to know if there are ways of removing the
high frequency components using only the 1M samples.

Thanks,
Pradeep.

Randy Yates
06-07-2007, 10:25 PM
"prad" <[email protected]> writes:
> [...]
> Hi Glen,
>
> One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have. So I'd like to know if there are ways of removing the
> high frequency components using only the 1M samples.

Prad,

If you're asking if you can extract 1M samples from 10^13 samples without
filtering and without aliasing, then the answer is NO.

Denying reality won't help you get your problem solved.
--
% Randy Yates % "Though you ride on the wheels of tomorrow,
%% Fuquay-Varina, NC % you still wander the fields of your
%%% 919-577-9882 % sorrow."
%%%% <[email protected]> % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr

glen herrmannsfeldt
06-07-2007, 10:28 PM
prad wrote:

> Are you implying that I first pick 1M samples, then LPF the samples and
> use them? Will that work? Or are you saying that I have to LPF the entire
> digitized data and then do decimation? The second method is not feasible
> for me as the data set is too large and will require an enormous amount of
> time. Please advise.

The exact answer depends much on the ratio of sampling rate to the
frequency you are interested in. Most likely it is possible to come
up with a relatively fast filter to get what you want.

Consider the moving average filter. You will find that DSP people
don't like it very much because its properties are not very good,
but it is easy to describe and fast to implement.

You might, for example, do a 1000 point moving average where the
first output value is the mean of the first 1000 input samples,
the second is the mean of 2 through 1001, etc. This can be done
very fast with a circular buffer of length 1000 to update the
sum appropriately. It might be that a moving average filter
followed by a decimation by 10, (output only every 10th point)
would be a good initial filter, and reduce your problem by
a factor of ten. Then another filter could finish off the job.

I have not done the calculations to show that the above is
actually a good way to proceed, but only offer it to show
that there are fast ways to filter large amounts of data,
and that in some cases they might be useful. Also, that
cascaded filters might be faster than doing the whole
thing in one step.

-- glen
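
[A rough sketch of glen's running-sum idea, assuming a 1000-point average followed by
keeping every 10th averaged value; the window slides over the array directly, which is
the same O(1) update he gets from the circular buffer. Sizes and names are illustrative
only.]

#include <stddef.h>

#define WIN 1000   /* moving-average length */
#define DEC 10     /* keep every DEC-th averaged value */

/* Returns the number of decimated values written to y; y must have room
 * for roughly (nx - WIN) / DEC + 1 doubles. */
size_t avg_decimate(const double *x, size_t nx, double *y)
{
    if (nx < WIN) return 0;
    double sum = 0.0;
    size_t ny = 0;
    for (size_t n = 0; n < WIN; n++)       /* prime the running sum */
        sum += x[n];
    for (size_t n = WIN; ; n++) {
        if ((n - WIN) % DEC == 0)          /* emit only every DEC-th average */
            y[ny++] = sum / WIN;
        if (n == nx) break;
        sum += x[n] - x[n - WIN];          /* slide the window in O(1) */
    }
    return ny;
}

As glen says, this would only be a first, crude stage (the moving average's stopband
is poor); a sharper filter running on the ten-times-smaller output would then finish
the job.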

glen herrmannsfeldt
06-07-2007, 11:01 PM
prad wrote:

(snip)

> One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have. So I'd like to know if there are ways of removing the
> high frequency components using only the 1M samples.

What is the sampling rate, and what frequency range do you need?

How many processors do you have, and how is the data stored?

As popular disk drives are about 5e11 bytes, and your data might be
in 16 bit samples, it would take 40 such drives to hold the data.
If those were on 40 machines, each could process its part, with
just a little extra work at the boundaries.

How long can you wait for results?

How much money do you have to buy more machines?

-- glen

prad
06-07-2007, 11:39 PM
Randy :

Yes, I am wondering if I can extract 1M samples out of 10^13 data
points without aliasing. I am not saying that I do not want to perform
filtering. But I cannot afford to perform filtering on the entire dataset
and I am wondering if I can perform the filtering on a much smaller subset
of the data points. Something like 20*1M of the data points rather than the
10^13 points.

Glen :
Actually the data samples are produced by a code. But the code can
exactly produce the ith data sample that I need out of the entire 10^13
data points. So storage is not an issue. My limitation is that the C code
for the FFT takes the data input as an array of samples. And the largest
array size is 2^32. Since the data is not periodic, I need equidistant
samples picked out from the entire 10^13 data points. You can assume that
the sampling rate for the entire data set is 2Ghz. Then the maximum
frequency in the data is 1Ghz (assuming that LPF was already done). But
now to get down to 1M samples from the total of 10^13 data points (i.e.
decimate) I have to LPF again. But I cannot afford to LPF the entire data
set and would like to apply the LPF to the 1M samples or maybe a constant
number of neighboring datapoints of the selected 1M samples. Is this
possible?

Please advise.


Thanks,
Pradeep.







>"prad" <[email protected]> writes:
>> [...]
>> Hi Glen,
>>
>> One of my major problems is that the data sets I have are huge, in fact
>> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
>> data sets I have. So I'd like to know if there are ways of removing the
>> high frequency components using only the 1M samples.
>
>Prad,
>
>If you're asking if you can extract 1M samples from 10^13 samples without
>filtering and without aliasing, then the answer is NO.
>
>Denying reality won't help you get your problem solved.
>--
>% Randy Yates % "Though you ride on the wheels of tomorrow,
>%% Fuquay-Varina, NC % you still wander the fields of your
>%%% 919-577-9882 % sorrow."
>%%%% <[email protected]> % '21st Century Man', *Time*, ELO
>http://home.earthlink.net/~yatescr
>

Ron N.
06-08-2007, 12:37 AM
On Jun 7, 3:39 pm, "prad" <[email protected]> wrote:
> Randy :
>
> Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.
>
> Glen :
> Actually the data samples are produced by a code. But the code can
> exactly produce the ith data sample that I need out of the entire 10^13
> data points. So storage is not an issue. My limitation is that the C code
> for the FFT takes the data input as an array of samples. And the largest
> array size is 2^32. Since the data is not periodic, I need equidistant
> samples picked out from the entire 10^13 data points. You can assume that
> the sampling rate for the entire data set is 2Ghz. Then the maximum
> frequency in the data is 1Ghz (assuming that LPF was already done). But
> now to get down to 1M samples from the total of 10^13 data points (i.e.
> decimate) I have to LPF again. But I cannot afford to LPF the entire data
> set and would like to apply the LPF to the 1M samples or maybe a constant
> number of neighboring datapoints of the selected 1M samples. Is this
> possible?

Not unless you know something about the entire data set,
such as that it is already bandlimited or filtered below some
reasonable limit.

If you only look at 1 M samples + 20 neighboring points
per sample, how do you know whether or not some of your
samples are sitting on top of narrow 41 point wide spikes
(alias noise compared to the 10^7 points of surrounding
data between samples) ?

You could try a randomized statistical sampling of
data points with varying sample spacings, and perhaps
fail to falsify that your data set has no energy in
selected frequency bands above some limit to some
degree of probability. Would that do?


IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M

Jerry Avins
06-08-2007, 01:34 AM
prad wrote:
> Hi
>
> Are you implying that I first pick 1M samples, then LPF the samples and
> use them? Will that work? Or are you saying that I have to LPF the entire
> digitized data and then do decimation? The second method is not feasible
> for me as the data set is too large and will require an enormous amount of
> time. Please advise.

You are playing with very big numbers, but I presume you don't need to
do one of these every day. Filtering can be part of the decimating
process. You need to calculate only those samples you will eventually
use. The more you decimate, the lower you will have to set the filter's
cutoff, and the more you will fill in the minima you're looking for.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Jerry Avins
06-08-2007, 01:40 AM
prad wrote:
> Randy :
>
> Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.
>
> Glen :
> Actually the data samples are produced by a code. But the code can
> exactly produce the ith data sample that I need out of the entire 10^13
> data points. So storage is not an issue. My limitation is that the C code
> for the FFT takes the data input as an array of samples. And the largest
> array size is 2^32. Since the data is not periodic, I need equidistant
> samples picked out from the entire 10^13 data points. You can assume that
> the sampling rate for the entire data set is 2Ghz. Then the maximum
> frequency in the data is 1Ghz (assuming that LPF was already done). But
> now to get down to 1M samples from the total of 10^13 data points (i.e.
> decimate) I have to LPF again. But I cannot afford to LPF the entire data
> set and would like to apply the LPF to the 1M samples or maybe a constant
> number of neighboring datapoints of the selected 1M samples. Is this
> possible?

Why not change the producing code to generate a decimated set in the
first place?

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

prad
06-08-2007, 03:11 AM
Hi all,

Jerry:
I was just thinking about something like that. I was thinking that I might be
able to modify my code to produce the sum of a subset of data points. Yes,
if I could produce the decimated data points directly then I guess my
problem is solved.

Ron:
Randomized Statistical Sampling is another good idea. Initially I was
thinking along another line involving random sampling. I was thinking
about producing random data samples and then performing FFT on these
samples. In fact, I did it. But since I am not that familiar with DSP and
FFT, I could not really figure out how to interpret the FFT results. In
fact, most of the links I found on FFT with non-uniformly spaced samples
were interpolating to find the equally spaced samples and then performing
FFT. Is this the standard technique for FFT with non-uniformly spaced
samples? Thanks Ron for this new idea. I will investigate it further.

BTW, some other basic questions regarding interpretation of the N-pt FFT
results :

1. The frequency bin "i" corresponds to frequency fi = (i*fs/N),
right? So bin N/2 contains frequency fs/2 while bin 1 contains
frequency fs/N .

2. Assuming that the phase angles lie in [0, 2pi], if the phase angle
of a frequency is 0, does that mean that the first positive peak
of that frequency is at t=0?
(Is this the case for FFTW's FFT results?)



Thanks,
Pradeep.



>prad wrote:
>> Randy :
>>
>> Yes, I am wondering if I can extract 1M samples out of 10^13 data
>> points without aliasing. I am not saying that I do not want to perform
>> filtering. But I cannot afford to perform filtering on the entire dataset
>> and I am wondering if I can perform the filtering on a much smaller subset
>> of the data points. Something like 20*1M of the data points rather than the
>> 10^13 points.
>>
>> Glen :
>> Actually the data samples are produced by a code. But the code can
>> exactly produce the ith data sample that I need out of the entire 10^13
>> data points. So storage is not an issue. My limitation is that the C code
>> for the FFT takes the data input as an array of samples. And the largest
>> array size is 2^32. Since the data is not periodic, I need equidistant
>> samples picked out from the entire 10^13 data points. You can assume that
>> the sampling rate for the entire data set is 2Ghz. Then the maximum
>> frequency in the data is 1Ghz (assuming that LPF was already done). But
>> now to get down to 1M samples from the total of 10^13 data points (i.e.
>> decimate) I have to LPF again. But I cannot afford to LPF the entire data
>> set and would like to apply the LPF to the 1M samples or maybe a constant
>> number of neighboring datapoints of the selected 1M samples. Is this
>> possible?
>
>Why not change the producing code to generate a decimated set in the
>first place?
>
>Jerry
>--
>Engineering is the art of making what you want from things you can get.
>¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
>

Randy Yates
06-08-2007, 03:19 AM
"prad" <[email protected]> writes:

> Randy :
>
> Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.

If you required the entire dataset to get the low-frequency accuracy
you need, then you can't just filter/decimate a small subset and get
that same accuracy.
--
% Randy Yates % "She's sweet on Wagner-I think she'd die for Beethoven.
%% Fuquay-Varina, NC % She love the way Puccini lays down a tune, and
%%% 919-577-9882 % Verdi's always creepin' from her room."
%%%% <[email protected]> % "Rockaria", *A New World Record*, ELO
http://home.earthlink.net/~yatescr

Eric Jacobsen
06-08-2007, 08:02 AM
On Thu, 07 Jun 2007 17:39:05 -0500, "prad"
<[email protected]> wrote:

>Randy :
>
> Yes, I am wondering if I can extract 1M samples out of 10^13 data
>points without aliasing.

If you take 1M contiguous data points out of the set as a hopefully
representative sample (which could be good if you think/know that the
statistics are stationary or close to it), then they won't be any more
aliased than the entire set.

If you want to just take 1M points by taking every Nth sample, that's
problematic unless you *know* that the signal is at least N times
oversampled. It does sound, however, like that's not the case.

Taking a chunk of 1M contiguous samples could tell you useful things
about the signal, especially if you can show that the metrics that
you're looking for are reasonably stationary/stable across the rest of
the sample. Or you could space a few of these 1M samples sets across
the bigger set to get an idea of how stationary things are or not.

>I am not saying that I do not want to perform
>filtering. But I cannot afford to perform filtering on the entire dataset
>and I am wondering if I can perform the filtering on a much smaller subset
>of the data points. Something like 20*1M of the data points rather than the
>10^13 points.

As long as they're taken in a contiguous fashion it's going to have
the same effect as if you did the filtering on the entire set. There
may be some edge effects depending on how you go about this, but they
can be made manageable.

>Glen :
> Actually the data samples are produced by a code. But the code can
>exactly produce the ith data sample that I need out of the entire 10^13
>data points. So storage is not an issue. My limitation is that the C code
>for the FFT takes the data input as an array of samples. And the largest
>array size is 2^32. Since the data is not periodic, I need equidistant
>samples picked out from the entire 10^13 data points.

Bleah. Sounds like you want every Nth sample. Not good.

> You can assume that
>the sampling rate for the entire data set is 2Ghz. Then the maximum
>frequency in the data is 1Ghz (assuming that LPF was already done).

No need to assume, if it's sampled at 2GHz, the highest frequency in
the data is 1GHz. Whether there was any aliasing in the sampling
process may be a question, but once it's sampled the highest supported
frequency is 1GHz.

> But
>now to get down to 1M samples from the total of 10^13 data points (i.e.
>decimate) I have to LPF again. But I cannot afford to LPF the entire data
>set and would like to apply the LPF to the 1M samples or maybe a constant
>number of neighboring datapoints of the selected 1M samples. Is this
>possible?

There are definitely ways to decimate sets without wasting operations.
If you want every Nth sample, filtered appropriately, there's no need
to compute anything related to the 1st through (N-1)th output samples in
between. This can cut computation down tremendously, but in your
case you still got a whole lotta numbers to crunch.

Might you have access to any hardware accelerators? Like a board
with an FPGA on it connected to the computer? At a 400MHz clock
rate, which is practical these days, you could run the entire 10^13
set through a hardware filter in an FPGA in 69 hours (about three
days) assuming one clock per sample. Keeping the buffers fed and
getting the data out in a timely fashion might be a challenge, but
this sort of thing might be worth looking into.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Rune Allnor
06-08-2007, 09:57 AM
On 7 Jun, 20:57, "prad" <[email protected]> wrote:
> >On 7 Jun, 02:59, "prad" <[email protected]> wrote:
> >> Hi all,
>
> >> I am a newbie to DSP. I have a huge number of data samples that I
> want
> >> to perform FFT on.
>
> >Why do you want to compute the DFT? What do you hope
> >to achieve?
>
> >> But due to the enormous amount of the data samples, I'd
> >> like to use only 1M samples and perform FFT. I know that this is
> severe
> >> undersampling and that it does not satisfy the Nyquist rate.
>
> >Then you are in deep trouble. If the data are aliased, you
> >face a question on the form
>
> >"Given x+y = 1, find x and y."
>
> >There exists one true answer, but there is no way you can
> >find it, let lone tell it from all the other combinations
> >of x and y which satisfy x+y=1.
>
> >> But all I
> >> want is to prevent aliasing so that I can get correct information
> about
> >> the low frequency content in the data. I cannot use an analog
> anti-alias
> >> filter as the data is inherently digitized.
>
> >What do you mean? If you sample an analog process, then you
> >have no choise but to use an analog anti-alias filter.
> >If you have data from some discrete-time process, aliasing
> >is not an issue.
>
> >> Is there any time-domain
> >> method to help with my problem?
>
> >As far as filtering the data is concered, only the analog
> >anti-alias filter. Provided, of course, you are actually
> >sampling an analog process.
>
> >> I read about a half-band digital filter
> >> that eliminates the frequencies higher than half the sampling
> frequency.
> >> Can this filter help my problem as I read a lot of articles that the
> high
> >> frequencies have to be eliminated before digitization?
>
> >Doesn't apply here. Analog anti-alias filtering is the way
> >to go.
>
> >> If so, can someone
> >> point me to some link that discusses the design of such a filter?
>
> >I am sure someone can. However, it won't help you.
> >Did I mention that analog anti-alias filters are useful?
>
> >Rune
>
> Dear all,
> Thanks for all your replies but I am still a bit confused. Let me try
> rephrasing my question so that everyone gets a better picture of what I
> am
> trying to do. I have a large number (order of 10^13) of discretized data
> samples.

OK, 10^13 samples, each of (at least) two bytes; we are
talking about some 20 TeraBytes of data here.

Just the magnitude of data shuffling means that you are
into specialist's work. Even the most trivial task becomes
huge; just forget everything you thought you knew.

Find a copy of Knuth's "The art of computer programming",
the volume on searching huge amounts of data.

> I have to obtain the frequency spectrum of this data so that I
> can predict minima in the discretized data.

You haven't said how you measured your data, but forget
about this. One 1M DFT won't help; you still represent
only 1/10 millionth of the data samples. The slightest
incoherency in the process which generated the data will
throw your estimate off.

> I am using the FFTW C library
> to obtain the Frequency spectrum. But the maximum size of my data array
> is
> limited to 2^32 in C.

Yep. If you use a 32-bit operating system.

> So I am picking a smaller subset of 1M samples from
> the original set of 10^13 data points.
>
> Jerry is right about the fact that what I am looking for is subsampling
> or decimation without running into aliasing problems. But due to the
> large
> amount of data, I cannot perform decimation on the entire data set. Also,
> I'd like to point out that my data is inherently discretized and so using
> an analog anti-aliasing filter to remove all the high frequency
> components
> that I am not interested in is NOT an option for me.

If the data are already aliased (no anti-aliasing filter
was used while sampling the data) then you are the proud
owner of several TeraBytes worth of rubbish. Using an
anti-alias filter when sampling is such a basic piece
of knowledge that not doing so ought to be liable for
criminal prosecution.

> So some scheme that
> resembles what John mentioned is of great interest to me. But I am not
> sure if that will work. An ideal scheme for me would be like the
> following:
>
> 1. Collect N=1M equidistant samples out of the n=10^13 data points.
> 2. Filter out the high frequency components present in the samples
> using some kind of LPF, i.e.
> For each sample "s_i" in the 1M samples
> i. Collect some constant number (say 10) samples before and
> after the sample "s_i"
> ii. Perform a Digital Low Pass Filtering operation on these
> 21 samples.
> iii. Pick one of these 21 samples (say the median of the
> 21 samples) to replace "s_i"
>
> Will the above procedure eliminate or atleast drastically reduce
> aliasing (assuming of course that the original 10^13 data points were
> sampled without aliasing)? Can I somehow tell which is the lowest
> frequency that was significantly filtered out (so that I know which of
> the
> lower frequencies contain aliases)?

Decimation is a standard process in DSP. Not all implementations
rely on using the FFT. Try another version of decimation and
be prepared to spend a lot of time working out the trivial
logistics of shuffling data around.

Rune

glen herrmannsfeldt
06-08-2007, 10:24 AM
Eric Jacobsen wrote:
> On Thu, 07 Jun 2007 17:39:05 -0500, "prad"
> <[email protected]> wrote:

(snip)

>> Actually the data samples are produced by a code. But the code can
>>exactly produce the ith data sample that I need out of the entire 10^13
>>data points. So storage is not an issue. My limitation is that the C code
>>for the FFT takes the data input as an array of samples. And the largest
>>array size is 2^32. Since the data is not periodic, I need equidistant
>>samples picked out from the entire 10^13 data points.

> Bleah. Sounds like you want every Nth sample. Not good.

Maybe not so bad. If you really want a large FFT it is possible
to make that up out of smaller FFTs. That is the whole idea behind
the FFT algorithm. To do that, though, the data comes in an unusual
order. If you can produce data in that order it should be easy to do
very large FFTs. That is especially true if you only want the low
frequency end of the FFT.


(snip)

> Might you have access to any hardware accelerators? Like a board
> with an FPGA on it connected to the computer? At a 400MHz clock
> rate, which is practical these days, you could run the entire 10^13
> set through a hardware filter in an FPGA in 69 hours (about three
> days) assuming one clock per sample. Keeping the buffers fed and
> getting the data out in a timely fashion might be a challenge, but
> this sort of thing might be worth looking into.

How did you get 69 hours? 1e13/4e8 is 25000, for about 6.9 hours.
It would seem easy to make a low pass filter in a reasonable
sized FPGA that could run at that rate.

-- glen
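
[Picking up glen's point that only the low-frequency end is wanted: a much simpler
route than splitting a giant FFT, and a different technique, is a direct streaming
DFT of a handful of low bins, which never holds the record in memory at all. Here
next_sample() is only a stand-in for the sample-generating code Pradeep describes,
and N is shrunk so the sketch actually runs; with the real N near 10^13 the memory
problem goes away but the arithmetic is still enormous.]

#include <stdio.h>
#include <math.h>
#include <complex.h>

#define NBINS 8                  /* keep DFT bins k = 0 .. NBINS-1 */

/* Hypothetical stand-in for the OP's generator; here just a test tone
 * with 2 cycles over the whole record, i.e. energy in bin 2. */
static double next_sample(double n, double N)
{
    return cos(6.283185307179586 * 2.0 * n / N);
}

int main(void)
{
    const double N = 1.0e6;      /* shrunk for the demo; think 1e13 in practice */
    double complex X[NBINS] = {0};

    for (double n = 0.0; n < N; n += 1.0) {
        double x = next_sample(n, N);
        for (int k = 0; k < NBINS; k++) {
            /* phase -2*pi*k*n/N stays small for low-order bins, so the
             * trig calls are numerically safe even for huge n */
            double phi = -6.283185307179586 * k * n / N;
            X[k] += x * (cos(phi) + I * sin(phi));
        }
    }
    for (int k = 0; k < NBINS; k++)
        printf("bin %d: |X| = %g\n", k, cabs(X[k]));
    return 0;
}

Each X[k] is exactly bin k of the full-length DFT, so this sidesteps the 2^32 array
limit for the low-frequency end; whether enough bins can be computed this way in
acceptable time is the open question.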

jim
06-08-2007, 12:32 PM
glen herrmannsfeldt wrote:
>
> prad wrote:
>
> > Are you implying that I first pick 1M samples, then LPF the samples and
> > use them? Will that work? Or are you saying that I have to LPF the entire
> > digitized data and then do decimation? The second method is not feasible
> > for me as the data set is too large and will require an enormous amount of
> > time. Please advise.
>
> The exact answer depends much on the ratio of sampling rate to the
> frequency you are interested in. Most likely it is possible to come
> up with a relatively fast filter to get what you want.
>
> Consider the moving average filter. You will find that DSP people
> don't like it very much because its properties are not very good,
> but it is easy to describe and fast to implement.
>
> You might, for example, do a 1000 point moving average where the
> first output value is the mean of the first 1000 input samples,
> the second is the mean of 2 through 1001, etc.

What good is a 1000 point moving average filter going to be? If he is
downsampling from 10^13 samples to 10^6 samples to prevent aliasing he's
going to need a low pass filter that has millions and millions of
coefficients.
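
To put a rough number on "millions and millions", using the 2 GHz rate quoted
earlier: decimating by R = 10^7 (10^13 points down to 10^6) puts the new Nyquist
frequency at fs/(2R) = 100 Hz, and the usual FIR length rule of thumb (roughly
N = A / (22 * dF/fs) for A dB of stopband attenuation and transition width dF),
with, say, A = 60 dB and dF = 50 Hz, gives N on the order of 10^8 taps for a
single-stage design. That ballpark is exactly why the cascaded, multistage
approach glen mentions is the standard way to do very large decimations.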

Don't know what the actual problem is, but let's suppose the 10^13 data
points are a recording of all the pop songs that have ever been played
on the radio. After filtering and decimation the OP would have about one
sample per song. If you are going to use a moving average filter than
each output sample would be something like the average of all the
samples of one song (which probably in most songs would be close to
zero).
It's hard to imagine why, if the data contains only important info at
such a low frequency, it is being sampled at such a high sample rate in
the first place.

-jim



>This can be done
> very fast with a circular buffer of length 1000 to update the
> sum appropriately. It might be that a moving average filter
> followed by a decimation by 10, (output only every 10th point)
> would be a good initial filter, and reduce your problem by
> a factor of ten. Then another filter could finish off the job.
>
> I have not done the calculations to show that the above is
> actually a good way to proceed, but only offer it to show
> that there are fast ways to filter large amounts of data,
> and that in some cases they might be useful. Also, that
> cascaded filters might be faster than doing the whole
> thing in one step.
>
> -- glen


Eric Jacobsen
06-08-2007, 07:49 PM
On Fri, 08 Jun 2007 01:24:59 -0800, glen herrmannsfeldt
<[email protected]> wrote:

>Eric Jacobsen wrote:
>> On Thu, 07 Jun 2007 17:39:05 -0500, "prad"
>> <[email protected]> wrote:
>
>(snip)
>
>>> Actually the data samples are produced by a code. But the code can
>>>exactly produce the ith data sample that I need out of the entire 10^13
>>>data points. So storage is not an issue. My limitation is that the C code
>>>for the FFT takes the data input as an array of samples. And the largest
>>>array size is 2^32. Since the data is not periodic, I need equidistant
>>>samples picked out from the entire 10^13 data points.
>
>> Bleah. Sounds like you want every Nth sample. Not good.
>
>Maybe not so bad. If you really want a large FFT it is possible
>to make that up out of smaller FFTs. That is the whole idea behind
>the FFT algorithm. To do that, though, the data comes in an unusual
>order. If you can produce data in that order it should be easy to do
>very large FFTs. That is especially true if you only want the low
>frequency end of the FFT.

That's a very interesting thought. Some judicious pruning of the FFT
computations to get the fraction of the spectrum of interest.

>> Might you have access to any hardware accelerators? Like a board
>> with an FPGA on it connected to the computer? At a 400MHz clock
>> rate, which is practical these days, you could run the entire 10^13
>> set through a hardware filter in an FPGA in 69 hours (about three
>> days) assuming one clock per sample. Keeping the buffers fed and
>> getting the data out in a timely fashion might be a challenge, but
>> this sort of thing might be worth looking into.
>
>How did you get 69 hours? 1e13/4e8 is 25000, for about 6.9 hours.
>It would seem easy to make a low pass filter in a reasonable
>sized FPGA that could run at that rate.
>
>-- glen

Yeah, musta mis-typed an exponent or something. You're right, it's
an order of magnitude better than I thought it'd be.

So it must be a good idea!! ;)

The buffering could be an issue, but even if the filter has to stop
and wait occasionally to load/unload the buffers, it'd still be an
excellent way to accelerate computation.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org

Rune Allnor
06-08-2007, 09:36 PM
On 7 Jun, 22:47, "prad" <[email protected]> wrote:

> One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have.

What the heck are you doing??

Only ten years ago, the 3D/4C seismic data surveys
broke the 1 TB limit; if you want to get into 10s of
Ts you have to get into 4D/4C seismic surveys. Only
the really big boys do that sort of thing.

A couple of days ago the marine survey company I work
for issued a press release about their activities.
In one year, the company recorded some 18 TB of data
during survey operations. These are the combined surveys
from 12 ships, each of 2000 to 3000 tons (about 90 m long),
each running a 24/7 survey operation for 4 months per year.

And you get that sort of data apparently without even
trying...

Something is seriously weird here. I just don't
believe your experiment design is a good one; it is
fundamentally flawed. No well-designed experiment,
particularly one designed by a DSP newbie, ends up
with that sort of data. Period.

Why don't you get into details about how these data
come about and what you try to do with them?

Rune

Vladimir Vassilevsky
06-08-2007, 09:52 PM
Rune Allnor wrote:

> On 7 Jun, 22:47, "prad" <[email protected]> wrote:
>
>
>> One of my major problems is that the data sets I have are huge, in fact
>>in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
>>data sets I have.
>
>
> What the heck are you doing??

Don't you know? This is homework. A stupident is generating a huge data
by software and then trying to make use of that data. Numerical nonsense
instead of thinking of the better approaches, as usual.

VLV

prad
06-09-2007, 01:39 AM
I am sorry that you think that. This is for a research subject and not
homework. I am not sharing details about the data as it would give away
the novel modeling I am trying to do. Thanks to all those who gave useful
information and helped me.

VLV: Please think before you send a comment like that.



Pradeep.

>
>
>Rune Allnor wrote:
>
>> On 7 Jun, 22:47, "prad" <[email protected]> wrote:
>>
>>
>>> One of my major problems is that the data sets I have are huge, in fact
>>>in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
>>>data sets I have.
>>
>>
>> What the heck are you doing??
>
>Don't you know? This is homework. A stupident is generating a huge data
>by software and then trying to make use of that data. Numerical nonsense
>instead of thinking of the better approaches, as usual.
>
>VLV
>

Rune Allnor
06-09-2007, 10:33 AM
On 9 Jun, 02:39, "prad" <[email protected]> wrote:
> I am sorry that you think that.

Don't top post when commenting on previous posts.

> This is for a research subject and not
> homework. I am not sharing details about the data as it would give away
> the novel modeling I am trying to do.

Those who have followed my posts here for a couple of years
would know that I sometimes make a point of saying that I
am an engineer, not a researcher, despite my misfortune
of having obtained a PhD degree. The difference is that
the engineer knows what he is doing whereas the researcher
does not.

Your project is flawed. There is no need whatsoever to
come up with those sorts of data in any but the biggest
survey projects.

There is the vanishing (though still non-zero) chance that
you really are into something, but even so, your timing is
wrong. These days, the sheer logistics of what you try to do
is out of reach for anyone but the largest computer centres.
That will change with time. When I started playing with
computers 20 years ago, it was specialist's work to handle
more than 64 kilobytes of data at any one time. When I first
played with certain sonar data processing ideas some 10 years
ago, anything beyond 1MByte was, for practical purposes,
outside my reach. These days my computer's RAM is the
limitation (it has only 512 MBytes) and I plan my programs
for use with, say, 10 GByte of data once I can get my hands
on that sort of computer. It will be affordable by the
time I finish my program.

So times change, and maybe you or I will look up this
thread in five or ten years time and smile at the old
days when a mere 20 TB of data represented an insurmountable
obstacle.

However, at the time of writing, June 2007, 20 TB of data
*is* an insurmountable obstacle. If you end up with the
need to process that sort of data, there is such a huge
discrepancy between what you want and what is within
your abilities to do, that you might as well do something
else in the mean time, waiting for the required technology
to become available to you.

If you do this as part of an employment, circulate your
resume. Whoever assigned you to this task has no idea
what he or she is doing.

> VLV: Please think before you send a comment like that.

Vladimir is, as is his habit, perfectly on the mark.

Rune

In Norwegian, "researcher" is translated to "forsker",
"one who does research", whereas "scientist" is translated
to "vitenskapsmann", which means "one who knows the
sciences." The difference might be subtle, but anyone
can embark on some sort of research, while insight into
the sciences requires more than that, usually obtained
through decades of dedicated studies. Needless to say,
the world is full of researchers, with hardly one
scientist alive, world wide.

glen herrmannsfeldt
06-09-2007, 01:12 PM
jim wrote:
(snip)

> What good is a 1000 point moving average filter going to be? If he is
> downsampling from 10^13 samples to 10^6 samples to prevent aliasing he's
> going to need a low pass filter that has millions and millions of
> coefficients.

Maybe it should be more than 1000, maybe less. I haven't seen any
numbers related to the frequency range of interest.


> It's hard to imagine why, if the data contains only important info at
> such a low frequency, it is being sampled at such a high sample rate in
> the first place.

It seems that it is generated, and not from a natural source. It might
be the output of a linear congruential or linear feedback shift
register, for example. It might be a 43 bit LFSR, and one bit samples.
Note that the number of bits per sample still hasn't been mentioned.

I believe with either linear congruential or LFSR you can rewrite it
to generate every Nth point of the original series.

-- glen

glen herrmannsfeldt
06-09-2007, 01:14 PM
Rune Allnor wrote:

(snip)

> Why don't you get into details about how these data
> come about and what you try to do with them?

He seems to want to keep the data source proprietary,
in which case I believe our answers also should be.

Now, if he wants to pay for answers, that is different.

-- glen

Vladimir Vassilevsky
06-09-2007, 09:08 PM
Now I am sure that what you are doing is sheer nonsense. Besides the
dumb brute forcing, the obvious sign is the cluelessness, the other
obvious sign is the secrecy. The earlier you dismiss your priceless
ideas, the better.

When people do serious research, they don't ask stupid questions
in the newsgroups. Instead, they learn the subject themselves and/or
seek professional assistance.

VLV


prad wrote:
> I am sorry that you think that. This is for a research subject and not
> homework. I am not sharing details about the data as it would give away
> the novel modeling I am trying to do. Thanks to all those who gave useful
> information and helped me.
>
> VLV: Please think before you send a comment like that.
>
>
>
> Pradeep.
>

>>
>>Don't you know? This is homework. A stupident is generating a huge data
>>by software and then trying to make use of that data. Numerical nonsense
>
>
>>instead of thinking of the better approaches, as usual.
>>
>>VLV
>>

Ron N.
06-10-2007, 01:26 AM
On Jun 7, 7:11 pm, "prad" <[email protected]> wrote:
> Ron:
> Randomized Statistical Sampling is another good idea. Initially I was
> thinking along another line involving random sampling. I was thinking
> about producing random data samples and then performing FFT on these
> samples. In fact, I did it. But since I am not that familiar with DSP and
> FFT, could not really figure out how to interpret the FFT results. In
> fact, most of the links I found on FFT with non-uniformly spaced samples
> were interpolating to find the equally spaced samples and then performing
> FFT. Is this the standard technique for FFT with non-uniformly spaced
> samples? Thanks Ron for this new idea. I will investigate it further.

Actually, this might be a place where trying to use a randomly
sampled low pass filter might be better than nothing.
Essentially create your low pass filter waveform (say a
windowed sinc of some period and width), and then use that
filter waveform in a weighted random number generator. Use
those weighted random numbers to select sub-samples centered
around the neighborhood of a sample point of interest. After
some number of sub-samples, if the mean and variance seem to be
converging somewhere after a sufficient number of sub-samples,
then the mean might approximate the value of a decimated sample
of the bandlimited signal perhaps within some statistical
confidence interval.

Does this type of procedure have a name?



IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
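
[One way to make Ron's suggestion concrete is to read it as a Monte Carlo estimate of
a single lowpass-filtered, decimated output: draw tap positions with probability
proportional to |h[j]| and average sign(h[j]) * ||h||1 * x at those positions, which
is an unbiased estimate of the exact convolution sum. A rough sketch, where
get_sample() stands in for the on-demand generator Pradeep describes and the taps,
draw count and RNG are all illustrative choices.]

#include <math.h>
#include <stdlib.h>

/* Hypothetical: returns sample x[i] of the huge record on demand. */
extern double get_sample(long long i);

/* Estimate sum_j h[j] * x[center + j - ntaps/2] from ndraws random taps. */
double mc_filtered_sample(long long center, const double *h, int ntaps,
                          int ndraws)
{
    double l1 = 0.0;                        /* ||h||_1 */
    for (int j = 0; j < ntaps; j++)
        l1 += fabs(h[j]);

    double acc = 0.0;
    for (int d = 0; d < ndraws; d++) {
        double u = l1 * ((double)rand() / RAND_MAX);
        double c = 0.0;
        int j = ntaps - 1;
        for (int t = 0; t < ntaps; t++) {   /* pick tap j with prob |h[j]|/l1 */
            c += fabs(h[t]);
            if (u <= c) { j = t; break; }
        }
        double s = (h[j] >= 0.0) ? 1.0 : -1.0;
        acc += s * l1 * get_sample(center + j - ntaps / 2);
    }
    return acc / ndraws;                    /* unbiased; variance ~ 1/ndraws */
}

The estimate is noisy, with variance shrinking only as 1/ndraws, which matches Ron's
framing: one does not get the exact bandlimited sample, only a value (and a confidence
interval) that converges toward it.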

Fred Marshall
06-10-2007, 09:38 PM
prad,

If someone suggested it, then I've missed it.... here's what I would do:

Because you have so many data points the frequency resolution could be quite
a bit better than you need. The fewer contiguous points you use, the
coarser the frequency resolution.

So, I might take 1024 or 4096 or .... you choose the number ... and compute
an FFT on just those contiguous samples. You might do this for various such
epochs along the total sequence. While the resolution will be limited, the
entire frequency range will be covered each time.
If the results are quite different then you know that the spectral character
of the samples is varying from segment to segment.
If the results are rather similar then the opposite.

Also, you'll be able to see the actual important bandwidth of the
information - so you might be able to decide that some decimation is OK to
do without aliasing.

You should be asking yourself this question:
Even though I have a huge number of samples, what is the frequency
resolution that I require? The frequency resolution is the reciprocal of
the temporal epoch that you choose to analyze.

Example:

If you have 1 second worth of samples and the sample rate is 1MHz, then you
have 10^6 samples. If you FFT the whole sequence, you will have 1Hz
resolution over a range 0.5MHz (fs/2). Maybe 1Hz resolution is overkill for
your application.

So, 0.1secs of data would be 100,000 samples with 10Hz resolution... and so
forth.

Pick the temporal length that gives suitable resolution.

I hope this helps.

Fred
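
[Fred's relation, restated compactly; nothing new, just the formula behind his
example: for an analysis epoch of $T$ seconds at sample rate $f_s$, the transform
length is $N = f_s T$ and

$$\Delta f = \frac{1}{T} = \frac{f_s}{N}, \qquad 0 \le f \le \frac{f_s}{2},$$

so his numbers, $T = 1$ s at $f_s = 1$ MHz, give $N = 10^6$ and $\Delta f = 1$ Hz,
while $T = 0.1$ s gives $\Delta f = 10$ Hz.]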

prad
06-11-2007, 12:28 AM
Fred and Ron,

Thanks for your input. I'll try your suggestions.


Prad.




>prad,
>
>If someone suggested it, then I've missed it.... here's what I would do:
>
>Because you have so many data points the frequency resolution could be quite
>a bit better than you need. The fewer contiguous points you use, the
>coarser the frequency resolution.
>
>So, I might take 1024 or 4096 or .... you choose the number ... and compute
>an FFT on just those contiguous samples. You might do this for various such
>epochs along the total sequence. While the resolution will be limited, the
>entire frequency range will be covered each time.
>If the results are quite different then you know that the spectral character
>of the samples is varying from segment to segment.
>If the results are rather similar then the opposite.
>
>Also, you'll be able to see the actual important bandwidth of the
>information - so you might be able to decide that some decimation is OK to
>do without aliasing.
>
>You should be asking yourself this question:
>Even though I have a huge number of samples, what is the frequency
>resolution that I require? The frequency resolution is the reciprocal of
>the temporal epoch that you choose to analyze.
>
>Example:
>
>If you have 1 second worth of samples and the sample rate is 1MHz, then you
>have 10^6 samples. If you FFT the whole sequence, you will have 1Hz
>resolution over a range 0.5MHz (fs/2). Maybe 1Hz resolution is overkill for
>your application.
>
>So, 0.1secs of data would be 100,000 samples with 10Hz resolution... and so
>forth.
>
>Pick the temporal length that gives suitable resolution.
>
>I hope this helps.
>
>Fred
>
>
>

John E. Hadstate
06-11-2007, 10:39 AM
"Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in
message
news:[email protected] et...

>
> So, I might take 1024 or 4096 or .... you choose the
> number ... and compute an FFT on just those contiguous
> samples. You might do this for various such epochs along
> the total sequence. While the resolution will be limited,
> the entire frequency range will be covered each time.
> If the results are quite different then you know that the
> spectral character of the samples is varying from segment
> to segment.
> If the results are rather similar then the opposite.

Consider what you would see if you computed a short DFT on a
low-baud-rate, highly-oversampled FSK signal. If the DFT is
short enough relative to the baud rate, you will see the
Mark and Space frequencies in separate DFT windows. Would
you then conclude that the spectral character of the signal
is varying or would you conclude that the varying spectrum
characterizes the signal?

> You should be asking yourself this question:
> Even though I have a huge number of samples, what is the
> frequency resolution that I require? The frequency
> resolution is the reciprocal of the temporal epoch that
> you choose to analyze.
>

A huge number of samples might be the result of
oversampling. It might also be the result of long-term
observation of a phenomenon that is undersampled. It
sounds like the OP is not sure which case applies to his
data.

Fred Marshall
06-11-2007, 08:14 PM
"John E. Hadstate" <[email protected]> wrote in message
news:DL8bi.26179$dy1.22507@bigfe9...
>
> "Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in message
> news:[email protected] et...
>
>>
>> So, I might take 1024 or 4096 or .... you choose the number ... and
>> compute an FFT on just those contiguous samples. You might do this for
>> various such epochs along the total sequence. While the resolution will
>> be limited, the entire frequency range will be covered each time.
>> If the results are quite different then you know that the spectral
>> character of the samples is varying from segment to segment.
>> If the results are rather similar then the opposite.
>
> Consider what you would see if you computed a short DFT on a
> low-baud-rate, highly-oversampled FSK signal. If the DFT is short enough
> relative to the baud rate, you will see the Mark and Space frequencies in
> separate DFT windows. Would you then conclude that the spectral character
> of the signal is varying or would you conclude that the varying spectrum
> characterizes the signal?

John,

Yes, of course I would. :-)

Fred

Ron N.
06-12-2007, 01:23 AM
On Jun 10, 1:38 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
wrote:
> So, I might take 1024 or 4096 or .... you choose the number ... and compute
> an FFT on just those contiguous samples. You might do this for various such
> epochs along the total sequence. While the resolution will be limited, the
> entire frequency range will be covered each time.

An FFT of 1024 contiguous samples won't tell you too much
about the spectral content of frequencies on the rough
order of one-hundredth to one-billionth of the FFT's first
bin, which seems to be where the OP is looking (the first
10^6 DFT bins out of circa 10^13 possible?).

It still seems to me that the way to look at such huge
potential data sets (the weight of every rodent in North
America by longitude, or some such) is to start with
some statistical sampling. What I'm wondering is if
there is a name for the procedure of taking a bunch of
randomly spaced samples and doing a regression fit of
those samples against a set of orthogonal sinusoidal
basis vectors.


IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M