Dear all,
Thanks for all your replies but I am still a bit confused. Let me tr
rephrasing my question so that everyone gets a better picture of what I a
trying to do. I have a large number (order of 10^13) of discretized dat
samples. I have to obtain the frequency spectrum of this data so that
can predict minima in the discretized data. I am using the FFTW C librar
to obtain the Frequency spectrum. But the maximum size of my data array i
limited to 2^32 in C. So I am picking a smaller subset of 1M samples fro
the original set of 10^13 data points.
Jerry is right about the fact that what I am looking for is subsamplin
or decimation without running into aliasing problems. But due to the larg
amount of data, I cannot perform decimation on the entire data set. Also
I'd like to point out that my data is inherently discretized and so usin
an analog anti-aliasing filter to remove all the high frequency component
that I am not interested in is NOT an option for me. So some scheme tha
resembles what John mentioned is of great interest to me. But I am no
sure if that will work. An ideal scheme for me would be like th
following:
1. Collect N=1M equidistant samples out of the n=10^13 data points.
2. Filter out the high frequency components present in the samples
using some kind of LPF, i.e.
For each sample "s_i" in the 1M samples
i. Collect some constant number (say 10) samples before and
after the sample "s_i"
ii. Perform a Digital Low Pass Filtering operation on these
21 samples.
iii. Pick one of these 21 samples (say the median of the
21 samples) to replace "s_i"
Will the above procedure eliminate or atleast drastically reduc
aliasing (assuming of course that the original 10^13 data points wer
sampled without aliasing)? Can I somehow tell which is the lowes
frequency that was significantly filtered out (so that I know which of th
lower frequencies contain aliases)?
Some other basic questions regarding interpretation of the N-pt FF
results :
1. The frequency bin "i" corresponds to frequency fi = (i*fs/N)
right?
So bin N/2 contains frequency fs/2 while bin 1 contains
frequency fs/N .
2. Assuming that the phase angles lie in [0, 2pi], if the phase angl
of a frequency is 0, does that mean that the first positive peak
of that frequency is at t=0?
(Is this the case for FFTW's FFT results?)
> But due to the large
> amount of data, I cannot perform decimation on the entire data set.
Make believe you are doing a DSP in real time. Read your data one point at
a time, do the appropriate convolution for filtering. Maintain enough of a
data buffer to do the convolution, letting the older stuff fall off the
end, but save every Nth point to a results array.
> Thanks for all your replies but I am still a bit confused. Let me try
> rephrasing my question so that everyone gets a better picture of what I am
> trying to do. I have a large number (order of 10^13) of discretized data
> samples. I have to obtain the frequency spectrum of this data so that I
> can predict minima in the discretized data. I am using the FFTW C library
> to obtain the Frequency spectrum. But the maximum size of my data array is
> limited to 2^32 in C. So I am picking a smaller subset of 1M samples from
> the original set of 10^13 data points.
If you only need part of the FFT, such as the low frequency points,
there are ways to speed it up. But 1e13 is a lot of points.
I have wondered in the past about an FFT on an entire CD, which is
much less than 1e13 samples. There are many tricks that can be
done with large data sets to process them faster.
glen herrmannsfeldt wrote:
> prad wrote:
>
>> Thanks for all your replies but I am still a bit confused. Let me try
>> rephrasing my question so that everyone gets a better picture of what
>> I am
>> trying to do. I have a large number (order of 10^13) of discretized data
>> samples. I have to obtain the frequency spectrum of this data so that I
>> can predict minima in the discretized data. I am using the FFTW C library
>> to obtain the Frequency spectrum. But the maximum size of my data
>> array is
>> limited to 2^32 in C. So I am picking a smaller subset of 1M samples from
>> the original set of 10^13 data points.
>
> If you only need part of the FFT, such as the low frequency points,
> there are ways to speed it up. But 1e13 is a lot of points.
> I have wondered in the past about an FFT on an entire CD, which is
> much less than 1e13 samples. There are many tricks that can be
> done with large data sets to process them faster.
Do we know that speed is an issue? Multirate decimation from and to
files on disk is surely a possibility for a one-off calculation. The
numbers don't add up for me. 2^32 = 4,294,967,296. Why mess around with
2^20 (1,048,576)? Decimate by 2^12 and go. (I don't know the highest
original frequency, but the highest one left after decimation will be
f/4096. Unless those minima are very broad, the processing will fill
them in.)
Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
If you are interested in the whole spectrum, you are stuck with random
sampling with significantly more than 1E6 points. If you are only interested
in a "small" of the spectrum, you might try filtering out only the band of
interest and transforming that.
As an aside, there are 64 bit computers available for reasonable amounts of
money. Maybe a lease ...
In article <[email protected]>, [email protected] wrote:
>glen herrmannsfeldt wrote:
>> prad wrote:
>>
>>> Thanks for all your replies but I am still a bit confused. Let me try
>>> rephrasing my question so that everyone gets a better picture of what
>>> I am
>>> trying to do. I have a large number (order of 10^13) of discretized data
>>> samples. I have to obtain the frequency spectrum of this data so that I
>>> can predict minima in the discretized data. I am using the FFTW C library
>>> to obtain the Frequency spectrum. But the maximum size of my data
>>> array is
>>> limited to 2^32 in C. So I am picking a smaller subset of 1M samples from
>>> the original set of 10^13 data points.
>>
>> If you only need part of the FFT, such as the low frequency points,
>> there are ways to speed it up. But 1e13 is a lot of points.
>> I have wondered in the past about an FFT on an entire CD, which is
>> much less than 1e13 samples. There are many tricks that can be
>> done with large data sets to process them faster.
>
>Do we know that speed is an issue? Multirate decimation from and to
>files on disk is surely a possibility for a one-off calculation. The
>numbers don't add up for me. 2^32 = 4,294,967,296. Why mess around with
>2^20 (1,048,576)? Decimate by 2^12 and go. (I don't know the highest
>original frequency, but the highest one left after decimation will be
>f/4096. Unless those minima are very broad, the processing will fill
>them in.)
>
>Jerry