This is a bit OT with respect to DSP, but I know there are
many skilled programmers here so maybe somebody can
help me out.
I am implementing these classes that basically look like
class a {
:
};
class b{
private:
a *p;
int i;
public:
:
};
Basically, class b is an array of objects of class a,
with some extra functionality attached.
I would like to be able to access an instance of class
b as if it really was an array of a's:
a A; // An instance of class a
b B; // An "augmented array of as"
a C[4]; // A "true" array of as
With the above declarations, both
A = C[1];
C[0] = A;
are legal.
I want to be able to do similar things as
A = B[1];
B[0] = A;
where what one actually does is
A = *(B.p+1);
*(B.p+1) = A;
In order to achieve this, the operators "=" and "[]"
must be overloaded, which is where my efforts grind
to a halt... It does not help my efforts that I am
way out to sea with no prospect of getting back
to my C++ books for another three weeks...
Rune Allnor wrote:
> Basically, class b is an array of objects of class a,
> with some extra functionality attached.
>
> I would like to be able to access an instance of class
> b as if it really was an array of a's:
>
> a A; // An instance of class a
> b B; // An "augmented array of as"
> a C[4]; // A "true" array of as
>
> With the above declarations, both
>
> A = C[1];
> C[0] = A;
>
> are legal.
>
> I want to be able to do similar things as
>
> A = B[1];
> B[0] = A;
>
> where what one actually does is
>
> A = *(B.p+1);
> *(B.p+1) = A;
>
> In order to achieve this, the operators "=" and "[]"
> must be overloaded, which is where my efforts grind
> to a halt... It does not help my efforts that I am
> way out to sea with no prospect of getting back
> to my C++ books for another three weeks...
You may get most of your expressions working if you do enough
investigations, but in fact it is much easier to use STL classes which
are part of the C++ standard.
The expression *(B.p+1) is uncommon. Usually this is written *(B.begin()+1).
The class vector<a> will most likely meet your demands.
(Use #include <vector> or #include <vector.h> on older compilers.)
Since the STL classes are templates, it is usually sufficient to get the
STL header files if your compiler does not support it out of the box.
SGI has a free implementation and a good reference to it at www.sgi.com/tech/stl.
> class a {
int member;
public:
a& operator=(const a& other)
{
member = other.member;
return *this;
}
> };
>
> class b{
> private:
> a *p;
> int i;
> public:
a& operator[](std::size_t n)
{
assert(n < i); // if that's what i means
return p[n];
}
The b:perator[] const overload is necessary so you can use
subscript expressions in functions that take const b&.
There's a chance that you don't need to write a:perator= yourself
-- the autogenerated operator simply assigns all base subobjects and
members. However, if you need exception safety then use the copy/swap
idiom; see http://groups.google.com/groups?q=c%...22copy+swap%22
Martin
--
There are three kinds of people --
those who can count and those who can't.
Martin Eisenberg wrote:
> Rune Allnor wrote:
>
> > It does not help my efforts that I am way out to sea with no
> > prospect of getting back to my C++ books for another three weeks...
>
> When the prophet cannot go to the scripture, let the scripture come
> to the prophet
Thanks for the quick response.
With your help along with the fresh weather (50 kts wind and ~8 m
waves right now), the necessary details were shaken loose from
the darker corners of my mind...
Glad it helped. Don't discount Marcel's comment, though -- whenenver
you find yourself newing arrays, you're going to have a hard time
defending that against displacement by a std::vector instance.
Martin
--
When he had finally achieved a position that
allowed him to say everything he thought, he
only thought of his position anymore.
--Gabriel Laub
Martin Eisenberg skrev:
> Rune Allnor wrote:
>
> > Thanks for the quick response.
>
> Glad it helped. Don't discount Marcel's comment, though -- whenenver
> you find yourself newing arrays, you're going to have a hard time
> defending that against displacement by a std::vector instance.
Why?
Is this a matter of programming style or do you mean that the
memory allocated to the naive pointer can be taken over by
some routine in the STL?
Just as and aside: I did something similar ten years ago,
way before complex numbers were part of the STL. I basically
implemented a class for complex numbers from scratch,
using all these tricks with operator overloading etc. It turned
out that the program run time decreased by some 95% after
I used the INLINE directive for these trivial functions. I didn't
start using the STL::complex<> class until I had reassured
myself that everything iside was inline'd.
This present application will also be a CPU-intensive thing,
so speed is definately a priority.
>> Don't discount Marcel's comment, though -- whenenver you find
>> yourself newing arrays, you're going to have a hard time defending
>> that against displacement by a std::vector instance.
>
> Why?
Vectors are easy to use well. Explicit memory management is
notoriously easy to mess up. The more of it you can sweep under the
rug, the less effort it takes to write quality code. That's one of
the big reasons the standard library has container classes.
Regarding speed, if you trust your compiler to inline the operators
of the class you called b in this thread then it's reasonable to
expect the same for stdlib components. Then there's the fact that
heavy template use may inflate compilation time, but if including
<vector> actually has a measurable effect it's well worth it to me
personally to avoid the cursing and agony in memory debugging. And
all of this assumes you're on a hosted C++ implementation, otherwise
you wouldn't have std::vector available in the first place.
> Is this a matter of programming style or do you mean that the
> memory allocated to the naive pointer can be taken over by
> some routine in the STL?
Sorry, I hope it's clear now that I meant "defend" as in "justify",
not "fight off".
> Just as and aside: I did something similar ten years ago,
> way before complex numbers were part of the STL.
Nitpick: Nowadays, STL is just a colloquial term for a subset of the
standard library comprising sequences and algorithms so std::complex
is still not included.
> I basically implemented a class for complex numbers from scratch,
> using all these tricks with operator overloading etc. It turned
> out that the program run time decreased by some 95% after
> I used the INLINE directive for these trivial functions.
What an optimizer will expand inline and how much prodding it needs
are compiler quality-of-implementation questions, not library
questions. But note that the way template classes are usually written
their members are inline by default, making *too much* inlining a QoI
issue as well
Martin
--
I am an Indian and looked upon by the whites as a foolish man;
but it must be because I follow the advice of the white man.
--Shunka Witko
Martin Eisenberg skrev:
> Rune Allnor wrote:
> > Martin Eisenberg skrev:
>
> >> Don't discount Marcel's comment, though -- whenenver you find
> >> yourself newing arrays, you're going to have a hard time defending
> >> that against displacement by a std::vector instance.
> >
> > Why?
>
> Vectors are easy to use well. Explicit memory management is
> notoriously easy to mess up. The more of it you can sweep under the
> rug, the less effort it takes to write quality code. That's one of
> the big reasons the standard library has container classes.
Maybe I am paranoid after having used matlab for too long, but
I would like to have as much control as possible.
> Regarding speed, if you trust your compiler to inline the operators
> of the class you called b in this thread then it's reasonable to
> expect the same for stdlib components. Then there's the fact that
> heavy template use may inflate compilation time,
Compilation time is -- as of yet -- a non-issue here. My stuff takes
seconds to compile, with my naive style of code (no templates).
> but if including
> <vector> actually has a measurable effect it's well worth it to me
> personally to avoid the cursing and agony in memory debugging. And
> all of this assumes you're on a hosted C++ implementation, otherwise
> you wouldn't have std::vector available in the first place.
That's the other reason why I prefer to implement my own stuff.
I have moved around quite a bit in the past, between sites with
variable extensions to the "standards". My code is mine and
available to me where I am. Saves a lot of grief when getting to
a new place.
> > Is this a matter of programming style or do you mean that the
> > memory allocated to the naive pointer can be taken over by
> > some routine in the STL?
>
> Sorry, I hope it's clear now that I meant "defend" as in "justify",
> not "fight off".
Good. You had me worried there...
> > Just as and aside: I did something similar ten years ago,
> > way before complex numbers were part of the STL.
>
> Nitpick: Nowadays, STL is just a colloquial term for a subset of the
> standard library comprising sequences and algorithms so std::complex
> is still not included.
I think STL means "standard template library"? There was a
complex<T> class included, that I have started using.
> > I basically implemented a class for complex numbers from scratch,
> > using all these tricks with operator overloading etc. It turned
> > out that the program run time decreased by some 95% after
> > I used the INLINE directive for these trivial functions.
>
> What an optimizer will expand inline and how much prodding it needs
> are compiler quality-of-implementation questions, not library
> questions.
A compiler can not do inlining without access to the source code.
A library that does not provide that soorce code can, by definition,
not be inlined. So this is very much a library question.
> But note that the way template classes are usually written
> their members are inline by default, making *too much* inlining a QoI
> issue as well
Inlining is not something I use very often. It is, however, utterly
crucial in the complex class. Check this out:
let's see what happens inside an utterly trivial
expresion like
a = b+c[2]; // (*)
The expression is split in a number of function calls:
c[2] : c.operator[](2)
b+cc: operator+(b,cc)
a = bb: a.operator=(bb)
and that's not even counting function-call access to the real
and imaginary values of the numbers involved. One can easily
assume at least one of each of the get... and set... methods
inside each of the three operators, making the seemingly trivial
expression (*) contain at least 15 -- that's right, fifteen! -- formal
function calls.
It would cmopletely kill the performance of any arithmetic-
intensive program if these formal function calls should be
implemented as "physical" function calls.
Rune Allnor wrote:
>> but if including
>> <vector> actually has a measurable effect it's well worth it to me
>> personally to avoid the cursing and agony in memory debugging. And
>> all of this assumes you're on a hosted C++ implementation, otherwise
>> you wouldn't have std::vector available in the first place.
>
> That's the other reason why I prefer to implement my own stuff.
> I have moved around quite a bit in the past, between sites with
> variable extensions to the "standards". My code is mine and
> available to me where I am. Saves a lot of grief when getting to
> a new place.
Well, as long as we ar talking about trivial things like a vector you
might be right. However, you usually should not expext your code to be
exeption-safe for instance.
And you may be suprised how much pitfalls are waiting for you if you try
to implement something like smart pointers for object lifetime management.
> Inlining is not something I use very often. It is, however, utterly
> crucial in the complex class.
It is essential in most C++ template classes as they have quite often
functions that do not implement any business logic but they are required
for formal reasons.
Marcel Müller skrev:
> Rune Allnor wrote:
> >> but if including
> >> <vector> actually has a measurable effect it's well worth it to me
> >> personally to avoid the cursing and agony in memory debugging. And
> >> all of this assumes you're on a hosted C++ implementation, otherwise
> >> you wouldn't have std::vector available in the first place.
> >
> > That's the other reason why I prefer to implement my own stuff.
> > I have moved around quite a bit in the past, between sites with
> > variable extensions to the "standards". My code is mine and
> > available to me where I am. Saves a lot of grief when getting to
> > a new place.
>
> Well, as long as we ar talking about trivial things like a vector you
> might be right. However, you usually should not expext your code to be
> exeption-safe for instance.
My intentions right now are merely to do with C++ what I can not
achieve with matlab. The current project is a bathymetry model
(basically a terrain model of the underwater environment) where
I want to test a couple of ideas, that can not be done in matlab.
It has a lot to do with large amounts of data, and a little bit
to do with how to process the bathymetry map.
If the ideas work out, I'll have to get somebody who know how
to program "for real" to help me implement a production version.
> And you may be suprised how much pitfalls are waiting for you if you try
> to implement something like smart pointers for object lifetime management.
Like...?
> > Inlining is not something I use very often. It is, however, utterly
> > crucial in the complex class.
>
> It is essential in most C++ template classes as they have quite often
> functions that do not implement any business logic but they are required
> for formal reasons.
Well, I haven't spent too much time learning the STL. As I said,
what I do with C++ is done because I can't do it with matlab,
or a matlab implementation is too cumbnersome to maintain.
With the bathymetry map, my main focus right now is to
find out the importance of using the correct data structure
for internal representations. I am not too well versed in
data structures, so a STL overlay might just obfuscate
my understanding of what is going on, more than help.
Right now, the main focus is to have total control -- for better
or for worse -- of the internals of my programs.
>> Vectors are easy to use well. Explicit memory management is
>> notoriously easy to mess up. The more of it you can sweep under
>> the rug, the less effort it takes to write quality code. That's
>> one of the big reasons the standard library has container
>> classes.
>
> Maybe I am paranoid after having used matlab for too long, but
> I would like to have as much control as possible.
I'd be interested in the specific things you want control over and
how you achieve them with new.
>> What an optimizer will expand inline and how much prodding it
>> needs are compiler quality-of-implementation questions, not
>> library questions.
>
> A compiler can not do inlining without access to the source
> code. A library that does not provide that soorce code can, by
> definition, not be inlined. So this is very much a library
> question.
It is possible to put the necessary information in the intermediate
files the linker consumes. MSVC.net and Intel C++ can do link-time
optimization and I think the GCC team is working on it, but I must
concede it's not yet widespread enough to simply rely on it.
Martin
--
Is it any wonder the world's gone insane,
with information come to be the only real
medium of exchange? --Thomas Pynchon
Martin Eisenberg skrev:
> Rune Allnor wrote:
> > Martin Eisenberg skrev:
>
> >> Vectors are easy to use well. Explicit memory management is
> >> notoriously easy to mess up. The more of it you can sweep under
> >> the rug, the less effort it takes to write quality code. That's
> >> one of the big reasons the standard library has container
> >> classes.
> >
> > Maybe I am paranoid after having used matlab for too long, but
> > I would like to have as much control as possible.
>
> I'd be interested in the specific things you want control over and
> how you achieve them with new.
Computational overhead in the classes is one aspect, memory
usage is another. The application I have in mind will spend a lot
of memory, and I need to use it as efficiently as possible.
I can't afford to use call-by-value functions that pass the full
data set in every function call. Nor can I spend 8 bytes
of memory for each and every variable. Add to that a couple
of dynamic data structures, and you have yourself a C++
program.
The alternative to C++ is matlab, which will spend 1000%
more memory, and 50 times longer on the processing.
The alternative to 'new' is static memory allocation, which
died quietly along with fortran 77 some 15 years ago.
Dmitry Utyansky skrev:
> Rune Allnor wrote:
>
> > The alternative to 'new' is static memory allocation, which
> > died quietly along with fortran 77 some 15 years ago.
>
> I'd say this is an overstatement, esp. if high performance/low overhead
> is the priority... (not sure about fortran though)
Pre-F90 fortran code relied on statically allocated memory,
i.e. the programmer hard-coded array sizes in the source code.
This means that the compiler at any time knows the exact
sizes and positions of all variables, and thus easily can
optimize the code. The downside is that the code has to
be re-compiled once somebody wants to use more data
than the program designer allowed for.
Dynamic memory allocation allows for a more flexible
use of memory, at the cost of a somewhat slower execution.
Alternatively, the programmer must be a lot more clever
with the program design in order to achieve comparable
run-time.
Rune Allnor wrote:
> Dmitry Utyansky skrev:
> >
> > > The alternative to 'new' is static memory allocation, which
> > > died quietly along with fortran 77 some 15 years ago.
> >
> > I'd say this is an overstatement, esp. if high performance/low overhead
> > is the priority... (not sure about fortran though)
>
> Pre-F90 fortran code relied on statically allocated memory,
> i.e. the programmer hard-coded array sizes in the source code.
>
> This means that the compiler at any time knows the exact
> sizes and positions of all variables, and thus easily can
> optimize the code. The downside is that the code has to
> be re-compiled once somebody wants to use more data
> than the program designer allowed for.
>
> Dynamic memory allocation allows for a more flexible
> use of memory, at the cost of a somewhat slower execution.
> Alternatively, the programmer must be a lot more clever
> with the program design in order to achieve comparable
> run-time.
Yeah. Or at a cost of non-feasible implementation at all.
Sorry, my point was that certainly one can allocate data statically in
C/C++
if one wishes so (of if there's actually no other way, e.g. if we're
dealing with lots of audio/video data or high performance when
processing strings/whatever is a priority. Even now in post-Fortran77
times)...
BTW (closer to the topic), I recall implementing something like "C++
strings" library a few years back, in which we taken explicit measures
so that small strings data (below some threshold) were placed within
the c++ objects (i.e., without additional dynamic data allocation for
varialble-length string data). Since the [] op. was overloaded anyway
this allocation details were completely transparent to the application.
Dmitry Utyansky skrev:
> Rune Allnor wrote:
> > Dmitry Utyansky skrev:
> > >
> > > > The alternative to 'new' is static memory allocation, which
> > > > died quietly along with fortran 77 some 15 years ago.
> > >
> > > I'd say this is an overstatement, esp. if high performance/low overhead
> > > is the priority... (not sure about fortran though)
> >
> > Pre-F90 fortran code relied on statically allocated memory,
> > i.e. the programmer hard-coded array sizes in the source code.
> >
> > This means that the compiler at any time knows the exact
> > sizes and positions of all variables, and thus easily can
> > optimize the code. The downside is that the code has to
> > be re-compiled once somebody wants to use more data
> > than the program designer allowed for.
> >
> > Dynamic memory allocation allows for a more flexible
> > use of memory, at the cost of a somewhat slower execution.
> > Alternatively, the programmer must be a lot more clever
> > with the program design in order to achieve comparable
> > run-time.
>
> Yeah. Or at a cost of non-feasible implementation at all.
I can't really see that one can do anything in fortran that one can
not do in C or C++? The converse, on the other hand...
> Sorry, my point was that certainly one can allocate data statically in
> C/C++
> if one wishes so (of if there's actually no other way, e.g. if we're
> dealing with lots of audio/video data or high performance when
> processing strings/whatever is a priority. Even now in post-Fortran77
> times)...
OK, there may still be some applications around that require static
memory allocation, in order to work. There can't be many, though,
and those that remain would be very specialized.
> BTW (closer to the topic), I recall implementing something like "C++
> strings" library a few years back, in which we taken explicit measures
> so that small strings data (below some threshold) were placed within
> the c++ objects (i.e., without additional dynamic data allocation for
> varialble-length string data). Since the [] op. was overloaded anyway
> this allocation details were completely transparent to the application.
Yes, it's a cost-benefit analysis. I try to do those sorts of things
all
the time, to the extent my 5 years old, very-basic-version compiler/IDE
lets me analyze my code. However, the current application is intensive
on arithmetic, and there just isn't enough leeway to very clever
tricks.
In this application my plan -- I haven't really got that far yet -- is
to use estimate the space needed by inspecting the data files
before doing anything, and then use one big dynamically allocated
buffer to hold all my objects of a certain class. I'll then use class
pointers as indexes into that list. The semantics of the code works
as if every single object is allocated with 'new', at the same time I
avoid the overhead of allocating every single object.
True, it's a very naive approach, but whatever pitfalls I stumble
across by doing this provide ample learning opportunities.
Rune Allnor wrote:
....
> > > Dynamic memory allocation allows for a more flexible
> > > use of memory, at the cost of a somewhat slower execution.
> > > Alternatively, the programmer must be a lot more clever
> > > with the program design in order to achieve comparable
> > > run-time.
> >
> > Yeah. Or at a cost of non-feasible implementation at all.
>
> I can't really see that one can do anything in fortran that one can
> not do in C or C++? The converse, on the other hand...
No, I meant that there are things that are just not feasible if you use
dynamic allocation, no matter how many "clever tricks" are used...
> OK, there may still be some applications around that require static
> memory allocation, in order to work. There can't be many, though,
> and those that remain would be very specialized.
Actually most DSP applications I work with (audio/video) use static
allocation for the processing. Video in particular usually is somewhere
near the boundary of processor capabilities (and in this case one has
to deal not only with "static allocation", but with various "proper
alignments" of data structures, banks assignment, cache structures
alignment (if any), similar exotic things).
I would usually prefer static allocation, unless sound arguments exist
in favor of dynamic, e.g., if we have to really share limited memory
resources, between several subsystems. But in this case you will have
to answer the question "what will the system do if all subsystems
require all the memory?". If you have to have just enough memory for
this worst case -- why don't just statically allocate this memory and
forget about the issue?)
"General" dynamic allocation (1) takes more time for malloc/free &
[usually] pointers access overhead; (2) complicates control over where
& what is placed; (3) introduces additional variability into execution
times, which is sometimes bad; (4) complicates analysis of the system
(creating "not enough memory" situtations and other non-trivial "no
resource" problems); (5) creates more possibilities for the programmers
to err (forgetting to free memory, wrong pointers, staff like this),
especially for more complex systems.
>
> Yes, it's a cost-benefit analysis. I try to do those sorts of things
> all
> the time, to the extent my 5 years old, very-basic-version compiler/IDE
>
> lets me analyze my code. However, the current application is intensive
> on arithmetic, and there just isn't enough leeway to very clever
> tricks.
My favorite is BorlandC 3.1++ Too bad it is only 16-bit. But it is
great for 16-bit fixed-point code prototyping, and it works _fast_ on a
modern PC...
> In this application my plan -- I haven't really got that far yet -- is
> to use estimate the space needed by inspecting the data files
> before doing anything, and then use one big dynamically allocated
> buffer to hold all my objects of a certain class.
See -- this is essentially static allocation, with some initial set-up
to allocate "one big buffer"...
> I'll then use class
> pointers as indexes into that list. The semantics of the code works
> as if every single object is allocated with 'new', at the same time I
> avoid the overhead of allocating every single object.
Actually if you use operator[] overloading, you do not essentially have
to have "class pointers". Inside the operator[] implementation you can
write whatever you like -- array indexing, individual object access,
dynamic/static, whatever -- as long as you return a reference to the
object (like, for a "character" case,
char &string:perator[](unsigned index) ) -- you will be able to read
or write the array elements using []-indexing, like s[i]=10 or a=s[i].
Dmitry Utyansky skrev:
> Rune Allnor wrote:
> ...
> > > > Dynamic memory allocation allows for a more flexible
> > > > use of memory, at the cost of a somewhat slower execution.
> > > > Alternatively, the programmer must be a lot more clever
> > > > with the program design in order to achieve comparable
> > > > run-time.
> > >
> > > Yeah. Or at a cost of non-feasible implementation at all.
> >
> > I can't really see that one can do anything in fortran that one can
> > not do in C or C++? The converse, on the other hand...
>
> No, I meant that there are things that are just not feasible if you use
> dynamic allocation, no matter how many "clever tricks" are used...
Define "feasible", then. As you point out below -- and I already
have mentioned -- dynamic memory allocation may introduce
enough run-time overhead to overstep the limits of what you
can achieve with any given system, but that does not mean that
you can't do the job with a faster system. There are stuff that
can be done with C++ that can't be done with pre-F90 fortran[*],
no matter what system one runs on. It has to do with dynamic
memory allocation, flexible data types and object orientation.
We agree that static memory allocation, when applicable, generally
makes for faster executables, though.
> > OK, there may still be some applications around that require static
> > memory allocation, in order to work. There can't be many, though,
> > and those that remain would be very specialized.
>
> Actually most DSP applications I work with (audio/video) use static
> allocation for the processing. Video in particular usually is somewhere
> near the boundary of processor capabilities (and in this case one has
> to deal not only with "static allocation", but with various "proper
> alignments" of data structures, banks assignment, cache structures
> alignment (if any), similar exotic things).
These are the usual issues of real-time processing. I can see that
these are big issues in DSP type applications like audio and video.
While there may be many units out there where such programs are
used, I still maintain this is rather exotic from a programming point
of view. I don't think you will find many programmers outside the
DSP/embedded device business who prefer static memory
allocation over dynamic, if given the choise.
> I would usually prefer static allocation, unless sound arguments exist
> in favor of dynamic,
Try to do some real work in academia or R&D institutions.
Most code in such places is ancient F77 with everything
hard-coded. My impression is that people in the '70s and
'80s worked on arrays of length 10, and thought 100 was a
"big" number and very generous overhead allocation. These
days people work routinely with arrays of the orders of 1000s
and 10,000s [**].
Maintaining such ancient code is a nightmare.
> e.g., if we have to really share limited memory
> resources, between several subsystems. But in this case you will have
> to answer the question "what will the system do if all subsystems
> require all the memory?".
This problem will be postponed if all systems make the effort to
minimize their memory footprint.
> If you have to have just enough memory for
> this worst case -- why don't just statically allocate this memory and
> forget about the issue?)
Because what you consider "worst case" today may be a joke
tomorrow. Remember MSDOS? 1MB memory addressing space
organized in a way that could not easily be expaned. This single
issue set the PC, and the computer industry with it, back a couple
of decades with respect to the state of the art at the time. Some 90%
of any work PC programmers did up to the late '90s, was all about
handling that totally ridiculous memory issue.
There might come a day when you -- or worse, somebody else
who use your code -- have to do a job where the data don't fit into
your static data structure.
> "General" dynamic allocation (1) takes more time for malloc/free &
> [usually] pointers access overhead; (2) complicates control over where
> & what is placed; (3) introduces additional variability into execution
> times, which is sometimes bad; (4) complicates analysis of the system
> (creating "not enough memory" situtations and other non-trivial "no
> resource" problems); (5) creates more possibilities for the programmers
> to err (forgetting to free memory, wrong pointers, staff like this),
> especially for more complex systems.
Item (1) is the main factor to be aware of, what run-time is
concerned, and is the factor I aim to reduce by pre-allocating
one big buffer. Items (2) and (3) are big issues in real-time DSP
applications, but are no problem in the application I work with.
As for item (4), I disagree. A good processing system checks
for what is needed, and raises an alarm if not enough resources
are available. Otherwise, it proceeds and allocates just enough
memory to do the job, leaving enough space for other applications
to work as well. Item (5), well, that's life.
No one said computer programming was easy, despite gazillions
of books of type "computers for dummies".
> > In this application my plan -- I haven't really got that far yet -- is
> > to use estimate the space needed by inspecting the data files
> > before doing anything, and then use one big dynamically allocated
> > buffer to hold all my objects of a certain class.
>
> See -- this is essentially static allocation, with some initial set-up
> to allocate "one big buffer"...
No, it's not. It's a dynamic memory allocation. I check what is
needed, I don't impose any of my own prejudice or opinions of
type "this much memory ought to be sufficient" or "I grab
everything that is available just to make sure I get what I need".
If the computer grinds to a halt, well, then the user knows that
he exceeded the swap disk limit, and should stop the program
to try again with a smaller data set. It is my task as programmer
to make that possible, if needed. Next, I check if enough resources
are available, and I don't allocate more memory than I need. At the
same time, I aim at reducing some of the overhead we agree to
be present. Not all the overhead. The most costly bits, what
run-time is concerned.
These may seem as trivial details to the untrained eye, but such
factors do make a significant difference.
> > I'll then use class
> > pointers as indexes into that list. The semantics of the code works
> > as if every single object is allocated with 'new', at the same time I
> > avoid the overhead of allocating every single object.
>
> Actually if you use operator[] overloading, you do not essentially have
> to have "class pointers". Inside the operator[] implementation you can
> write whatever you like -- array indexing, individual object access,
> dynamic/static, whatever -- as long as you return a reference to the
> object (like, for a "character" case,
> char &string:perator[](unsigned index) ) -- you will be able to read
> or write the array elements using []-indexing, like s[i]=10 or a=s[i].
I know. I *could* do it like that, and I will do some of those things
in order to keep track of certain aspects of book-keeping. But if I
based
the entire application on this approach, I would have lept back some
30 years to the days of F77. My plan is to get this data structure to
work as if it was entirely dynamic.
It remains to be seen how well this works.
Rune
[*] I haven't really been involved with fortran since 1996-97 or so,
when F90 became widesperad. The tendency at the time was
that F90 approached C++ in style and functionality. Some even
said that F90 merely was a code preparser on top of a C++
compiler. I don't know if that was true, though, nor do I know
the state of fortran today.
[**] Don't take the numbers too literally. The example
is intended to describe the differences of scale in the
programmer's mind back then and the reality today.
Rune Allnor wrote:
> Dmitry Utyansky skrev:
> > > >
> > > > Yeah. Or at a cost of non-feasible implementation at all.
> > >
> > > I can't really see that one can do anything in fortran that one can
> > > not do in C or C++? The converse, on the other hand...
> >
> > No, I meant that there are things that are just not feasible if you use
> > dynamic allocation, no matter how many "clever tricks" are used...
>
> Define "feasible", then.
Impossible to implement given the available resources, on prohibitively
costly.
Like, if you know that if you will be allocating this 1Mb buffer from
the common dynamic pool, you could face "no memory" situation even
though there's enough memory in the system, because of the accumulated
fragmentation (too bad if this issue can be reproduced only after a
couple of weeks of uninterrupted functioning of the device, at the
customer site, on the other side of Earth).
Or if you just do not have enough MHz or RAM for the overheads (or
installing it will make your $20 hardware cost $200, and you will have
to buy all the batch yourself, to decorate the house )
Sure, for many tasks installing more Mbs of RAM or a N-GHz Pentium will
solve the problem (or, rather, delay its appearance), but in many
applications this is just not feasible.
>There are stuff that
> can be done with C++ that can't be done with pre-F90 fortran[*],
> no matter what system one runs on. It has to do with dynamic
> memory allocation, flexible data types and object orientation.
Actually I am not sure this is a valid statement from an "abstract
computational theory" point of view. I do not know much about F90; from
what I recall about older Fortran in general, it lacks dynamic
allocation/pointers and pointers to procedures, though regarding
"procedures" I think it has some ~close mechanism for passing
references to them. I would not argue here, since it's quite some time
since I used fortran and I am absolutely not qualified here.
In general though, all these "flexible data types" map into CPU command
set (for most processors quite typeless).
Of course I agree with you on the point of "expressive power" of
languages and the ease of implementation of certain things, and
readability from a human point of view.
> We agree that static memory allocation, when applicable, generally
> makes for faster executables, though.
Actually my main point was more control & the ease of overall system
analysis, not faster executables.
> > > OK, there may still be some applications around that require static
> > > memory allocation, in order to work. There can't be many, though,
> > > and those that remain would be very specialized.
We are just looking at this from different backgrounds. You think they
are "very specialized/exotic", I think they are not (obviously because
of my current activities).
You must be correct stating that this "embedded" domain is smaller than
the "general purpose" computing. Some ideas are pretty universal
though.
Of course I agree with you that if you're creating a piece of software
for a "stand-alone/general purpose" computer, especially with a chance
of this piece of software surviving several generations of computers/OS
-- in many cases it's more convenient & straightforward to use dynamic
allocation, certainly if your data sets can vary in size, etc.
> Try to do some real work in academia or R&D institutions.
> Most code in such places is ancient F77 with everything
> hard-coded. My impression is that people in the '70s and
> '80s worked on arrays of length 10, and thought 100 was a
> "big" number and very generous overhead allocation. These
> days people work routinely with arrays of the orders of 1000s
> and 10,000s [**].
I used to. Certainly I would completely agree with you here.
BTW, when I write such software (PC utilities, prototyping, internal
use, etc.) -- usually I redefine new() operator so that it aborts if
there's lack of memory. So that I do not have to worry about it in the
rest of application -- I know that a call to new()/malloc() always
returns memory. This approach clearly shows the error, allows to
understand/fix it, and does not allow the software to get to some
undefined state (especially true if the system does not have memory
protection).
This is IMHO the best controlled behavior for many purposes, especially
for "R&D", and/or for most command-line batch applications -- if this
situation is really rare, and occurs only because of internal errors
(which have to be fixed anyway) or because of incorrect user input in
command line (you'll have to abort the application anyway to correct
the input).
In many cases of more "user friendly" GUIs this is not a correct
behavior.
In most cases of embedded software -- this is completely unacceptable
(if this situation can really happen in normal life). If you press a
button -- the device should perform a certain function, not "wait until
resources are available" or display some "pop up" telling you how sorry
the device is.
> This problem will be postponed if all systems make the effort to
> minimize their memory footprint.
Still, you have to really analyze all the possible cases. It can get
really complicated.
In most cases it is _way_ simpler to understand all the issues going on
if you can understand the upper boundary on RAM consumption for each
subsystem and allocate this amount to it (different methods -- working
sets, static allocation, etc. -- same general idea). Like,
decomposition of one complex fenomenon into several less complex ones.
Taking into account interactions between subsystems (like, if s1
requires 100 kb -- then I am sure that s2 will require only 2kb, and
vice versa) -- is certainly a way to minimize s1+s2 memory consumption
by sharing memory, but if you have a number of subsystems, you couldn't
analyse all of their mutual states -- it's exponential.
> Because what you consider "worst case" today may be a joke
> tomorrow. Remember MSDOS?
Again, I do not argue about "general computer program". For the
embedded system -- you have the hardware that you have. And the
software will have exactly that hardware forever in most cases (well,
until it breaks down). The computing environment is completely
controllable in most cases. (the oppisite, on the other hand, greatly
complicates things for the "end-user PC software", especially if it is
related to some specialized hardware -- drivers, video acquisition,
etc. -- since usually you do not know in what environment your
application will work. I have not seen yet a reliably working internal
PC TV tuner -- but this is a different story).
> There might come a day when you -- or worse, somebody else
> who use your code -- have to do a job where the data don't fit into
> your static data structure.
Well, I do not particularly disagree or argue here. I can only say that
the code should be well-documented, maintainable, etc. And obviously
where dynamic allocation fits, it should be used, it's not religious at
all. With caution
> > "General" dynamic allocation (1) takes more time for malloc/free &
> > [usually] pointers access overhead; (2) complicates control over where
> > & what is placed; (3) introduces additional variability into execution
> > times, which is sometimes bad; (4) complicates analysis of the system
> > (creating "not enough memory" situtations and other non-trivial "no
> > resource" problems); (5) creates more possibilities for the programmers
> > to err (forgetting to free memory, wrong pointers, staff like this),
> > especially for more complex systems.
>
> Item (1) is the main factor to be aware of, what run-time is
> concerned, and is the factor I aim to reduce by pre-allocating
> one big buffer. Items (2) and (3) are big issues in real-time DSP
> applications, but are no problem in the application I work with.
> As for item (4), I disagree. A good processing system checks
> for what is needed, and raises an alarm if not enough resources
> are available. Otherwise, it proceeds and allocates just enough
> memory to do the job, leaving enough space for other applications
> to work as well.
This _is_ real complication. If something can be simplified -- it
should.
When you press a button, the system must not "raise alarm" or "wait for
something available", it should do the function assigned.
Every now and then I see in my "civilian life" examples -- I have some
satellite box that once in a month just hangs; too bad its power switch
is software-controlled (since somebody thought he could write a good
firmware), and it does not have any hardware watchdog, and I have to
use its power plug to reset the thing. And I had to remove the battery
from my current cell phone a couple of times for exactly same
purpose...
>Item (5), well, that's life.
If I know that it's not safe to stroll in some area at night -- I would
not generally go there, unless there is some serious motivation. And I
would not want to wear some expensive things there. That's just
reasonable precautions. While of course I can say "that's life" and go
there without much concern, and accept the 30% probability of getting
into some trouble...
> No one said computer programming was easy, despite gazillions
> of books of type "computers for dummies".
Yep.
Again, I think we just are seeing the situation from slightly different
points of view. I do not have much disagreement in general. I think I
just was "triggered" by the "specialized/very exotic" part of your
statement. Which is obviously not the case in many applications. And
I've just seen enough programmers how do not completely understand all
the related issues...
Rune Allnor wrote:
> Martin Eisenberg skrev:
>> Rune Allnor wrote:
>> > Martin Eisenberg skrev:
>>
>> >> Vectors are easy to use well. Explicit memory management is
>> >> notoriously easy to mess up. The more of it you can sweep under
>> >> the rug, the less effort it takes to write quality code. That's
>> >> one of the big reasons the standard library has container
>> >> classes.
>> >
>> > Maybe I am paranoid after having used matlab for too long, but
>> > I would like to have as much control as possible.
>>
>> I'd be interested in the specific things you want control over and
>> how you achieve them with new.
>
> Computational overhead in the classes is one aspect, memory
> usage is another.
Given that we've been talking about access speed before, do you mean
the overhead of allocation itself? That won't differ whether the
allocation is done in user or library code, for the same number and
sizes of buffers.
> I can't afford to use call-by-value functions that pass the full
> data set in every function call.
Then don't do that -- just common sense. I see no connection to the
way you get at the memory in the first place.
> Nor can I spend 8 bytes of memory for each and every variable.
Actually, std::vector will typically take 12 bytes -- base pointer,
buffer capacity, number of valid elements. Your original question was
about an aspect of building a collection class. How are you going to
make that class smaller than 8 bytes? And how many data sets of what
size each do you envision having?
> The alternative to 'new' is static memory allocation, which
> died quietly along with fortran 77 some 15 years ago.
As far as my engagement here is concerned the alternative is not
dynamic vs. static allocation, but taking advantage of others' work
vs. redoing it worse. But then, elsewhere you wrote this...
> whatever pitfalls I stumble across by doing this provide ample
> learning opportunities.
....which is all but impossible to argue with
Martin
--
No wonder that illegitimate children are commonly
the greatest minds; they are the result of an hour
full of wit, marital ones often spring from boredom.
--Theodor Gottlieb von Hippel
Martin Eisenberg skrev:
> Rune Allnor wrote:
> > Martin Eisenberg skrev:
> >> Rune Allnor wrote:
> >> > Martin Eisenberg skrev:
> >>
> >> >> Vectors are easy to use well. Explicit memory management is
> >> >> notoriously easy to mess up. The more of it you can sweep under
> >> >> the rug, the less effort it takes to write quality code. That's
> >> >> one of the big reasons the standard library has container
> >> >> classes.
> >> >
> >> > Maybe I am paranoid after having used matlab for too long, but
> >> > I would like to have as much control as possible.
> >>
> >> I'd be interested in the specific things you want control over and
> >> how you achieve them with new.
> >
> > Computational overhead in the classes is one aspect, memory
> > usage is another.
>
> Given that we've been talking about access speed before, do you mean
> the overhead of allocation itself?
Yes. And also the speed with which one accesses the objects.
The pointer access to the object is comparable in time to the
few arithmetical operations that needs to ge done on each
object. And I have potentially millions of objects to deal with.
> That won't differ whether the
> allocation is done in user or library code, for the same number and
> sizes of buffers.
As I said in another subthread, using as much information about
the required memory at an as early stage as possible, is the
key here.
> > I can't afford to use call-by-value functions that pass the full
> > data set in every function call.
>
> Then don't do that -- just common sense.
That's why I don't use matlab.
> I see no connection to the
> way you get at the memory in the first place.
I can't really see your point. On the one hand you obviously
are a big fan of the STL classes, on the other I don't see
what you are criticising me for.
> > Nor can I spend 8 bytes of memory for each and every variable.
>
> Actually, std::vector will typically take 12 bytes -- base pointer,
> buffer capacity, number of valid elements. Your original question was
> about an aspect of building a collection class. How are you going to
> make that class smaller than 8 bytes? And how many data sets of what
> size each do you envision having?
The basic collection is a few millions of (x,y,z) data points,
most likely floats, I doun't think I need double here. The main
part is certain data structures needed to process that set of
points. The big issue here is large-volume data processing.
Visiting lots of data points and doing relatively simple operations
at each visit.
> > The alternative to 'new' is static memory allocation, which
> > died quietly along with fortran 77 some 15 years ago.
>
> As far as my engagement here is concerned the alternative is not
> dynamic vs. static allocation, but taking advantage of others' work
> vs. redoing it worse. But then, elsewhere you wrote this...
>
> > whatever pitfalls I stumble across by doing this provide ample
> > learning opportunities.
>
> ...which is all but impossible to argue with
Well, only by making your own attempts are you able to
appreciate what the masters achieve.
Besides, I just don't trust somebody who implements
"vector" and "matrix" as two different classes and then
claim they are suitable for numerical work. The sensible
way to do this, is to implement a matrix class and use
the (1 x N) or (N x 1) instance as a vector. If you don't
believe me try to see the difference between
x'x and xx' where x is a Nx1 vector.
> The basic collection is a few millions of (x,y,z) data points,
> most likely floats, I doun't think I need double here. The main
> part is certain data structures needed to process that set of
> points. The big issue here is large-volume data processing.
> Visiting lots of data points and doing relatively simple operations
> at each visit.
Then it is important for this algorithm to travel over these data
in small steps to minimize the number of page faults. Either
the algorithm has to be adapted to the layout of the data, or
the layout of the data should fit the way the algorithm visits
the data. Whether the data stay in a vector or in a simple
array is not so important, you just get an iterator or a
pointer and increment it.
Andreas Huennebeck skrev:
> Rune Allnor wrote:
>
> > The basic collection is a few millions of (x,y,z) data points,
> > most likely floats, I doun't think I need double here. The main
> > part is certain data structures needed to process that set of
> > points. The big issue here is large-volume data processing.
> > Visiting lots of data points and doing relatively simple operations
> > at each visit.
>
> Then it is important for this algorithm to travel over these data
> in small steps to minimize the number of page faults. Either
> the algorithm has to be adapted to the layout of the data, or
> the layout of the data should fit the way the algorithm visits
> the data. Whether the data stay in a vector or in a simple
> array is not so important, you just get an iterator or a
> pointer and increment it.
>
> bye
> Andreas
It is a little bit more involved than that. It has to do with the
nature of the application. The data points are accessed in
random order, what memory locations are concerned.
I had the basic ideas ten years ago. At the time, I studied
the problem and found a "philosophical" solution. I did not
implement the ideas, for a number of reasons. First, I was
thinking too "hands on", with a NEW or MALLOC call for
each data point. Second, at the time there were just no
relevant computational resources for the amounts of data.
A very fast compute I used at the time -- a DEC Alpha
which was a supercomuter compared to anything
else except the Cray -- ran at 600 MHz while the PCs
ran at 100 - 200 MHz. I don't remember now, but I suspect
large computers at the time would have some 16 - 64 MB
RAM.
What has happened now is that I have found out what
data structures to use, and how to implement the
program. That, along with readily available computers
having the necessary capacity to deal with real-world
data, has made me attempt a second go on this problem.
> I can't really see your point. On the one hand you obviously
> are a big fan of the STL classes, on the other I don't see
> what you are criticising me for.
You see, when someone actively avoids the standard library facilities
giving "control" and such as a reason, my superstition meter goes up.
I know you know the feeling. So I probed your motivation, trying to
point out how std::vector mostly wouldn't hinder your goals and the
advantages it buys you.
One thing using canned stuff won't give you is the learning
experience, as you noted -- and that's a much better reason than
worries about a dozen bytes among millions.
> Besides, I just don't trust somebody who implements
> "vector" and "matrix" as two different classes and then
> claim they are suitable for numerical work.
I submit that you're interpreting the class name in an inappropriate
context. CS types sometimes like to call an ordered, O(1) random-
access, homogeneously-typed collection ADT a "vector" -- I can only
guess, to distinguish from its implementation as a basic construct of
many languages called "array" which is also common for the ADT itself
-- and that in turn probably influences library designers who add a
class implementing the same ADT to a language that already has
arrays.
The term is obviously loaned from mathematics, and careless
terminology creep is arguably really bad (list is not always
distinguished from array/vector in some CS classrooms -- dirty!). The
point is that this kind of "vector" class is *not meant* to embody
its namesake from linear algebra. That's why there is no std::matrix
class and no implication about those designers' technical aptitude.