[ApproxPts] offers three functions for preprocessing sets of approximate points whose coordinates are given as [double]s. Given a large set of approximate points with considerable overlap of the error boxes of adjacent points, the preprocessing algorithms determine a smaller set of approximate points which preserve the geometrical disposition of the original points but with little or no overlap of the error boxes. The preprocessed points are NOT necessarily a subset of the original points. Here is a quick summary of the functions.
typedef ApproxPts::ApproxPt ApproxPt; vector<ApproxPt> OriginalPoints; // to be filled with the original approx pts vector<double> epsilon; // epsilon[i] is semiwidth of error box in dimension i vector<ApproxPt> NewPoints; // the preprocessed points will be put here ApproxPts::PreprocessGridAlgm(NewPoints, OriginalPoints, epsilon); ApproxPts::PreprocessAggrAlgm(NewPoints, OriginalPoints, epsilon); ApproxPts::PreprocessSubdivAlgm(NewPoints, OriginalPoints, epsilon);
PreprocessGridAlgm
PreprocessAggrAlgm
PreprocessSubdivAlgm
PreprocessAggrAlgm
in certain cases (e.g. when the input points
are already fairly well separated) but can be faster (e.g. when only
few preprocessed points are produced, which will happen if the
original points are densely packed compared to their error
neighbourhoods).
All three algorithms work by partitioning the original point into sets, and then
choosing the average of each set as the representative of those original points.
Full details of these algorithms can be found in Marialaura Torrente's thesis.
PreprocessGridAlgm
the sets in the partition comprise all original points
which are closer to a fixed grid point than to any other of the grid
points. In other words, viewing the grid as a lattice, the whole space can
be covered by grid-translates of the fundamental region; the partitions are
simply the original points lying in one of these grid-translates.
PreprocessAggrAlgm
the sets in the partition are determined by an
iterative process. Initially each set contains a single original point,
then closest pair of sets are iteratively merged (provided that no original
point would end up too far from the average of the merged sets).
PreprocessSubdivAlgm
the sets in the partition are determined by an
iterative process. Initially there is a single set containing all the
original points, then if some original point is too far from the average of
the set to which it belng, that point is moved to its own new set, then an
iterative redistribution occurs (reassigning original points to optimize
the "goodness of representation").
All the preprocessing algorithms rescale their inputs so that the error widths in each dimension are all equal to 1. The main work is done with these rescaled pointed, and at the very end the results are scaled back.
PreprocessGridAlgm
might be better if we were to use std::map
s, but it
seems fast enough as is. We consider each input point: if the nearest grid
point is already in out list of grid points we append the input to its list
of associated original points, otherwise we append a new grid point with
just the single input point in its list of associated original points.
Finally we compute the averages of each list of original points associated
to a fixed grid point. These averages are our result.
PreprocessAggrAlgm
implements an "aggregative" algorithm: initially the
original points are split into subsets each containing exactly one original
point, then iteratively nearby subsets are coalesced into larger subsets
provided each original point of the two subsets is not too far from the
"centre of gravity" of the coalesced set -- this proviso is necessary as
otherwise there are pathological examples (see Laura Torrente's thesis).
PreprocessSubdivAlgm
implements a "subdivision" algorithm. Initially all
original points are placed into a single partition. Then iteratively we seek
the original point furthest from the average of its set. If this distance is
below the threshold then we stop (all original points are sufficiently well
represented by the averages of their sets). Otherwise we separate the worst
represented original point into a new set initially containing just itself.
Now we redistribute the original points: we do this by minimizing the sum of
the squares of the L2 distances of the original points from their respective
representatives.
I do not like the typedef for ApproxPts::ApproxPt
because the
name seems very redundant; I am also uneasy about having a typedef
in a header file -- perhaps it should be a genuine class?
The preprocessing algorithms should really receive input as a pair of iterators, and the output should be sent to an output iterator. But such an interface would rather uglify the code -- what to do???
Laura Torrente should check this file, and complete the descriptions of
the implementation of PreprocessAggrAlgm
and PreprocessDisjAlgm
.