Are all query iterators const?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Are all query iterators const?

Hal Clark
I'm new to Boost.Geometry and have recently discovered the
rtree::qbegin()/qend() mechanism. It is useful (thanks!) but the
iterators are all marked const. Is it possible to get non-const
iterators without mucking about with const_cast? (I see
rtree::begin()/end() are new between 1.57.0 and 1.58.0 -- hopefully
non-const qbegin()/qend() follow more easily.)

---

My use case is as follows. I'm implementing some point clustering
algorithms. My point primitive has a 'ClusterID' attribute member, and
something like a (fat) user-defined boost::any. Because the point is
not cheap to copy, I'm iterating over nearest neighbours and marking
the ClusterID in-place using something like this:

        //Using Boost version 1.57.0.
        ...
        RTree_t::const_query_iterator it;
        it = rtree.qbegin(boost::geometry::index::satisfies(
            [](const MyPoint &) -> bool { return true; } )
        );
        for( ; it != rtree.qend(); ++it){
            //it->ClusterID = -1; //Compilation error: disregards const.
            const_cast<MyPoint &>(*it).ClusterID = -1;
        }

First: my understanding is that because the spatial coordinates are
not altered the const_cast is valid. Is this correct?

Second: can something like the back_inserter interface be created that
won't copy the matching points? non-const query_iterators would be
nice. I don't understand the rationale for, or distinction between,
query_iterator, spatial_query_iterator, distance_query_iterator, but
presumably they are all non-const candidates.

-hal clark
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Are all query iterators const?

Adam Wulkiewicz
Hi Hal,

Hal Clark wrote:
> I'm new to Boost.Geometry and have recently discovered the
> rtree::qbegin()/qend() mechanism. It is useful (thanks!) but the
> iterators are all marked const. Is it possible to get non-const
> iterators without mucking about with const_cast? (I see
> rtree::begin()/end() are new between 1.57.0 and 1.58.0 -- hopefully
> non-const qbegin()/qend() follow more easily.)

All rtree iterators are const iterators. The rationale behind it is
similar to the rationale behind the constness of std::multiset
iterators. If it was possible to modify the value it would be very easy
to break the index. In the case of the rtree it's even more
understandable since by definition it's an index, not a container. So
it's a data structure, typically containing only some simple geometrical
representation and an ID, which is kept aside of some container storing
the actual data.

That said, we could of course think about relaxing this requirement. The
rtree taking a ValueType is similar to std::multiset. But in the STL
there is also std::multimap storing std::pair<const Key, T>. If we could
ensure that the geometrical part of the value couldn't be changed by the
user we could implement mutable interators.  For instance, mutable
iterators could be enabled only for some specific ValueTypes, e.g.
std::pair<const Box, T>, boost::tuple<const Box, T>, etc. But this could
be not intuitive for the users. An alternative would be to add an rtree
version similar to std::multimap (e.g. rtree_map or soem other name).
Instead of a single ValueType template parameter it could take two:
Indexable and some type T. Then it'd require to insert objects of
ValueType std::pair<const Indexable, T> exactly like the std::multimap.

> ---
>
> My use case is as follows. I'm implementing some point clustering
> algorithms. My point primitive has a 'ClusterID' attribute member, and
> something like a (fat) user-defined boost::any. Because the point is
> not cheap to copy, I'm iterating over nearest neighbours and marking
> the ClusterID in-place using something like this:
>
>          //Using Boost version 1.57.0.
>          ...
>          RTree_t::const_query_iterator it;
>          it = rtree.qbegin(boost::geometry::index::satisfies(
>              [](const MyPoint &) -> bool { return true; } )
>          );
>          for( ; it != rtree.qend(); ++it){
>              //it->ClusterID = -1; //Compilation error: disregards const.
>              const_cast<MyPoint &>(*it).ClusterID = -1;
>          }

You should know that the Values can be heavily copied during the
insertion and removal.

Have you considered keeping the points as small as possible and keeping
the rest (boost::any) in some container outside the rtree? In general
using objects containing cold data for indexing is not a good idea
because of increased number of cache misses, so worse performance.

> First: my understanding is that because the spatial coordinates are
> not altered the const_cast is valid. Is this correct?

Yes.

> Second: can something like the back_inserter interface be created that
> won't copy the matching points?

Do you have the query() member function in mind? But instead of copying
the Values into the output iterator (so working like std::copy())
calling some Function for all Values meeting predicates (so working like
std::for_each())?

I wouldn't recommend it but actually you could do it even now by passing
Function/UnaryPredicate into bgi::satisfies() and then into query()
method. This predicate would be called for all Values meeting other
predicates. To prevent copying you could pass some kind of dummy output
iterator (dev_null_iterator). But this code wouldn't be clear. And the
catch is that also in this case the const reference to Value is passed
into the UnaryPredicate taken by the bgi::satisfies(). The reason is the
same, to not allow the user to modify the Indexable part by mistake.

> non-const query_iterators would be
> nice. I don't understand the rationale for, or distinction between,
> query_iterator, spatial_query_iterator, distance_query_iterator, but
> presumably they are all non-const candidates.

The kind of an iterator is not a problem here, all of them could be made
mutable. The problem is, how to design an interface dissallowing the
users to modify an Indexable part of a ValueType object stored in the
rtree index.
But your solution seems to be ok (const_cast). It's the same as if
someone wanted to do a similar thing with std::set. And const_cast is a
nice warning sign.

Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Are all query iterators const?

Hal Clark
In reply to this post by Hal Clark
> Date: Wed, 2 Sep 2015 23:36:39 +0200
> From: Adam Wulkiewicz <[hidden email]>
> To: "Boost.Geometry library mailing list" <[hidden email]>
> Subject: Re: [geometry] Are all query iterators const?
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> Hi Hal,
>
> Hal Clark wrote:
>> I'm new to Boost.Geometry and have recently discovered the
>> rtree::qbegin()/qend() mechanism. It is useful (thanks!) but the
>> iterators are all marked const. Is it possible to get non-const
>> iterators without mucking about with const_cast? (I see
>> rtree::begin()/end() are new between 1.57.0 and 1.58.0 -- hopefully
>> non-const qbegin()/qend() follow more easily.)
>
> All rtree iterators are const iterators. The rationale behind it is
> similar to the rationale behind the constness of std::multiset
> iterators. If it was possible to modify the value it would be very easy
> to break the index. In the case of the rtree it's even more
> understandable since by definition it's an index, not a container. So
> it's a data structure, typically containing only some simple geometrical
> representation and an ID, which is kept aside of some container storing
> the actual data.
>
> That said, we could of course think about relaxing this requirement. The
> rtree taking a ValueType is similar to std::multiset. But in the STL
> there is also std::multimap storing std::pair<const Key, T>. If we could
> ensure that the geometrical part of the value couldn't be changed by the
> user we could implement mutable interators.  For instance, mutable
> iterators could be enabled only for some specific ValueTypes, e.g.
> std::pair<const Box, T>, boost::tuple<const Box, T>, etc. But this could
> be not intuitive for the users. An alternative would be to add an rtree
> version similar to std::multimap (e.g. rtree_map or soem other name).
> Instead of a single ValueType template parameter it could take two:
> Indexable and some type T. Then it'd require to insert objects of
> ValueType std::pair<const Indexable, T> exactly like the std::multimap.
>

Thanks for the clear, informative response Adam. I'm primarily
interested in attaching arbitrary data to the point class in order to
also cluster non-spatial attributes. For example, using GDBSCAN
(http://link.springer.com/article/10.1023%2FA%3A1009745219419). The
crux of my issue is that the R*-tree may or may not need to index the
non-spatial attributes in addition to spatial coordinates. I'm still
toying with the idea of making these de-facto spatial ("virtual" or
"phony") coordinates, but some clustering algorithms require a clear
distinction. Also, defining a custom distance metric seems a bit
involved.

Your suggestion to make a mapping from some token to external data is
probably the most reliable means of including attributes, but would
result in lots of cache misses and make it hard to index attributes.
At the moment I'm simply giving the user a template to decide whether
they want to pay for fat attribute members.

You mentioned std::multiset which means duplicates are accepted and
indexed properly. This is great! Most R*-trees I've seen do not do
this. Is this a property that is guaranteed into the future? Other
than a post you made in 2014 for Boost 1.55
(http://stackoverflow.com/questions/22920055/boostgeometryrtreecontains-return-multiple-identical-results),
I do not see any claims in the documentation.

Also, not that my implementation is suitable at the moment, but are
you accepting pull requests for clustering algorithms? I see a
suspicious lack of them...


> I wouldn't recommend it but actually you could do it even now by passing
> Function/UnaryPredicate into bgi::satisfies() and then into query()
> method. This predicate would be called for all Values meeting other
> predicates. To prevent copying you could pass some kind of dummy output
> iterator (dev_null_iterator). But this code wouldn't be clear. And the
> catch is that also in this case the const reference to Value is passed
> into the UnaryPredicate taken by the bgi::satisfies(). The reason is the
> same, to not allow the user to modify the Indexable part by mistake.
>

Neat trick!


Thanks,
-hal clark
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Are all query iterators const?

Adam Wulkiewicz
Hi Hal,

Hal Clark wrote:
<snip>

> Thanks for the clear, informative response Adam. I'm primarily
> interested in attaching arbitrary data to the point class in order to
> also cluster non-spatial attributes. For example, using GDBSCAN
> (http://link.springer.com/article/10.1023%2FA%3A1009745219419). The
> crux of my issue is that the R*-tree may or may not need to index the
> non-spatial attributes in addition to spatial coordinates. I'm still
> toying with the idea of making these de-facto spatial ("virtual" or
> "phony") coordinates, but some clustering algorithms require a clear
> distinction. Also, defining a custom distance metric seems a bit
> involved.
>
> Your suggestion to make a mapping from some token to external data is
> probably the most reliable means of including attributes, but would
> result in lots of cache misses and make it hard to index attributes.
> At the moment I'm simply giving the user a template to decide whether
> they want to pay for fat attribute members.

If non-spatial data is used in the searching process then of course it
could be better to store it inside the Point. Otherwise it'd probably be
better to get rid of the cold data. It's hard to tell which would be
better without profiling. And I'm guessing the performance may depend on
the actual types of points so leaving the decision to the user may be
the best solution.

> You mentioned std::multiset which means duplicates are accepted and
> indexed properly. This is great! Most R*-trees I've seen do not do
> this. Is this a property that is guaranteed into the future? Other
> than a post you made in 2014 for Boost 1.55
> (http://stackoverflow.com/questions/22920055/boostgeometryrtreecontains-return-multiple-identical-results),
> I do not see any claims in the documentation.

Yes, this is how the R-tree works right now and breaking changes
wouldn't be acceptible in this case. If a std::set-like-rtree was needed
we should probably implement a separate index.

>
> Also, not that my implementation is suitable at the moment, but are
> you accepting pull requests for clustering algorithms? I see a
> suspicious lack of them...

Sure, any help is appreciated. Indeed there are many algorithms which
could be good additions to the library. So don't hesitate to propose
anything. It was more or less like this with the rtree.

A'propos clustering algorithms, some time ago I planned to add a
balancing algorithm based on k-means but I haven't found enough time:
- Revisiting R-tree Construction Principles
(http://pyrtree.googlecode.com/hg-history/f81b9c8157117a1f97cdcc3341ae9a061f5df4fa/doc/ref/r-tree-clustering-split-algo.pdf)

Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry