Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55


Tim Finer
Hello,

I really want to use a persistent rtree backed by shared memory (or a
memory-mapped file).  I don't know in advance how large the data will
be.  I experimented with resizing a boost::vector backed by a
shared-memory allocator and applied what I found there to the example at
http://www.boost.org/doc/libs/1_55_0/libs/geometry/doc/html/geometry/spatial_indexes/rtree_examples/index_stored_in_shared_memory_using_boost_interprocess.html.

What I did was wrap insertion in a function that catches a bad_alloc,
closes the shared memory segment, grows the shared memory, then resets
the pointer to the rtree.  I've verified that this technique works fine
for boost::vector, but not for the rtree; it appears that there are some
other structures internally that are possibly still noting the (old)
capacity of the shared memory?

I also noted a similar problem in a previous thread and applied the fix
from github, but that didn't help
(http://boost-geometry.203548.n3.nabble.com/rtree-crash-when-used-with-inter-process-td4026037.html).

    namespace bg = boost::geometry;
    namespace bgm = boost::geometry::model;
    namespace bgi = boost::geometry::index;
    namespace bi = boost::interprocess;

    typedef bgm::point<double, 3, bg::cs::cartesian> bgPoint3;
    typedef std::pair<bgPoint3, std::uint64_t> bgValue;
    typedef bgm::box<bgPoint3> bgBox3;

    typedef bgi::linear<16> Partitioner;
    typedef bgi::indexable<bgValue> Indexer;
    typedef bgi::equal_to<bgValue> Comparator;
    typedef bi::allocator<bgValue,
                          bi::managed_shared_memory::segment_manager> Allocator;
    typedef bgi::rtree<bgValue, Partitioner, Indexer, Comparator,
                       Allocator> bgMemRTree;
    typedef std::unique_ptr<bi::managed_shared_memory> MemPtr;


    inline void ResizingInsert(MemPtr& mem, bgMemRTree*& rtree,
                               const bgPoint3& pt, std::uint64_t i)
    {
       try
       {
          rtree->insert(std::make_pair(pt, i));
       }
       catch (bi::bad_alloc&)
       {
          // grow() takes *additional* bytes, so this grows the
          // segment to 1.5x its current size.
          auto newSize = static_cast<size_t>(mem->get_size() * 0.5);

          // Close, grow, reopen, then re-find the rtree.
          mem.reset();
          bi::managed_shared_memory::grow("RTree-Test", newSize);
          mem.reset(new bi::managed_shared_memory(bi::open_only, "RTree-Test"));
          rtree = mem->find<bgMemRTree>("rtree").first;

          // Try again; let bad_alloc out if this fails a second time.
          rtree->insert(std::make_pair(pt, i));
       }
    }

    MemPtr mem(new bi::managed_shared_memory(bi::create_only,
                                             "RTree-Test", 1024 * 64));
    // Constructed as in the linked Boost example.
    bgMemRTree* rtree = mem->construct<bgMemRTree>("rtree")(
       Partitioner(), Indexer(), Comparator(),
       Allocator(mem->get_segment_manager()));

    for ( std::uint64_t i = 0 ; i < 50000; ++i )
       ResizingInsert(mem, rtree,
                      bgPoint3(rndDbl(prng), rndDbl(prng), rndDbl(prng)), i);


    Dies down in Boost_1.55.0\Boost\boost\geometry\index\detail\varray.hpp:74:
    BOOST_GEOMETRY_INDEX_ASSERT(s <= v.capacity(), "size too big");


I'm not married to the way I'm doing this, and welcome any suggestions
or workarounds.  My other thought is copying the entire tree into a new
(larger) shared memory segment, but I'm sure that'll be a huge
performance hit.

Thanks in Advance,
Tim
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
Hi Tim,

Tim Finer wrote:

> Hello,
>
> I really want to use a persistent rtree backed by shared memory (or a
> memory mapped file).  I don't know in advance how large the data. I
> experimented with resizing a boost::vector backed by a shared memory
> allocator and applied what I found there to the example at
> http://www.boost.org/doc/libs/1_55_0/libs/geometry/doc/html/geometry/spatial_indexes/rtree_examples/index_stored_in_shared_memory_using_boost_interprocess.html.
>
> What I did was wrap an insertion function that catches a bad_alloc,
> closes the shared memory segment, grows the shared memory, then resets
> the pointer to the rtree.  I've verified that this technique works
> fine for boost::vector, but not for the rtree, because it appears that
> there are some other structures internally that are possibly noting
> the (old) capacity of the shared memory?

If an exception is thrown somewhere during insertion, the rtree is
left in an inconsistent state; see the warning:

http://www.boost.org/doc/libs/1_55_0/libs/geometry/doc/html/geometry/reference/spatial_indexes/boost__geometry__index__rtree/insert_value_type_const___.html#classboost_1_1geometry_1_1index_1_1rtree_1ad47980467e66b8644df18a480dbf9d86

In this case the allocation of memory for an additional node during the
balancing procedure failed and the extra element was left in the node.
The rtree only guarantees that there will be no memory leaks after an
exception is thrown.
But yes, an rtree should probably be left in a more consistent state
(basic guarantee), and there should be no assert/segfault when someone
wants to modify the rtree after the exception was thrown.

The copying won't work because some parts of the rtree may disappear.
The rtree doesn't guarantee strong exception safety.

[Extended description]

The support for the strong guarantee in insertion/removal/reinsertion
would require preparing all changes beforehand and then applying them
all at once: e.g. allocate all required nodes, fill them with data,
somehow store this changeset and then, when everything is ok, only
replace the pointers.  In the meantime all changes would probably have
to be tracked, and it should be possible to work with the "temporary"
structure transparently.  I'm guessing that this would introduce some
performance penalty, and that no-leak or basic exception safety is the
"natural" exception-safety level for an rtree.

Why do I think this is how the strong exception guarantee would have to
be implemented?  Because the balancing procedure may be quite
complicated.  E.g. if a value is removed from the R*-tree and an
underflow is detected in a node, that node is removed and the remaining
elements are inserted into the rtree again.  If during such an insert an
overflow is detected, the forced-reinsertion procedure may kick in, and
some number of elements will be removed from the overflowed node and
reinserted again.  If during such forced reinsertion an overflow occurs
again, a new node is created and the splitting procedure is applied.
And this is done for each inserted element or child node at every level
of the rtree if necessary.  So if at some point an exception is thrown
in the allocator or the Value's copy ctor or whatever, we're somewhere
in the middle of the procedure described above, and without tracking
what happened before we can't do anything.

>
> I also noted in a previous thread a similar problem, and applied the
> fix on github, but that didn't help
> (http://boost-geometry.203548.n3.nabble.com/rtree-crash-when-used-with-inter-process-td4026037.html).
Yes, because that was related to the way nodes are stored in memory; in
your case the exceptions are the problem.

<snip>
Btw, in the example, shouldn't the new size be mem->get_size() * 1.5?

> I'm not married to the way I'm doing this, and welcome any suggestions
> or workarounds.  My other thought is copying the entire tree into a
> new (larger) shared memory segment, but I'm sure that'll be a huge
> performance hit.

To work the way you want, the rtree requires an allocator that is able
to increase the size of the shared memory if needed.
I'm not sure if this is possible in general and/or with Interprocess.
I'll try to play with it in my free time.
If you come up with something before me, please let me know.

Regards,
Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Tim Finer
Hi Adam,

Thank you for all the details, for all your work, and for the same-day response!

> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz <[hidden email]> wrote:
>
> Hi Tim,
>
> Tim Finer wrote:
>> Hello,
>>
>> I really want to use a persistent rtree backed by shared memory (or a memory mapped file).  I don't know in advance how large the data. I experimented with resizing a boost::vector backed by a shared memory allocator and applied what I found there to the example at http://www.boost.org/doc/libs/1_55_0/libs/geometry/doc/html/geometry/spatial_indexes/rtree_examples/index_stored_in_shared_memory_using_boost_interprocess.html.
>>
>> What I did was wrap an insertion function that catches a bad_alloc, closes the shared memory segment, grows the shared memory, then resets the pointer to the rtree.  I've verified that this technique works fine for boost::vector, but not for the rtree, because it appears that there are some other structures internally that are possibly noting the (old) capacity of the shared memory?
>
> If an exception is thrown somewhere during the insertion the rtree is left in the inconsistent state, see the warning:
>
> http://www.boost.org/doc/libs/1_55_0/libs/geometry/doc/html/geometry/reference/spatial_indexes/boost__geometry__index__rtree/insert_value_type_const___.html#classboost_1_1geometry_1_1index_1_1rtree_1ad47980467e66b8644df18a480dbf9d86

Wow, I should've read that first!

>
> In this case the allocation of the memory for additional node during the balancing procedure failed and the extra element was left in the node.
> The rtree only guarantees that there will be no memory leaks after exception is thrown.
> But yes, an rtree should probably be left in more consistent state (basic guarantee) and there should be no assert/segfault when someone wanted to modify the rtree after the exception was thrown.
>
> The copying won't work because some parts of the rtree may disappear. The rtree doesn't guarantee strong exception safety.
>
> [Extended description]
>
> The support for the strong guarantee in the insertion/removal/reinsertion would require preparing all changes beforehand and then applying them all at once. E.g. allocate all required nodes, fill them with data, somehow store this changeset and then when everything was ok, only replace the pointers. In the meanwhile all changes should probably be tracked and it should be possible to work with the "temporary" structure transparently. I'm guessing that it would introduce some performance penalty and that the no-leak or basic exception safety is a "natural" exception safety level for an rtree.

Sounds like this might be another policy to bind, since it is expensive.  Having a strong guarantee would be especially useful for a disk based data structure.

>
> Why do I think that this is how the strong exception guarantee would have to be implemented?
> It's because the balancing procedure may be quite complicated e.g. if the value is removed from the R*-tree and there is an underflow in a node detected, this node is removed and remaining elements are inserted into the rtree again. If during such insert an overflow is detected the forced reinsertion procedure may kick in and some number of elements will be removed from overflowed node and reinserted again. If during such forced reinsertion again an overflow occurs the new node is created and splitting procedure is applied. And this is done for each inserted element or a child node at every level of the rtree if necessary. So if at some point an exception is thrown in the allocator or Value's copy ctor or whatever, we're somewhere in the middle of the procedure described above and without tracking what happened before we can't do anything.

Wow, this is a tough problem.   I've typed a few thoughts out, only to delete them moments later.  You almost need a way to store the entire state of the tree first (I know that's not feasible).

>
>>
>> I also noted in a previous thread a similar problem, and applied the fix on github, but that didn't help (http://boost-geometry.203548.n3.nabble.com/rtree-crash-when-used-with-inter-process-td4026037.html).
> Yes, because this was related to the way how nodes are stored in the memory and in your case the exceptions are the problem.
>
> <snip>
> Btw, in the example, shouldn't the new size be = mem->get_size() * 1.5 ?

You'd think so, but the argument to grow is in "additional bytes".

>
>> I'm not married to the way I'm doing this, and welcome any suggestions or workarounds.  My other thought is copying the entire tree into a new (larger) shared memory segment, but I'm sure that'll be a huge performance hit.
>
> In order to work the way you want the rtree requires an allocator which will be able to increase the size of shared memory if needed.
> I'm not sure if this is possible in general and/or with Interprocess.
> I'll try to play with it in my free time.
> If you were able to come up with something before me, please let me know.

Some other options I'm looking at are:

1. Scanning all the data first (it isn't in a convenient format, so I'll need to store it in another mapped file), then allocating enough space for the rtree.

2. Reading up on allocators and trapping the exception down at a lower level, possibly doing the resize there.  There are no guarantees about the newly allocated pointer's location after growing.

3. Monitoring the capacity of the shared file and proactively resizing given a threshold (say 90%).  I just thought of this one and I like it because it sidesteps the exception problem altogether.  The more I think about it, the more I like this one.
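For what it's worth, option 3 boils down to a couple of pure functions. This is only a sketch with hypothetical names (should_grow and grow_amount are not Boost APIs); the only Interprocess facts it relies on are that get_free_memory() reports free bytes and that grow() takes additional bytes:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for option 3: decide whether to grow the segment
// before inserting, from managed_shared_memory::get_size() and
// get_free_memory() values.  The 90% threshold is the arbitrary one
// mentioned above.
inline bool should_grow(std::size_t total, std::size_t free_bytes,
                        double threshold = 0.90)
{
    double used = static_cast<double>(total - free_bytes) /
                  static_cast<double>(total);
    return used >= threshold;
}

// managed_shared_memory::grow() takes *additional* bytes, so growing to
// 1.5x the current size means passing half of the current size.
inline std::size_t grow_amount(std::size_t total)
{
    return total / 2;
}
```

The caller would then close the segment, call grow with grow_amount(total), reopen, and re-find the rtree, the same sequence as the bad_alloc path, just run before the insert instead of after a failure.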

I'll let you know what I find.

Best Regards,
Tim

>
> Regards,
> Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
Hi Tim,

Tim Finer wrote:
>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz <[hidden email]> wrote:
>>
>> In order to work the way you want the rtree requires an allocator which will be able to increase the size of shared memory if needed.
>> I'm not sure if this is possible in general and/or with Interprocess.
>> I'll try to play with it in my free time.
>> If you were able to come up with something before me, please let me know.
> Some other options I'm looking at are:
>
> 1 Scanning all the data first (it isn't in a convenient format, so I'll need to store it in another mapped file), then allocating enough space for the rtree.

FYI, the packing algorithm run in the constructor (when you're creating
the rtree from e.g. a pair of iterators) should construct an rtree with
nodes "packed" as tightly as possible; you could actually predict the
number of nodes created, at least for the currently used algorithm.  If
you're curious, see line 109 of
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
So maybe it'd be a good idea to first load the objects into a vector,
calculate their size, then create the shared memory and construct an
rtree using pack-create.
Though I fear that if the memory becomes fragmented this might not work
in all cases.
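If the claim above holds, i.e. every level of a pack-created tree is filled with up to MAX entries per node, the node count can be predicted with a few lines of arithmetic. A sketch under that assumption (the exact counts in pack_create.hpp may differ slightly):

```cpp
#include <cassert>
#include <cstddef>

// Predict the node count of a pack-created rtree, assuming each node at
// each level holds up to max_elems entries (e.g. 16 for bgi::linear<16>).
// Counts the leaf level first, then each internal level above it.
inline std::size_t predict_packed_nodes(std::size_t values,
                                        std::size_t max_elems)
{
    std::size_t nodes = 0;
    std::size_t level = (values + max_elems - 1) / max_elems; // leaf nodes
    for (;;)
    {
        nodes += level;
        if (level <= 1)
            break; // reached the root
        level = (level + max_elems - 1) / max_elems; // nodes one level up
    }
    return nodes;
}
```

For example, 50000 values with MAX = 16 gives 3125 leaves, then 196, 13, and 1 internal nodes: 3335 nodes total. Multiplied by a per-node size, that yields a segment-size estimate before construction.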

> 2. Read up on allocators and trap the exception down at a lower level, and possibly do the resize there.  There are no guarantees about the newly allocated pointer's location after growing.
AFAIU this is what I was thinking about: an allocator catching the
exception in allocate(), growing the memory block, and recreating
itself.  I thought about storing a shared_ptr to a struct holding
unique_ptrs to the shared memory and the original allocator.  The
purpose is to have only one allocator and to modify a pointer to it from
any of the rebound allocators.  Then in allocate() of any rebound
allocator this original allocator would always be rebound to the current
one.

As you said, this won't work: the rtree would have to be reconstructed
after the grow, and at that time the algorithm would be somewhere in the
middle of a traversal, holding old/invalid pointers, etc.
>        
> 3. Monitor the capacity of the shared file and proactively resize given a threshold (say 90%).  I just thought of this one and I like it because it sidesteps the entire exception problem altogether.  The more I think about it, the more I like this one.

Yes, this is also worth trying.
It seems that there is a get_free_memory() method in
managed_shared_memory, so the amount of free space can easily be
checked.
In case the memory could be fragmented, you could try to allocate some
bigger block and see if it succeeds, then resize the memory if
necessary.
When the exception is thrown, get_free_memory() returns 968 in my test,
and I'm starting from 1024 * 64.
The author of Boost.Interprocess could probably give us some pointers
about how big the threshold should be and whether fragmentation should
be taken into account.

Regards,
Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Tim Finer
Hi Adam,

It's been a few months, but I finally got back to trying out the bgl
rtree with a resizable memory mapped file.  Things are looking up and I
have another question.

I've edited our correspondence down to the essentials:

On 7/26/14, 4:08 AM, Adam Wulkiewicz wrote:

> Hi Tim,
>
> Tim Finer wrote:
>>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz
>>> <[hidden email]> wrote:
>>>
>>> In order to work the way you want the rtree requires an allocator
>>> which will be able to increase the size of shared memory if needed.
>>> I'm not sure if this is possible in general and/or with Interprocess.
>>> I'll try to play with it in my free time.
>>> If you were able to come up with something before me, please let me
>>> know.
>> Some other options I'm looking at are:
>>
>> 1 Scanning all the data first (it isn't in a convenient format, so
>> I'll need to store it in another mapped file), then allocating enough
>> space for the rtree.
>
> FYI, the packing algorithm done in the constructor (when you're
> creating the rtree from e.g. a pair of iterators) should construct an
> rtree having nodes "packed" as tightly as possible, you could actually
> predict what'll be the actual number of nodes created. At least for
> currently used algorithm. If you're curious see line 109 of
> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
> So maybe it'd be a good idea to first load the objects to a vector,
> calculate their size, then create a shared memory and construct an
> rtree using pack-create.
> Though I fear that if the memory could be fragmented this might not
> work in all cases.
>
I want to avoid a first pass and would prefer to trade space for time.
Is there a way that I could "actually predict what'll be the actual
number of nodes created" given a live rtree?  I'd like to do this based
upon the currently populated rtree, so that I could find the worst-case
scenario and allocate at least that much memory in a node-insertion
wrapper.

>
>>        3. Monitor the capacity of the shared file and proactively
>> resize given a threshold (say 90%). I just thought of this one and I
>> like it because it sidesteps the entire exception problem
>> altogether.  The more I think about it, the more I like this one.
>
> Yes, this is also worth trying.
> It seems that there is a get_free_memory() method in
> managed_shared_memory so the amount of free space could be easily
> returned.
> In case if the memory could be fragmented you could try to allocate
> some bigger block and see if it succeeds. Then resize the memory if
> necessary.
> When the exception is thrown get_free_memory() returns 968 in my test
> and I'm starting from 1024 * 64.
> The author of Boost.Interprocess could probably give us some pointers
> about how big the threshold should be and if the fragmentation should
> be taken into account.
>

Hey!  This worked.  I arbitrarily chose 90%, resized at that point, and
my initial tests work!  What I'd like to do now is figure out how much
memory it would require.  I'm not terribly worried about fragmentation
(yet), but will get to that after I run some tests to find appropriate
thresholds.

Best,
Tim

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
Hi Tim,

Tim Finer wrote:

> Hi Adam,
>
> It's been a few months, but I finally got back to trying out the bgl
> rtree with a resizable memory mapped file.  Things are looking up and
> I have another question.
>
> I've edited our correspondence down to the essentials:
>
> On 7/26/14, 4:08 AM, Adam Wulkiewicz wrote:
>> Hi Tim,
>>
>> Tim Finer wrote:
>>>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz
>>>> <[hidden email]> wrote:
>>>>
>>>> In order to work the way you want the rtree requires an allocator
>>>> which will be able to increase the size of shared memory if needed.
>>>> I'm not sure if this is possible in general and/or with Interprocess.
>>>> I'll try to play with it in my free time.
>>>> If you were able to come up with something before me, please let me
>>>> know.
>>> Some other options I'm looking at are:
>>>
>>> 1 Scanning all the data first (it isn't in a convenient format, so
>>> I'll need to store it in another mapped file), then allocating
>>> enough space for the rtree.
>>
>> FYI, the packing algorithm done in the constructor (when you're
>> creating the rtree from e.g. a pair of iterators) should construct an
>> rtree having nodes "packed" as tightly as possible, you could
>> actually predict what'll be the actual number of nodes created. At
>> least for currently used algorithm. If you're curious see line 109 of
>> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
>> So maybe it'd be a good idea to first load the objects to a vector,
>> calculate their size, then create a shared memory and construct an
>> rtree using pack-create.
>> Though I fear that if the memory could be fragmented this might not
>> work in all cases.
>>
> I want to avoid a first pass and would prefer to trade space for
> time.  Is there a way that I could "actually predict what'll be the
> actual number of nodes created" given a live rtree?  I'd like to do
> this based upon the currently populated rtree so that I could find the
> worst case scenario and allocate at least that much memory in a node
> insertion wrapper.
>

I think it could be possible.  Since 1.56 or 1.57 (I don't remember) no
temporary containers should be created using the passed allocator, only
nodes.  In the worst case the rtree would have to create 1 leaf node + N
= depth new internal nodes per insert.  Unfortunately there is no
official way of checking the depth.  You should be able to do it like
this:

bgi::detail::rtree::utilities::view<rtree_type>(rtree_obj).depth();

but it's non-official and could be removed/changed in the future.
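Taken together with a per-node size, the worst case above (1 new leaf plus one new internal node per level) turns into a simple per-insert bound. A sketch, with the node size left as a caller-supplied parameter since obtaining it is non-official anyway:

```cpp
#include <cassert>
#include <cstddef>

// Worst-case allocations for a single insert, per the description above:
// 1 new leaf node plus one new internal node at each of `depth` levels.
inline std::size_t worst_case_nodes_per_insert(std::size_t depth)
{
    return 1 + depth;
}

// Conservative bytes needed for one insert, given a per-node size obtained
// elsewhere (e.g. via a sizeof() on the node allocator's value_type).
inline std::size_t worst_case_bytes_per_insert(std::size_t depth,
                                               std::size_t node_size)
{
    return worst_case_nodes_per_insert(depth) * node_size;
}
```

Before each insert, this bound could be compared against get_free_memory() so the segment is grown proactively when the bound no longer fits.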

>>
>>>        3. Monitor the capacity of the shared file and proactively
>>> resize given a threshold (say 90%). I just thought of this one and I
>>> like it because it sidesteps the entire exception problem
>>> altogether.  The more I think about it, the more I like this one.
>>
>> Yes, this is also worth trying.
>> It seems that there is a get_free_memory() method in
>> managed_shared_memory so the amount of free space could be easily
>> returned.
>> In case if the memory could be fragmented you could try to allocate
>> some bigger block and see if it succeeds. Then resize the memory if
>> necessary.
>> When the exception is thrown get_free_memory() returns 968 in my test
>> and I'm starting from 1024 * 64.
>> The author of Boost.Interprocess could probably give us some pointers
>> about how big the threshold should be and if the fragmentation should
>> be taken into account.
>>
>
> Hey!  This worked.  I arbitrarily chose 90% and resized at that point
> and my initial tests work!  What I'd like to do now is figure out how
> much memory it would require.  I'm not terribly worried about
> fragmentation (yet), but will get to that after I run some tests with
> finding appropriate thresholds.

So you would probably also need the size of the nodes for the estimate.
Unfortunately it depends on the node type and there is no official way
of obtaining it.

So again, everything described below is non-official and may change in
the future.

Currently (1.57) Boost.Variant-based nodes are used by default (this may
change in the future; in fact I plan to do it at some point).  Static
nodes store varrays/static_vectors; dynamic nodes use
boost::container::vector<>.  So in the case of container::vector<> the
amount of memory would also depend on the allocation strategy of this
container.  AFAIR this implementation of vector<> allocates only as much
as it needs, but I'm not sure.

For Variant-based static nodes you should be able to check the size like
this:

sizeof(bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_allocator_type::value_type)

For dynamic nodes it'd be more complicated, since you should add the
container::vector<> storage, which in the worst case would be equal to
N*sizeof(Value) for leaves and around N*sizeof(std::pair<Box,
NodePointer>) for internal nodes.  AFAIR in the worst case N should be
equal to MAX+1-MIN in the newly created nodes.  Box would be of type
model::box<model::point<>> using the same coordinate type and dimension
as the Indexable.  NodePointer would be the type returned by the
allocator, which you could get like this:

bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_pointer

Because of all this non-official stuff I'd consider it a workaround
rather than a solution.

Have you thought about implementing an allocator managing an arbitrary
number of Boost.Interprocess allocators/shared files?  It could own some
number of BI allocators, try to create a block of memory using them,
and, if that failed, create another file.

Regards,
Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Tim Finer
Hi Adam,


On 11/21/14, 4:59 PM, Adam Wulkiewicz wrote:

> Hi Tim,
>
> Tim Finer wrote:
>> Hi Adam,
>>
>> It's been a few months, but I finally got back to trying out the bgl
>> rtree with a resizable memory mapped file.  Things are looking up and
>> I have another question.
>>
>> I've edited our correspondence down to the essentials:
>>
>> On 7/26/14, 4:08 AM, Adam Wulkiewicz wrote:
>>> Hi Tim,
>>>
>>> Tim Finer wrote:
>>>>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz
>>>>> <[hidden email]> wrote:
>>>>>
>>>>> In order to work the way you want the rtree requires an allocator
>>>>> which will be able to increase the size of shared memory if needed.
>>>>> I'm not sure if this is possible in general and/or with Interprocess.
>>>>> I'll try to play with it in my free time.
>>>>> If you were able to come up with something before me, please let
>>>>> me know.
>>>> Some other options I'm looking at are:
>>>>
>>>> 1 Scanning all the data first (it isn't in a convenient format, so
>>>> I'll need to store it in another mapped file), then allocating
>>>> enough space for the rtree.
>>>
>>> FYI, the packing algorithm done in the constructor (when you're
>>> creating the rtree from e.g. a pair of iterators) should construct
>>> an rtree having nodes "packed" as tightly as possible, you could
>>> actually predict what'll be the actual number of nodes created. At
>>> least for currently used algorithm. If you're curious see line 109
>>> of
>>> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
>>> So maybe it'd be a good idea to first load the objects to a vector,
>>> calculate their size, then create a shared memory and construct an
>>> rtree using pack-create.
>>> Though I fear that if the memory could be fragmented this might not
>>> work in all cases.
>>>
>> I want to avoid a first pass and would prefer to trade space for
>> time.  Is there a way that I could "actually predict what'll be the
>> actual number of nodes created" given a live rtree?  I'd like to do
>> this based upon the currently populated rtree so that I could find
>> the worst case scenario and allocate at least that much memory in a
>> node insertion wrapper.
>>
>
> I think it could be possible. Since 1.56 or 1.57 (I don't remember) no
> temporary containers should be created using passed allocator, only
> nodes. In the worst case the rtree would have to create 1 leaf node +
> N = depth new internal nodes per insert. Unfortunately there is no
> official way of checking the depth. You should be able to do it like
> this:
>
> bgi::detail::rtree::utilities::view<rtree_type>(rtree_obj).depth();
>
> but it's non-official and could be removed/changed in the future.
Thank you for the hint; having a non-official function is better than
nothing.  I'm using 1.55, and depth() is in there.

Loading is a lot faster when I create a memory-mapped vector, insert all
the points, then create the rtree from the vector's iterators (using the
packing you mentioned before).  Not only does this generate a smaller
rtree, it is also faster at answering queries!

Your mention of the implementation change brings up a question: how
stable is the internal arrangement of nodes from Boost version to
version?

If I create persistent memory-mapped rtrees with version 1.55, will
allocators from 1.56 or 1.57 still map them correctly into memory?  I'm
in the middle of implementing all this so my customers can use it as a
means of loading very large spatial datasets.  If the memory layout
changes (in a breaking way) from Boost version to Boost version, that
will mean a lot more work; I can live with that, but only if it is easy
to detect and mitigate robustly.

>
>>>
>>>>        3. Monitor the capacity of the shared file and proactively
>>>> resize given a threshold (say 90%). I just thought of this one and
>>>> I like it because it sidesteps the entire exception problem
>>>> altogether.  The more I think about it, the more I like this one.
>>>
>>> Yes, this is also worth trying.
>>> It seems that there is a get_free_memory() method in
>>> managed_shared_memory so the amount of free space could be easily
>>> returned.
>>> In case if the memory could be fragmented you could try to allocate
>>> some bigger block and see if it succeeds. Then resize the memory if
>>> necessary.
>>> When the exception is thrown get_free_memory() returns 968 in my
>>> test and I'm starting from 1024 * 64.
>>> The author of Boost.Interprocess could probably give us some
>>> pointers about how big the threshold should be and if the
>>> fragmentation should be taken into account.
>>>
>>
>> Hey!  This worked.  I arbitrarily chose 90% and resized at that point
>> and my initial tests work!  What I'd like to do know is figure out
>> how much memory would require.  I'm not terribly worried about
>> fragmentation (yet), but will get to that after I run some tests with
>> finding appropriate thresholds.
>
> So you would probably also need the size of the nodes for the estimate.
> Unfortunately it depends on the node type and there is no official way
> of obtaining it.
>
> So again, everything described below is non-official and may be
> changed in the future.
>
> Currently (1.57) Boost.Variant-based nodes are used by default (this
> may change in the future, in fact I plan to do it at some point).
> Static nodes store varrays/static_vectors; dynamic nodes use
> boost::container::vector<>. So in the case of the container::vector<>
> the amount of memory would also depend on the allocation strategy of
> this container. AFAIR this implementation of vector<> allocates only
> as much as it needs but I'm not sure.
>
> For Variant-based static nodes you should be able to check the size
> like this:
> sizeof(
> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_allocator_type::value_type
> )
>
> For dynamic nodes it'd be more complicated since you should add the
> container::vector<> storage which'd be in the worst case equal to
> N*sizeof(Value) for leaves and around N*sizeof(std::pair<Box,
> NodePointer>) for internal nodes. AFAIR in the worst case N should be
> equal to MAX+1-MIN in the newly created nodes. Box would be of type
> model::box<model::point<>> using the same coordinate type and
> dimension as the Indexable. NodePointer would be a type returned by
> allocator which you could get like this:
> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_pointer
>
>
>
> Because of all of this non-official stuff I'd consider it rather a
> workaround than a solution.
>
> Have you thought about the implementation of an allocator managing
> arbitrary number of Boost.Interprocess allocators/shared files? It
> could own some number of BI allocators and try to create a block of
> memory using them and if it wasn't able to do it it could create
> another file.

I'm hoping that using a fixed node type means that I can predict
the size of the nodes from the space they actually take (by keeping
track of the allocator's statistics).  So far, the overhead is roughly 8
bytes per node over the size of the data stored when I use the method
above (create a vector, then use the packing algorithm to create an rtree).
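The proactive-resize scheme agreed on earlier in the thread (watch the free space and grow the segment before an insert can throw bad_alloc) can be sketched like this. This is a hedged, Boost-free illustration: `Segment` and `ensure_headroom` are hypothetical stand-ins, not Boost APIs; in real code the segment would be a bi::managed_mapped_file, the free-space query would be get_free_memory(), and growing would mean closing the segment, calling the static bi::managed_mapped_file::grow(file, extra), reopening it, and re-finding the rtree.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for bi::managed_mapped_file / managed_shared_memory.
struct Segment
{
    std::size_t capacity;
    std::size_t used;
    // Analogous to segment.get_free_memory().
    std::size_t free_memory() const { return capacity - used; }
    // Analogous to closing, growing the file, and re-mapping.
    void grow(std::size_t extra) { capacity += extra; }
};

// Before each insert, require that at least `threshold` of the capacity
// (e.g. 10%, i.e. resize at the 90% mark Tim used) would remain free
// after the insert; otherwise double the segment.
void ensure_headroom(Segment& seg, std::size_t bytes_needed,
                     double threshold = 0.10)
{
    std::size_t min_free =
        static_cast<std::size_t>(seg.capacity * threshold);
    if (seg.free_memory() < bytes_needed + min_free)
        seg.grow(seg.capacity);  // double, then re-map in real code
}
```

The point of resizing proactively is that the rtree never sees bad_alloc mid-insert, which sidesteps the internal-state corruption the thread started with.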

Best,
Tim

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
Hi Tim,

Tim Finer wrote:

> On 11/21/14, 4:59 PM, Adam Wulkiewicz wrote:
>> Tim Finer wrote:
>>> It's been a few months, but I finally got back to trying out the bg
>>> rtree with a resizable memory mapped file.  Things are looking up
>>> and I have another question.
>>>
>>> I've edited our correspondence down to the essentials:
>>>
>>> On 7/26/14, 4:08 AM, Adam Wulkiewicz wrote:
>>>> Tim Finer wrote:
>>>>>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz
>>>>>> <[hidden email]> wrote:
>>>>>>
>>>>>> In order to work the way you want the rtree requires an allocator
>>>>>> which will be able to increase the size of shared memory if needed.
>>>>>> I'm not sure if this is possible in general and/or with
>>>>>> Interprocess.
>>>>>> I'll try to play with it in my free time.
>>>>>> If you were able to come up with something before me, please let
>>>>>> me know.
>>>>> Some other options I'm looking at are:
>>>>>
>>>>> 1 Scanning all the data first (it isn't in a convenient format, so
>>>>> I'll need to store it in another mapped file), then allocating
>>>>> enough space for the rtree.
>>>>
>>>> FYI, the packing algorithm done in the constructor (when you're
>>>> creating the rtree from e.g. a pair of iterators) should construct
>>>> an rtree having nodes "packed" as tightly as possible, you could
>>>> actually predict what'll be the actual number of nodes created. At
>>>> least for currently used algorithm. If you're curious see line 109
>>>> of
>>>> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
>>>> So maybe it'd be a good idea to first load the objects to a vector,
>>>> calculate their size, then create a shared memory and construct an
>>>> rtree using pack-create.
>>>> Though I fear that if the memory could be fragmented this might not
>>>> work in all cases.
>>>>
>>> I want to avoid a first pass and would prefer to trade space for
>>> time.  Is there a way that I could "actually predict what'll be the
>>> actual number of nodes created" given a live rtree?  I'd like to do
>>> this based upon the currently populated rtree so that I could find
>>> the worst case scenario and allocate at least that much memory in a
>>> node insertion wrapper.
>>>
>>
>> I think it could be possible. Since 1.56 or 1.57 (I don't remember)
>> no temporary containers should be created using passed allocator,
>> only nodes. In the worst case the rtree would have to create 1 leaf
>> node + N = depth new internal nodes per insert. Unfortunately there
>> is no official way of checking the depth. You should be able to do it
>> like this:
>>
>> bgi::detail::rtree::utilities::view<rtree_type>(rtree_obj).depth();
>>
>> but it's non-official and could be removed/changed in the future.
> Thank you for the hint - having a non-official function is better than
> nothing.  I'm using 1.55, and depth() is in there.
>
> Loading time is a lot faster when I create a memory mapped vector,
> insert all the points, then create an rtree with the vector's
> iterators (using the packing you mentioned before).  Not only does
> this generate a smaller rtree, it also is faster at returning queries!
>

The above is for one call of a single-value insert().

In the case of packing algorithm you can predict a number of all nodes
(as I probably wrote earlier), see:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp
line 109.
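As a rough illustration of that prediction, here is a hedged sketch in plain C++. It assumes the packing algorithm fills every node to its maximum capacity Max, which pack_create.hpp only approximates, so treat the result as an estimate rather than the library's exact count; `packed_node_count` is a hypothetical helper, not a Boost function.

```cpp
#include <cassert>
#include <cstddef>

// Estimate the number of nodes a pack-created rtree needs for `values`
// elements with node capacity `max_elems` (e.g. 16 for bgi::linear<16>),
// assuming every node is filled to capacity: count the leaf level, then
// each internal level up to the single root.
std::size_t packed_node_count(std::size_t values, std::size_t max_elems)
{
    if (values == 0) return 0;
    std::size_t level = (values + max_elems - 1) / max_elems; // leaves
    std::size_t total = level;
    while (level > 1)
    {
        level = (level + max_elems - 1) / max_elems; // next level up
        total += level;
    }
    return total;
}
```

Multiplying this count by the node size (obtained via the sizeof(...) expression below) gives a first-order estimate of the segment size needed before pack-creating.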

> Your mention about the implementation change brings up a question: how
> stable is the internal arrangement of nodes from boost version to
> version?
>
> If I create persistent memory mapped rtrees with version 1.55, will
> allocators from 1.56 or 1.57 still map them correctly into memory? I'm
> in the middle of implementing all this so my customers can use as a
> means of loading very large spatial datasets.  If the memory layout
> changes (in a breaking way) from boost version to boost version,
> that'll mean a lot more work that I can live with but only if it is
> easy to detect and mitigate robustly.
>

The safest approach would be to never use previously created shared
files with a new version of the library. Some bugs might be fixed,
features added, not only in Geometry but also in Interprocess or any
other library on which it depends, etc. And we're talking about raw data
here, stored in a native format. This isn't serialization.

In 1.55 polymorphic nodes were used, and they didn't work properly with
shared memory. You shouldn't use this version.
Since 1.56 variant-based nodes are used.

In the future I plan to replace them with a lighter type of node, so less
memory should be needed, but the representation would change.

>>
>>>>
>>>>>        3. Monitor the capacity of the shared file and proactively
>>>>> resize given a threshold (say 90%). I just thought of this one and
>>>>> I like it because it sidesteps the entire exception problem
>>>>> altogether.  The more I think about it, the more I like this one.
>>>>
>>>> Yes, this is also worth trying.
>>>> It seems that there is a get_free_memory() method in
>>>> managed_shared_memory so the amount of free space could be easily
>>>> returned.
>>>> In case if the memory could be fragmented you could try to allocate
>>>> some bigger block and see if it succeeds. Then resize the memory if
>>>> necessary.
>>>> When the exception is thrown get_free_memory() returns 968 in my
>>>> test and I'm starting from 1024 * 64.
>>>> The author of Boost.Interprocess could probably give us some
>>>> pointers about how big the threshold should be and if the
>>>> fragmentation should be taken into account.
>>>>
>>>
>>> Hey!  This worked.  I arbitrarily chose 90% and resized at that
>>> point and my initial tests work!  What I'd like to do now is figure
>>> out how much memory it would require.  I'm not terribly worried about
>>> fragmentation (yet), but will get to that after I run some tests
>>> with finding appropriate thresholds.
>>
>> So you probably would also need the size of the nodes for the estimate.
>> Unfortunately it depends on the node type and there is no official
>> way of obtaining it.
>>
>> So again, everything described below is non-official and may be
>> changed in the future.
>>
>> Currently (1.57) Boost.Variant-based nodes are used by default (this
>> may change in the future, in fact I plan to do it at some point).
>> Static nodes store varrays/static_vectors; dynamic nodes use
>> boost::container::vector<>. So in the case of the container::vector<>
>> the amount of memory would also depend on the allocation strategy of
>> this container. AFAIR this implementation of vector<> allocates only
>> as much as it needs but I'm not sure.
>>
>> For Variant-based static nodes you should be able to check the size
>> like this:
>> sizeof(
>> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_allocator_type::value_type
>> )
>>
>> For dynamic nodes it'd be more complicated since you should add the
>> container::vector<> storage which'd be in the worst case equal to
>> N*sizeof(Value) for leaves and around N*sizeof(std::pair<Box,
>> NodePointer>) for internal nodes. AFAIR in the worst case N should be
>> equal to MAX+1-MIN in the newly created nodes. Box would be of type
>> model::box<model::point<>> using the same coordinate type and
>> dimension as the Indexable. NodePointer would be a type returned by
>> allocator which you could get like this:
>> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_pointer
>>
>>
>>
>> Because of all of this non-official stuff I'd consider it rather a
>> workaround than a solution.
>>
>> Have you thought about the implementation of an allocator managing
>> arbitrary number of Boost.Interprocess allocators/shared files? It
>> could own some number of BI allocators and try to create a block of
>> memory using them and if it wasn't able to do it it could create
>> another file.
>
> I'm hoping that my using of a fixed node type means that I can predict
> the size of the nodes using the space they actually take (by keeping
> track of the allocator's statistics).  So far, the overhead is roughly
> 8 bytes per node over the size of the data stored when I use the
> method above (create vector, use packing algorithm to create an rtree).

Btw, above I wrote about a single-value insert(). If you're using only
pack-create the size could be calculated with more precision, probably
as you're doing already.

Yes, static-size nodes should always take the same amount of memory and
you can get the size of a node using the method described above.
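Putting the two unofficial hints together (node size from the sizeof(...) expression, depth from utilities::view<...>(t).depth()), a back-of-the-envelope headroom check for a single insert might look like the following. Both inputs are passed as plain numbers here so the arithmetic stands alone; the function names are illustrative, not Boost APIs, and the worst case of "1 new leaf + depth new internal nodes" is the non-official estimate quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Worst-case memory one single-value insert() may allocate:
// 1 new leaf node plus `depth` new internal nodes (per the estimate in
// this thread), each of size `node_size` (from the sizeof(...) trick).
std::size_t worst_case_insert_bytes(std::size_t depth, std::size_t node_size)
{
    return (depth + 1) * node_size;
}

// Decide whether to grow the segment before inserting: resize if the
// reported free memory (cf. get_free_memory()) could not cover the
// worst case.  Fragmentation is ignored, as in Tim's first tests.
bool needs_resize(std::size_t free_memory, std::size_t depth,
                  std::size_t node_size)
{
    return free_memory < worst_case_insert_bytes(depth, node_size);
}
```

A real wrapper would call this before every insert and trigger the close/grow/reopen cycle when it returns true.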

Regards,
Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Tim Finer
Hi Adam,

On 11/25/14, 6:04 AM, Adam Wulkiewicz wrote:

> Hi Tim,
>
> Tim Finer wrote:
>> On 11/21/14, 4:59 PM, Adam Wulkiewicz wrote:
>>> Tim Finer wrote:
>>>> It's been a few months, but I finally got back to trying out the
>>>> bg rtree with a resizable memory mapped file.  Things are looking
>>>> up and I have another question.
>>>>
>>>> I've edited our correspondence down to the essentials:
>>>>
>>>> On 7/26/14, 4:08 AM, Adam Wulkiewicz wrote:
>>>>> Tim Finer wrote:
>>>>>>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz
>>>>>>> <[hidden email]> wrote:
>>>>>>>
>>>>>>> In order to work the way you want the rtree requires an
>>>>>>> allocator which will be able to increase the size of shared
>>>>>>> memory if needed.
>>>>>>> I'm not sure if this is possible in general and/or with
>>>>>>> Interprocess.
>>>>>>> I'll try to play with it in my free time.
>>>>>>> If you were able to come up with something before me, please let
>>>>>>> me know.
>>>>>> Some other options I'm looking at are:
>>>>>>
>>>>>> 1 Scanning all the data first (it isn't in a convenient format,
>>>>>> so I'll need to store it in another mapped file), then allocating
>>>>>> enough space for the rtree.
>>>>>
>>>>> FYI, the packing algorithm done in the constructor (when you're
>>>>> creating the rtree from e.g. a pair of iterators) should construct
>>>>> an rtree having nodes "packed" as tightly as possible, you could
>>>>> actually predict what'll be the actual number of nodes created. At
>>>>> least for currently used algorithm. If you're curious see line 109
>>>>> of
>>>>> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
>>>>> So maybe it'd be a good idea to first load the objects to a
>>>>> vector, calculate their size, then create a shared memory and
>>>>> construct an rtree using pack-create.
>>>>> Though I fear that if the memory could be fragmented this might
>>>>> not work in all cases.
>>>>>
>>>> I want to avoid a first pass and would prefer to trade space for
>>>> time.  Is there a way that I could "actually predict what'll be the
>>>> actual number of nodes created" given a live rtree?  I'd like to do
>>>> this based upon the currently populated rtree so that I could find
>>>> the worst case scenario and allocate at least that much memory in a
>>>> node insertion wrapper.
>>>>
>>>
>>> I think it could be possible. Since 1.56 or 1.57 (I don't remember)
>>> no temporary containers should be created using passed allocator,
>>> only nodes. In the worst case the rtree would have to create 1 leaf
>>> node + N = depth new internal nodes per insert. Unfortunately there
>>> is no official way of checking the depth. You should be able to do
>>> it like this:
>>>
>>> bgi::detail::rtree::utilities::view<rtree_type>(rtree_obj).depth();
>>>
>>> but it's non-official and could be removed/changed in the future.
>> Thank you for the hint - having a non-official function is better
>> than nothing.  I'm using 1.55, and depth() is in there.
>>
>> Loading time is a lot faster when I create a memory mapped vector,
>> insert all the points, then create an rtree with the vector's
>> iterators (using the packing you mentioned before). Not only does
>> this generate a smaller rtree, it also is faster at returning queries!
>>
>
> The above is for one call of a single-value insert().
>
> In the case of packing algorithm you can predict a number of all nodes
> (as I probably wrote earlier), see:
> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp 
>
> line 109.
>

I did read that, except the formula wasn't evident.  I was able to
figure out what I needed to know.

>> Your mention about the implementation change brings up a question:
>> how stable is the internal arrangement of nodes from boost version to
>> version?
>>
>> If I create persistent memory mapped rtrees with version 1.55, will
>> allocators from 1.56 or 1.57 still map them correctly into memory?
>> I'm in the middle of implementing all this so my customers can use it
>> as a means of loading very large spatial datasets.  If the memory
>> layout changes (in a breaking way) from Boost version to Boost
>> version, that'll mean a lot more work. I can live with that, but only
>> if it is easy to detect and mitigate robustly.
>>
>
> The safest approach would be to never use previously created shared
> files with a new version of the library. Some bugs might be fixed,
> features added, not only in Geometry but also in Interprocess or any
> other library on which it depends, etc. And we're talking about raw
> data here, stored in a native format. This isn't serialization.
>
> In 1.55 polymorphic nodes were used, and they didn't work properly with
> shared memory. You shouldn't use this version.
> Since 1.56 variant-based nodes are used.
>
> In the future I plan to replace them with a lighter type of node, so
> less memory should be needed, but the representation would change.
I see.  I understand that this isn't serialization, but isn't this an
obvious use case?  My understanding is that rtrees work really well as
disk-based storage, implying memory mapped files (along with samples
that explain how to do this).  The data sets my customers have are too
large to fit in memory.  It takes several minutes on fast systems (with
SSDs and lots of RAM) to create an rtree of millions of points.  I don't
want to make my users do this every time they open a document.  My goal
was to optimize the data once in an rtree and then memory map the rtree
many times after that.  If Boost.Geometry rearranges the rtree
internally with every release, that makes it much more difficult to
reuse.

Could you add a warning about this to the memory mapped samples in the
rtree documentation?  I can't be the only one who can't afford to
regenerate rtrees on the fly and is thinking of or attempting to reuse a
memory mapped version.

I see that you actually added experimental serialization support over a
year ago, but it looks like it supports trees that are read completely
into memory, not read on demand?  I'm more concerned with persistence
than serialization.

Thanks for the heads-up about the polymorphic node bug; where is that
documented?  If it isn't, that would also be something really helpful to
include.  I don't know how Boost handles this kind of notice, but it
would be helpful.

As much as I like the performance of the bg rtree, I'll probably switch
to an rtree implementation that has persistence as part of its design.
Boost isn't written so that different versions can easily be used by one
application, which compounds the problem.


>
>>>
>>>>>
>>>>>>        3. Monitor the capacity of the shared file and proactively
>>>>>> resize given a threshold (say 90%). I just thought of this one
>>>>>> and I like it because it sidesteps the entire exception problem
>>>>>> altogether.  The more I think about it, the more I like this one.
>>>>>
>>>>> Yes, this is also worth trying.
>>>>> It seems that there is a get_free_memory() method in
>>>>> managed_shared_memory so the amount of free space could be easily
>>>>> returned.
>>>>> In case if the memory could be fragmented you could try to
>>>>> allocate some bigger block and see if it succeeds. Then resize the
>>>>> memory if necessary.
>>>>> When the exception is thrown get_free_memory() returns 968 in my
>>>>> test and I'm starting from 1024 * 64.
>>>>> The author of Boost.Interprocess could probably give us some
>>>>> pointers about how big the threshold should be and if the
>>>>> fragmentation should be taken into account.
>>>>>
>>>>
>>>> Hey!  This worked.  I arbitrarily chose 90% and resized at that
>>>> point and my initial tests work!  What I'd like to do now is
>>>> figure out how much memory it would require.  I'm not terribly worried
>>>> about fragmentation (yet), but will get to that after I run some
>>>> tests with finding appropriate thresholds.
>>>
>>> So you probably would also need the size of the nodes for the estimate.
>>> Unfortunately it depends on the node type and there is no official
>>> way of obtaining it.
>>>
>>> So again, everything described below is non-official and may be
>>> changed in the future.
>>>
>>> Currently (1.57) Boost.Variant-based nodes are used by default (this
>>> may change in the future, in fact I plan to do it at some point).
>>> Static nodes store varrays/static_vectors; dynamic nodes use
>>> boost::container::vector<>. So in the case of the
>>> container::vector<> the amount of memory would also depend on the
>>> allocation strategy of this container. AFAIR this implementation of
>>> vector<> allocates only as much as it needs but I'm not sure.
>>>
>>> For Variant-based static nodes you should be able to check the size
>>> like this:
>>> sizeof(
>>> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_allocator_type::value_type
>>> )
>>>
>>> For dynamic nodes it'd be more complicated since you should add the
>>> container::vector<> storage which'd be in the worst case equal to
>>> N*sizeof(Value) for leaves and around N*sizeof(std::pair<Box,
>>> NodePointer>) for internal nodes. AFAIR in the worst case N should
>>> be equal to MAX+1-MIN in the newly created nodes. Box would be of
>>> type model::box<model::point<>> using the same coordinate type and
>>> dimension as the Indexable. NodePointer would be a type returned by
>>> allocator which you could get like this:
>>> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_pointer
>>>
>>>
>>>
>>> Because of all of this non-official stuff I'd consider it rather a
>>> workaround than a solution.
>>>
>>> Have you thought about the implementation of an allocator managing
>>> arbitrary number of Boost.Interprocess allocators/shared files? It
>>> could own some number of BI allocators and try to create a block of
>>> memory using them and if it wasn't able to do it it could create
>>> another file.
>>
>> I'm hoping that my using of a fixed node type means that I can
>> predict the size of the nodes using the space they actually take (by
>> keeping track of the allocator's statistics).  So far, the overhead
>> is roughly 8 bytes per node over the size of the data stored when I
>> use the method above (create vector, use packing algorithm to create
>> an rtree).
>
> Btw, above I wrote about a single-value insert(). If you're using only
> pack-create the size could be calculated with more precision, probably
> as you're doing already.
>
> Yes, static-size nodes should always take the same amount of memory
> and you can get the size of a node using the method described above.
>
Thank You,
Tim

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
In reply to this post by Adam Wulkiewicz
Hi Tim,

2014-11-25 15:04 GMT+01:00 Adam Wulkiewicz <[hidden email]>:

In 1.55 polymorphic nodes were used, and they didn't work properly with shared memory. You shouldn't use this version.

Regards,
Adam


Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
In reply to this post by Tim Finer
Hi Tim,

Tim Finer wrote:

> On 11/25/14, 6:04 AM, Adam Wulkiewicz wrote:
>> Tim Finer wrote:
>>> Your mention about the implementation change brings up a question:
>>> how stable is the internal arrangement of nodes from boost version
>>> to version?
>>>
>>> If I create persistent memory mapped rtrees with version 1.55, will
>>> allocators from 1.56 or 1.57 still map them correctly into memory?
>>> I'm in the middle of implementing all this so my customers can use
>>> it as a means of loading very large spatial datasets.  If the memory
>>> layout changes (in a breaking way) from Boost version to Boost
>>> version, that'll mean a lot more work. I can live with that, but only
>>> if it is easy to detect and mitigate robustly.
>>>
>>
>> The safest approach would be to never use previously created shared
>> files with a new version of the library. Some bugs might be fixed,
>> features added, not only in Geometry but also in Interprocess or any
>> other library on which it depends, etc. And we're talking about raw
>> data here, stored in a native format. This isn't serialization.
>>
>> In 1.55 polymorphic nodes were used, and they didn't work properly
>> with shared memory. You shouldn't use this version.
>> Since 1.56 variant-based nodes are used.
>>
>> In the future I plan to replace them with a lighter type of node, so
>> less memory should be needed, but the representation would change.
> I see.  I understand that this isn't serialization, but isn't this an
> obvious use case?  My understanding is that rtrees work really well as
> disk based storage, implying memory mapped files (along with samples
> that explain how to do this).  The size of the data sets my customers
> have are too large to fit in memory.  It takes several minutes on fast
> systems (with SSDs and lots of RAM) to create an rtree of millions of
> points.  I don't want to make my users do this every time they open a
> document.  My goal was to optimize the data once in an rtree and then
> memory map the rtree many times after that.  If boost geometry
> rearranges the rtree internally with every release, this makes it much
> more difficult to reuse.
>
> Please add a warning in the documentation for the rtree in the memory
> mapped samples about this?  I can't be the only one that can't afford
> to regenerate rtrees on the fly and is thinking of or attempting to
> reuse a memory mapped version.

Of course you're right. The R-tree was designed to use persistent
storage, e.g. as an index in a database, and this implementation should
certainly allow this.

Boost.Interprocess allows working around the lack of explicit
persistence, but it's a very raw/native mechanism. Its main purpose is to
allow sharing data between different processes. The memory is mapped
directly, and AFAIU we shouldn't expect that this mapping will not change
in another version of Boost. What if some internals of Interprocess are
optimized and some additional data are stored in a header of such shared
memory?

Do you expect to release new versions of your program and bump the Boost
version so often that this is really a problem?
I mean, your users would have to rebuild the tree only if needed.

>
> I see that you actually added experimental serialization support over
> a year ago, but it looks like it supports trees that are read
> completely into memory, and not read on demand?  I'm more concerned
> with persistence than serialization.

Yes, serialization's purpose is to save the whole content to disk or to
load it back. It's not related to live storage.

Btw, it could e.g. be used along with raw mapped-file storage as an
intermediate format for the transition from one Boost version to
another. Of course only if the serialization support were finished, which
isn't the case.

>
> Thanks for the heads up about the polymorphic node bug, where is that
> documented?  If it isn't, that would also be something really helpful
> to include.  I don't know anything about how boost handles this, but
> it would be helpful.
>

If you're asking about a note about the change of the internal structure,
it isn't mentioned anywhere. There is only a note about fixing a bug
with Interprocess.

I disagree that this should be mentioned in the docs; it's an internal
change. Furthermore AFAIU Interprocess doesn't guarantee that the
representation of data will be the same in various versions of Boost,
nor was it designed to support versioning, so we shouldn't rely on this.
Btw, Serialization supports versioning.

> As much as I like the performance of the bg rtree, I'll probably
> switch to an rtree implementation that has persistence as part of its
> design.  Boost isn't written so that different versions can easily be
> used by an application, so this doubly compounds the problem.

Sure, everything that works for you. Thanks for your interest and
suggestions. Maybe some day it'll meet your needs.

Regards,
Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Tim Finer
Hi Adam,

On 11/25/14, 4:25 PM, Adam Wulkiewicz wrote:
> Hi Tim,
>
(edited)

> Of course you're right. The R-tree was designed to use persistent
> storage, e.g. as an index in a database and this implementation should
> certainly allow this.
>
> Boost.Interprocess allows to work around the lack of explicit
> persistence but it's very raw/native mechanism. Its main purpose is to
> allow sharing data between different processes. The memory is mapped
> directly and AFAIU we shouldn't expect that this mapping will not
> change in another version of Boost. What if some internals of
> Interprocess are optimized and some additional data are stored in a
> header of such shared memory?
>
> Do you expect to release new version of your program and bump used
> Boost version so often that this is really a problem?
> I mean, your users would have to rebuild the tree only if needed.

All fair points.  I'm still thinking pretty hard about this.  We don't
update often, but the troubles happen in both directions (users updating
to new versions, users sharing files with older versions).

(edited)
> If you're asking about a note about the change of the internal
> structure, it isn't mentioned anywhere. There is only a note about
> fixing a bug with Interprocess.
>
> I disagree that this should be mentioned in the docs; it's an
> internal change. Furthermore AFAIU Interprocess doesn't guarantee that
> the representation of data will be the same in various versions of
> Boost, nor was it designed to support versioning, so we shouldn't
> rely on this. Btw, Serialization supports versioning.
Sure, I understand that there are no guarantees between versions.  I
might've misunderstood your warning though.  I thought you were advising
not to use the 1.55 IPC and bg rtree combination in general (without
updating) because of the internal polymorphic node?  If so, then I don't
see why the documentation with the sample shouldn't be edited to say
something like "don't use this until 1.56".

But if you were talking about files saved with 1.55 not working with
1.56, then I agree there's no reason to document that.

>
>> As much as I like the performance of the bg rtree, I'll probably
>> switch to an rtree implementation that has persistence as part of its
>> design.  Boost isn't written so that different versions can easily be
>> used by an application, so this doubly compounds the problem.
>
> Sure, everything that works for you. Thanks for your interest and
> suggestions. Maybe some day it'll meet your needs.

Best,
Tim



Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Adam Wulkiewicz
Hi Tim,

Tim Finer wrote:

> On 11/25/14, 4:25 PM, Adam Wulkiewicz wrote:
>> If you're asking about a note about the change of the internal
>> structure, it isn't mentioned anywhere. There is only a note about
>> fixing a bug with Interprocess.
>>
>> I disagree that this should be mentioned in the docs; it's an
>> internal change. Furthermore AFAIU Interprocess doesn't guarantee
>> that the representation of data will be the same in various versions
>> of Boost, nor was it designed to support versioning, so we shouldn't
>> rely on this. Btw, Serialization supports versioning.
> Sure, I understand that there are no guarantees between versions. I
> might've misunderstood your warning though.  I thought you were
> advising not to use 1.55 IPC and bg rtree in general (without
> updating) because of an internal polymorphic node?  If so, then I
> don't see why the documentation with the sample shouldn't be edited to
> say something like "don't use this until 1.56" or something.

I was advising not to use 1.54 or 1.55 with Interprocess. It's mentioned
in the release notes for 1.56
(http://www.boost.org/doc/libs/1_57_0/libs/geometry/doc/html/geometry/release_notes.html#geometry.release_notes.boost_1_56),
in the Bugfixes section: "rtree crashed in some cases when used with
Interprocess allocator".

Documentation is built and redistributed with Boost, and a copy is placed
on the webpage. It's not possible to add a note to the docs of already
released versions. To do so, we would probably be forced to release e.g.
version 1.xx.1, and nobody would agree to do that just to add a message
to the docs, especially after the release period. Not to mention that
there are already 2 newer versions of Boost released. So this info could
be added in 1.58, but it wouldn't be visible in the documentation for
older versions. That probably isn't a good solution, since users of 1.58
would just ignore it and users of 1.55 wouldn't see it.

Regards,
Adam

Re: Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55

Tim Finer
Hi Adam,

On 11/26/14, 3:33 AM, Adam Wulkiewicz wrote:

> Hi Tim,
>
> Tim Finer wrote:
>> On 11/25/14, 4:25 PM, Adam Wulkiewicz wrote:
>>> If you're asking about info on the change to an internal
>>> structure, it isn't mentioned anywhere. There is only a note about
>>> fixing a bug with Interprocess.
>>>
>>> I disagree that this should be mentioned in the docs; it's an
>>> internal change. Furthermore, AFAIU Interprocess doesn't guarantee
>>> that the representation of data will be the same across various
>>> versions of Boost, nor was it designed to support versioning, so
>>> we shouldn't rely on this. Btw, Serialization supports versioning.
>> Sure, I understand that there are no guarantees between versions. I
>> might've misunderstood your warning though.  I thought you were
>> advising not to use 1.55 IPC and bg rtree in general (without
>> updating) because of an internal polymorphic node?  If so, then I
>> don't see why the documentation with the sample shouldn't be edited
>> to say something like "don't use this until 1.56" or something.
>
> I was advising not to use 1.54 or 1.55 with Interprocess. It's
> mentioned in the release notes for 1.56
> (http://www.boost.org/doc/libs/1_57_0/libs/geometry/doc/html/geometry/release_notes.html#geometry.release_notes.boost_1_56),
> in the Bugfixes section: "rtree crashed in some cases when used with
> Interprocess allocator".
>
> Documentation is built and redistributed with Boost, and a copy is
> placed on the webpage. It's not possible to add a note to the docs
> of already released versions. To do so, we would probably be forced
> to release e.g. version 1.xx.1, and nobody would agree to do that
> just to add a message to the docs, especially after the release
> period. Not to mention that there are already 2 newer versions of
> Boost released. So this info could be added in 1.58, but it wouldn't
> be visible in the documentation for older versions. That probably
> isn't a good solution, since users of 1.58 would just ignore it and
> users of 1.55 wouldn't see it.

Ah, OK.  Thanks for the clarification.  I appreciate all your patience.

Best,
Tim