How can I Help?

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

How can I Help?

Henry Roeland

Dear all,

First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
Technics that we use are C/C++, Sqlite, spatialite and of corse boost.

Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.

Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?

I started studying your code and busy generating uml diagram of your code to get the big picture.

One remark is that I have to help in my own/spare time.

Hope to hear from you.

Kind regards,

Henry Roeland


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: How can I Help?

Adam Wulkiewicz
Hi Henry,

Henry Roeland wrote:
> Dear all,
>
> First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
> I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
> Technics that we use are C/C++, Sqlite, spatialite and of corse boost.
>
> Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.
>
> Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?

Certainly not the only one. People are asking about it from time to
time, using Interprocess' mapped file as a replacement. You're the only
one who is willing to help with the implementation. I greately
appreciate it.


As I mentioned before I was thinking about a stateful
allocator/storage/manager handling file or files/pages, cacheing etc.
This allocator would return pointers which when dereferenced would force
the allocator to load and cache data. Then, there would be a mechanism
to tell the allocator that some pointers/data aren't used anymore. It
could be done explicitly by calling some function on allocator like:

persistent_allocator alloc(...);
rtree<..., persistent_allocator> rt(..., alloc);

rt.insert(...);
rt.insert(...);
rt.insert(...);
alloc.flush();

The next step could be adding of some hooks or traits for allocators
which would be called by the rtree internally on the end of some
operation, e.g. insert. So this wouldn't be a C++ allocator anymore
since it'd have additional methods.

To avoid storing a reference to allocator/storage in those smart
pointers the rtree could notify the allocator/storage when a pointer is
dereferenced and data is required. So the responsibility for the
notification of allocator/storage would be moved from a pointer to the
rtree.

Finally the rtree could notify the allocator/storage each time the data
owned by a pointer is no longer needed but this probably wouldn't change
anything. Furthermore modifying and releasing the nodes before an
operation is finished wouldn't be safe in case when an exception was
thrown. In fact, this allocator should keep a copy of nodes in memory
during a whole operation and then at the end of the operation save the
data into a persistent storage.


The next thing is how to serialize data from and to node. We should ask
ourselves should we allow the users to implement other persistent
storage variants on their own. If the answer was no, then we could just
internally in the persistent allocator use the internals of the rtree.
But if we wanted to allow it (which IMO should be done) the most
straightforward way would be to expose the interface used by the rtree
internally to access node's elements. Exposing it would mean that we
couldn't change it in the future, so I'd prefer to avoid it. Instead, we
could mimic or directly use Boost.Serialization interface for Nodes. In
this scenario a specific Node would know what data must be saved and
loaded and it'd be able to save or load to/from some generic Archive.
Depending on the needs this could be some temporary 1-node Archive
gathering data for later or a target Archive capable to serialize data
on the fly, it'd depend on an allocator/storage.

This way we could also support versioning on a node level, the same way
how it's done in Serialization. So changes would have to be done on a
nodes level not in the user-defined allocator. An example could be an
implementation of a rtree variant storing additional data in nodes
(hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary
user-defined Values would be serialized the same way (using
Serialization or familiar Serialization-like interface).

And this way we'd also support Serialization in one go.


We'd probably need some file header with the parameters of an rtree both
in persistent storage and in serialization support. Similar as with
nodes, an rtree could know how to load/save the header from/to some
Archive. The rtree should e.g. check if it's possible to use persistent
storage and load data, e.g. if it wasn't created using incompatible
parameters, etc.


So this is how I see it. Of course it isn't a plan, rather an idea. Feel
free to point out any errors or present your ideas.

> I started studying your code and busy generating uml diagram of your code to get the big picture.

Ok, if you have any questions feel free to ask. Maybe I'll do some
introduction about the internals.

1. Nodes

The rtree handles nodes using visitor pattern. It's more or less how
Boost.Variant and static_visitor works.
Currently there are 2 types of nodes, internal nodes and leafs. Internal
nodes store a container of Box, pointer pairs, leafs store Values of
type passed as the 1st rtree template parameter.
Furthermore nodes can store static-size or dynamic-size containers. The
kind is identified by a tag.

For each kind there is a template allocators<> storing allocator objects
needed to construct internal nodes and leafs.
There are also specializations of utilities create_node<> and
destroy_node<> implementing creatin and destruction of nodes.

Everything related to nodes is here:
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/node

2. Algorithms

Nearly all operations are implemented using visitor pattern: insertion,
removal, query, copy, destruction
(https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/visitors).

The simplest one is a visitor checking if a node is internal node or a
leaf:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/visitors/is_leaf.hpp
The most complex operation is insertion.

3. Insertion

bgi::detail::rtree::visitors::insert<> allows to insert a Value or an
Element on a desired level of a tree (there are 2 specializations). See:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/visitors/insert.hpp
line 403 and below.

The flow of an original R-tree balancing algorithm was slightly changed
to allow nice decomposition of functional parts.
At each level of traversal:
1. For internal node
   1.1. bgi::detail::rtree::choose_next_node<> algorithm is called
(default implementation in line 25 of visitors/insert.hpp)
   1.2. the tree is traversed using the choosen node
2. For leaf a new Value is added
3. If there is an overflow (too many elements)
   3.1 a bgi::detail::rtree::split<> algorithm is called (default in
line 109 of visitors/insert.hpp)
   3.2 split algo creates a new node and...
   3.3 bgi::detail::rtree::redistribute_elements<> is called, it
redistrbutes contained elements between nodes

Depending on parameters various algorithms are used, each rtree type may
specialize its own algorithms.
In fact all rtree variants define different redistribute_elements<>.
linear and quadratic rtrees uses the default insert<> and
choose_next_node<>.
R*tree specializes also choose_next_node<> and insert<>.

Each variant has its own directory:
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/linear
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/quadratic
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/rstar

4. Parameters and options

All of the above, nodes and algorithms are identified by tags (using
tag-dispatching technique).
Binding of rtree parameters with tags is done in
bgi::detail::rtree::options<> and can be seen here:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/options.hpp

5. Pack create

Plus there is also 1 packing algorithm creating the rtree in a top-down
manner here:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp
It works more or less like a classic top-down kd-tree creation algorithm
using object median split.

> One remark is that I have to help in my own/spare time.

Sure, no pressure. I'm working on the rtree in my free time as well and
I appreciate any help :)

Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: How can I Help?

Henry Roeland
Hi Adam,

Thanks for quick and helpful response.


> On 26 nov. 2014, at 17:48, Adam Wulkiewicz <[hidden email]> wrote:
>
> Hi Henry,
>
> Henry Roeland wrote:
>> Dear all,
>>
>> First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
>> I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
>> Technics that we use are C/C++, Sqlite, spatialite and of corse boost.
>>
>> Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.
>>
>> Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?
>
> Certainly not the only one. People are asking about it from time to time, using Interprocess' mapped file as a replacement. You're the only one who is willing to help with the implementation. I greately appreciate it.
>
>
> As I mentioned before I was thinking about a stateful allocator/storage/manager handling file or files/pages, cacheing etc. This allocator would return pointers which when dereferenced would force the allocator to load and cache data. Then, there would be a mechanism to tell the allocator that some pointers/data aren't used anymore. It could be done explicitly by calling some function on allocator like:
>
> persistent_allocator alloc(...);
> rtree<..., persistent_allocator> rt(..., alloc);
>
> rt.insert(...);
> rt.insert(...);
> rt.insert(...);
> alloc.flush();
>

I don’t know about an rtree but from my homework:-) concerning b-trees I understood that pages can be handled as nodes themselves. Pages can be split and contain multiple nodes depending on size. If we could create a stable interface for Page(Nodes) then this interface can become public leaving the Node interface hidden???
It can also have other advantages to alloc a group of nodes (Page) together in terms of memory alignment(Your don’t need a smart pointer for every node only for the page?), Also in terms of (de)serialisation and bulk load handling a group of nodes would IMHO be better (if its possible).  




> The next step could be adding of some hooks or traits for allocators which would be called by the rtree internally on the end of some operation, e.g. insert. So this wouldn't be a C++ allocator anymore since it'd have additional methods.
>
> To avoid storing a reference to allocator/storage in those smart pointers the rtree could notify the allocator/storage when a pointer is dereferenced and data is required. So the responsibility for the notification of allocator/storage would be moved from a pointer to the rtree.
>

The Page Node can administrate its size and its free size and have a smart_pointer telling when it was last used and dependency_checker /unload checker with responsibility for notification. Maybe we can use Boost Signals2 for notification?


Together with a Page/Storage manager I’m thinking about the following points:
- Policy for when to load/unload. traits as you point it out.
- Size of rtree must be known at all times. If some limit is reached start unloading (allocator can tell the size?) to bottom pages that are less used?
- 25%/50%/75% percent of the top of the tree must remain in memory for speed???
- define 3 actions/states for a page: loaded, unloaded, unloaded but cached as memory map.????


> Finally the rtree could notify the allocator/storage each time the data owned by a pointer is no longer needed but this probably wouldn't change anything. Furthermore modifying and releasing the nodes before an operation is finished wouldn't be safe in case when an exception was thrown. In fact, this allocator should keep a copy of nodes in memory during a whole operation and then at the end of the operation save the data into a persistent storage.
>
>
> The next thing is how to serialize data from and to node. We should ask ourselves should we allow the users to implement other persistent storage variants on their own. If the answer was no, then we could just internally in the persistent allocator use the internals of the rtree. But if we wanted to allow it (which IMO should be done) the most straightforward way would be to expose the interface used by the rtree internally to access node's elements. Exposing it would mean that we couldn't change it in the future, so I'd prefer to avoid it. Instead, we could mimic or directly use Boost.Serialization interface for Nodes. In this scenario a specific Node would know what data must be saved and loaded and it'd be able to save or load to/from some generic Archive. Depending on the needs this could be some temporary 1-node Archive gathering data for later or a target Archive capable to serialize data on the fly, it'd depend on an allocator/storage.
>
> This way we could also support versioning on a node level, the same way how it's done in Serialization. So changes would have to be done on a nodes level not in the user-defined allocator. An example could be an implementation of a rtree variant storing additional data in nodes (hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary user-defined Values would be serialized the same way (using Serialization or familiar Serialization-like interface).
>
> And this way we'd also support Serialization in one go.
>
>
> We'd probably need some file header with the parameters of an rtree both in persistent storage and in serialization support. Similar as with nodes, an rtree could know how to load/save the header from/to some Archive. The rtree should e.g. check if it's possible to use persistent storage and load data, e.g. if it wasn't created using incompatible parameters, etc.
>
>

How is it with backwards compatibility? I can imagine that Storage work can break something already running?


If I’m talking rubbish just tell me I can handle it:-)


Thanks for beneath info to getting me started. I will have to read it a few times to understand it all.

Kind regards,

Henry


> So this is how I see it. Of course it isn't a plan, rather an idea. Feel free to point out any errors or present your ideas.
>
>> I started studying your code and busy generating uml diagram of your code to get the big picture.
>
> Ok, if you have any questions feel free to ask. Maybe I'll do some introduction about the internals.
>
> 1. Nodes
>
> The rtree handles nodes using visitor pattern. It's more or less how Boost.Variant and static_visitor works.
> Currently there are 2 types of nodes, internal nodes and leafs. Internal nodes store a container of Box, pointer pairs, leafs store Values of type passed as the 1st rtree template parameter.
> Furthermore nodes can store static-size or dynamic-size containers. The kind is identified by a tag.
>
> For each kind there is a template allocators<> storing allocator objects needed to construct internal nodes and leafs.
> There are also specializations of utilities create_node<> and destroy_node<> implementing creatin and destruction of nodes.
>
> Everything related to nodes is here: https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/node
>
> 2. Algorithms
>
> Nearly all operations are implemented using visitor pattern: insertion, removal, query, copy, destruction (https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/visitors).
>
> The simplest one is a visitor checking if a node is internal node or a leaf: https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/visitors/is_leaf.hpp
> The most complex operation is insertion.
>
> 3. Insertion
>
> bgi::detail::rtree::visitors::insert<> allows to insert a Value or an Element on a desired level of a tree (there are 2 specializations). See:
> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/visitors/insert.hpp
> line 403 and below.
>
> The flow of an original R-tree balancing algorithm was slightly changed to allow nice decomposition of functional parts.
> At each level of traversal:
> 1. For internal node
>  1.1. bgi::detail::rtree::choose_next_node<> algorithm is called (default implementation in line 25 of visitors/insert.hpp)
>  1.2. the tree is traversed using the choosen node
> 2. For leaf a new Value is added
> 3. If there is an overflow (too many elements)
>  3.1 a bgi::detail::rtree::split<> algorithm is called (default in line 109 of visitors/insert.hpp)
>  3.2 split algo creates a new node and...
>  3.3 bgi::detail::rtree::redistribute_elements<> is called, it redistrbutes contained elements between nodes
>
> Depending on parameters various algorithms are used, each rtree type may specialize its own algorithms.
> In fact all rtree variants define different redistribute_elements<>.
> linear and quadratic rtrees uses the default insert<> and choose_next_node<>.
> R*tree specializes also choose_next_node<> and insert<>.
>
> Each variant has its own directory:
> https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/linear
> https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/quadratic
> https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/rstar
>
> 4. Parameters and options
>
> All of the above, nodes and algorithms are identified by tags (using tag-dispatching technique).
> Binding of rtree parameters with tags is done in bgi::detail::rtree::options<> and can be seen here:
> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/options.hpp
>
> 5. Pack create
>
> Plus there is also 1 packing algorithm creating the rtree in a top-down manner here:
> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp
> It works more or less like a classic top-down kd-tree creation algorithm using object median split.
>
>> One remark is that I have to help in my own/spare time.
>
> Sure, no pressure. I'm working on the rtree in my free time as well and I appreciate any help :)
>
> Regards,
> Adam
> _______________________________________________
> Geometry mailing list
> [hidden email]
> http://lists.boost.org/mailman/listinfo.cgi/geometry

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: How can I Help?

Adam Wulkiewicz
Hi Henry,

Henry Roeland wrote:

>
>> On 26 nov. 2014, at 17:48, Adam Wulkiewicz <[hidden email]> wrote:
>>
>> Henry Roeland wrote:
>>> Dear all,
>>>
>>> First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
>>> I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
>>> Technics that we use are C/C++, Sqlite, spatialite and of corse boost.
>>>
>>> Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.
>>>
>>> Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?
>> Certainly not the only one. People are asking about it from time to time, using Interprocess' mapped file as a replacement. You're the only one who is willing to help with the implementation. I greately appreciate it.
>>
>>
>> As I mentioned before I was thinking about a stateful allocator/storage/manager handling file or files/pages, cacheing etc. This allocator would return pointers which when dereferenced would force the allocator to load and cache data. Then, there would be a mechanism to tell the allocator that some pointers/data aren't used anymore. It could be done explicitly by calling some function on allocator like:
>>
>> persistent_allocator alloc(...);
>> rtree<..., persistent_allocator> rt(..., alloc);
>>
>> rt.insert(...);
>> rt.insert(...);
>> rt.insert(...);
>> alloc.flush();
>>
> I don’t know about an rtree but from my homework:-) concerning b-trees I understood that pages can be handled as nodes themselves. Pages can be split and contain multiple nodes depending on size. If we could create a stable interface for Page(Nodes) then this interface can become public leaving the Node interface hidden???
> It can also have other advantages to alloc a group of nodes (Page) together in terms of memory alignment(Your don’t need a smart pointer for every node only for the page?), Also in terms of (de)serialisation and bulk load handling a group of nodes would IMHO be better (if its possible).

Of course you're right, nodes could (or even should) be grouped in
pages, loaded together etc. Also we should probably group close nodes
together in one page (e.g. parents with children), allocator hint could
be used for this. But I was thinking about leaving the desicion how to
store nodes to the allocator/storage. As you said one could use big
pages, one could store 1 node per file, or a database etc. And still
somehow a page or some other storage or Archive must know what data
should be saved/loaded per node. In my scenario this would just be
implemented in a Serialization-like way. Higher level aspects like pages
would be closed in allocator/storage. Of course assuming that this could
be done this way but I feel that it could.

I think that a pointer still would have to store some ID of a node
inside a page to identify an exact placement of the data. Or at least
some global ID somehow indexed to access a specific node in a specific page.

>
>> The next step could be adding of some hooks or traits for allocators which would be called by the rtree internally on the end of some operation, e.g. insert. So this wouldn't be a C++ allocator anymore since it'd have additional methods.
>>
>> To avoid storing a reference to allocator/storage in those smart pointers the rtree could notify the allocator/storage when a pointer is dereferenced and data is required. So the responsibility for the notification of allocator/storage would be moved from a pointer to the rtree.
>>
> The Page Node can administrate its size and its free size and have a smart_pointer telling when it was last used and dependency_checker /unload checker with responsibility for notification. Maybe we can use Boost Signals2 for notification?
>
>
> Together with a Page/Storage manager I’m thinking about the following points:
> - Policy for when to load/unload. traits as you point it out.
> - Size of rtree must be known at all times. If some limit is reached start unloading (allocator can tell the size?) to bottom pages that are less used?
> - 25%/50%/75% percent of the top of the tree must remain in memory for speed???
> - define 3 actions/states for a page: loaded, unloaded, unloaded but cached as memory map.????

So what you're describing here is how a paging allocator/storage could
work. Sure if the storage was notified by the rtree about the performed
actions it'd be able to do whatever it needs, cache pages, store some
kind of usage counter to know which pages are used most often, support
various policies, etc.

I was describing more low level stuff that would have to be implemented
on the rtree side. Basically 2 things notifications for actions and an
interface for serialization of nodes and rtree header. Or would you
prefer if I handled it, that you could focus on the implementation of a
specific allocator/storage?

There is also another thing that is mentioned in another thread (started
by Tim). There should be some way of creation non-rtree, temporary
pages, e.g. for data stored in a vector or some other container designed
specifically for this purpose. But this could be handled later.

>
>> Finally the rtree could notify the allocator/storage each time the data owned by a pointer is no longer needed but this probably wouldn't change anything. Furthermore modifying and releasing the nodes before an operation is finished wouldn't be safe in case when an exception was thrown. In fact, this allocator should keep a copy of nodes in memory during a whole operation and then at the end of the operation save the data into a persistent storage.
>>
>>
>> The next thing is how to serialize data from and to node. We should ask ourselves should we allow the users to implement other persistent storage variants on their own. If the answer was no, then we could just internally in the persistent allocator use the internals of the rtree. But if we wanted to allow it (which IMO should be done) the most straightforward way would be to expose the interface used by the rtree internally to access node's elements. Exposing it would mean that we couldn't change it in the future, so I'd prefer to avoid it. Instead, we could mimic or directly use Boost.Serialization interface for Nodes. In this scenario a specific Node would know what data must be saved and loaded and it'd be able to save or load to/from some generic Archive. Depending on the needs this could be some temporary 1-node Archive gathering data for later or a target Archive capable to serialize data on the fly, it'd depend on an allocator/storage.
>>
>> This way we could also support versioning on a node level, the same way how it's done in Serialization. So changes would have to be done on a nodes level not in the user-defined allocator. An example could be an implementation of a rtree variant storing additional data in nodes (hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary user-defined Values would be serialized the same way (using Serialization or familiar Serialization-like interface).
>>
>> And this way we'd also support Serialization in one go.
>>
>>
>> We'd probably need some file header with the parameters of an rtree both in persistent storage and in serialization support. Similar as with nodes, an rtree could know how to load/save the header from/to some Archive. The rtree should e.g. check if it's possible to use persistent storage and load data, e.g. if it wasn't created using incompatible parameters, etc.
>>
>>
> How is it with backwards compatibility? I can imagine that Storage work can break something already running?

I think that this won't be problematic.

> If I’m talking rubbish just tell me I can handle it:-)
>

All ideas are valuable :)

> Thanks for beneath info to getting me started. I will have to read it a few times to understand it all.

If you have any questions just ask. If I handled it on the rtree side
and you was working on the allocator/storage, you probably wouldn't even
be forced to know all of this internal stuff. Not that I have something
against it :)

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Storage allocator questions

Henry Roeland
Hi Adam,

It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.

I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


3. One problem/challenge is how to get and keep region(box) coordinates “divided” over the page nodes. At least thats my understanding of an (r)tree. You need the region coordinates as fast lookup ID to know if a query should “dive” into the page or its sub pages or not. 
Crazy idea: Maybe the allocator should manage an rtree of pages with references/pointers to the “real” tree nodes?

Questions: 
  • Is my understanding of data lookup right? So coordinates determine the location of the data inside the RTree? 
  • If this is true does this then mean that all Pages (Loaded or unloaded) must be known including its region coordinates? 
  • This in order to get/keep the tree balanced and know which page must be loaded (by coordinate query)?
 

Lets discuss the different actions an storage allocator can/must handle:
  • Load in Page data on request by RTree
    • “Load" can mean anything: From File, From DB, From MessageQueue,…
    • “Request" means query hits coordinate region that matches the Page coordinates
  • Bulk Load multiple pages at once???

  • Unload Page data on request by Storage Manager itself(?)
    • “Unload” can mean: 
      • To /dev/null :-) Only useful when read-only rtree
      • To File: Memory mapped or otherwise
      • To DB: Save data to (spatial) database 
      • To MessageQueue
      • ???
    • “Request” can mean:
      • Smart_pointer(s) say that nodes inside page are not used anymore
      • Smart_pointer(s) say that page is not used anymore
      • Amount of usage(hit count) is small
      • ???
  • Bulk Unload multiple pages at once???

NOTE: The storage allocator must, as Adam already pointed out, deliver an interface to make above points possible. But not implement them in anyway.

Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


Above is going top down by requirements/features but code-wise I will try to handle in my next e-mail. 

Thanks for your time,

Kind regards,

Henry Roeland


On 26 nov. 2014, at 22:12, Adam Wulkiewicz <[hidden email]> wrote:

Hi Henry,

Henry Roeland wrote:

On 26 nov. 2014, at 17:48, Adam Wulkiewicz <[hidden email]> wrote:

Henry Roeland wrote:
Dear all,

First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
Technics that we use are C/C++, Sqlite, spatialite and of corse boost.

Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.

Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?
Certainly not the only one. People are asking about it from time to time, using Interprocess' mapped file as a replacement. You're the only one who is willing to help with the implementation. I greately appreciate it.


As I mentioned before I was thinking about a stateful allocator/storage/manager handling file or files/pages, cacheing etc. This allocator would return pointers which when dereferenced would force the allocator to load and cache data. Then, there would be a mechanism to tell the allocator that some pointers/data aren't used anymore. It could be done explicitly by calling some function on allocator like:

persistent_allocator alloc(...);
rtree<..., persistent_allocator> rt(..., alloc);

rt.insert(...);
rt.insert(...);
rt.insert(...);
alloc.flush();

I don’t know about an rtree but from my homework:-) concerning b-trees I understood that pages can be handled as nodes themselves. Pages can be split and contain multiple nodes depending on size. If we could create a stable interface for Page(Nodes) then this interface can become public leaving the Node interface hidden???
It can also have other advantages to alloc a group of nodes (Page) together in terms of memory alignment(Your don’t need a smart pointer for every node only for the page?), Also in terms of (de)serialisation and bulk load handling a group of nodes would IMHO be better (if its possible).

Of course you're right, nodes could (or even should) be grouped in pages, loaded together etc. Also we should probably group close nodes together in one page (e.g. parents with children), allocator hint could be used for this. But I was thinking about leaving the desicion how to store nodes to the allocator/storage. As you said one could use big pages, one could store 1 node per file, or a database etc. And still somehow a page or some other storage or Archive must know what data should be saved/loaded per node. In my scenario this would just be implemented in a Serialization-like way. Higher level aspects like pages would be closed in allocator/storage. Of course assuming that this could be done this way but I feel that it could.

I think that a pointer still would have to store some ID of a node inside a page to identify an exact placement of the data. Or at least some global ID somehow indexed to access a specific node in a specific page.


The next step could be adding of some hooks or traits for allocators which would be called by the rtree internally on the end of some operation, e.g. insert. So this wouldn't be a C++ allocator anymore since it'd have additional methods.

To avoid storing a reference to allocator/storage in those smart pointers the rtree could notify the allocator/storage when a pointer is dereferenced and data is required. So the responsibility for the notification of allocator/storage would be moved from a pointer to the rtree.

The Page Node can administrate its size and its free size and have a smart_pointer telling when it was last used and dependency_checker /unload checker with responsibility for notification. Maybe we can use Boost Signals2 for notification?


Together with a Page/Storage manager I’m thinking about the following points:
- Policy for when to load/unload. traits as you point it out.
- Size of rtree must be known at all times. If some limit is reached start unloading (allocator can tell the size?) to bottom pages that are less used?
- 25%/50%/75% percent of the top of the tree must remain in memory for speed???
- define 3 actions/states for a page: loaded, unloaded, unloaded but cached as memory map.????

So what you're describing here is how a paging allocator/storage could work. Sure if the storage was notified by the rtree about the performed actions it'd be able to do whatever it needs, cache pages, store some kind of usage counter to know which pages are used most often, support various policies, etc.

I was describing more low level stuff that would have to be implemented on the rtree side. Basically 2 things notifications for actions and an interface for serialization of nodes and rtree header. Or would you prefer if I handled it, that you could focus on the implementation of a specific allocator/storage?

There is also another thing that is mentioned in another thread (started by Tim). There should be some way of creation non-rtree, temporary pages, e.g. for data stored in a vector or some other container designed specifically for this purpose. But this could be handled later.


Finally the rtree could notify the allocator/storage each time the data owned by a pointer is no longer needed but this probably wouldn't change anything. Furthermore modifying and releasing the nodes before an operation is finished wouldn't be safe in case when an exception was thrown. In fact, this allocator should keep a copy of nodes in memory during a whole operation and then at the end of the operation save the data into a persistent storage.


The next thing is how to serialize data from and to node. We should ask ourselves should we allow the users to implement other persistent storage variants on their own. If the answer was no, then we could just internally in the persistent allocator use the internals of the rtree. But if we wanted to allow it (which IMO should be done) the most straightforward way would be to expose the interface used by the rtree internally to access node's elements. Exposing it would mean that we couldn't change it in the future, so I'd prefer to avoid it. Instead, we could mimic or directly use Boost.Serialization interface for Nodes. In this scenario a specific Node would know what data must be saved and loaded and it'd be able to save or load to/from some generic Archive. Depending on the needs this could be some temporary 1-node Archive gathering data for later or a target Archive capable to serialize data on the fly, it'd depend on an allocator/storage.

This way we could also support versioning on a node level, the same way how it's done in Serialization. So changes would have to be done on a nodes level not in the user-defined allocator. An example could be an implementation of a rtree variant storing additional data in nodes (hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary user-defined Values would be serialized the same way (using Serialization or familiar Serialization-like interface).

And this way we'd also support Serialization in one go.


We'd probably need some file header with the parameters of an rtree both in persistent storage and in serialization support. Similar as with nodes, an rtree could know how to load/save the header from/to some Archive. The rtree should e.g. check if it's possible to use persistent storage and load data, e.g. if it wasn't created using incompatible parameters, etc.


How is it with backwards compatibility? I can imagine that Storage work can break something already running?

I think that this won't be problematic.

If I’m talking rubbish just tell me I can handle it:-)


All ideas are valuable :)

Thanks for beneath info to getting me started. I will have to read it a few times to understand it all.

If you have any questions just ask. If I handled it on the rtree side and you was working on the allocator/storage, you probably wouldn't even be forced to know all of this internal stuff. Not that I have something against it :)

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Storage allocator questions

Tim Finer
Hi Henry,

On 11/29/14, 10:48 AM, Henry Roeland wrote:
Hi Adam,

It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.

I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


3. One problem/challenge is how to get and keep region(box) coordinates “divided” over the page nodes. At least thats my understanding of an (r)tree. You need the region coordinates as fast lookup ID to know if a query should “dive” into the page or its sub pages or not. 
Crazy idea: Maybe the allocator should manage an rtree of pages with references/pointers to the “real” tree nodes?

Questions: 
  • Is my understanding of data lookup right? So coordinates determine the location of the data inside the RTree? 
  • If this is true does this then mean that all Pages (Loaded or unloaded) must be known including its region coordinates? 
  • This in order to get/keep the tree balanced and know which page must be loaded (by coordinate query)?
 

Lets discuss the different actions an storage allocator can/must handle:
  • Load in Page data on request by RTree
    • “Load" can mean anything: From File, From DB, From MessageQueue,…
    • “Request" means query hits coordinate region that matches the Page coordinates
  • Bulk Load multiple pages at once???

  • Unload Page data on request by Storage Manager itself(?)
    • “Unload” can mean: 
      • To /dev/null :-) Only useful when read-only rtree
      • To File: Memory mapped or otherwise
      • To DB: Save data to (spatial) database 
      • To MessageQueue
      • ???
    • “Request” can mean:
      • Smart_pointer(s) say that nodes inside page are not used anymore
      • Smart_pointer(s) say that page is not used anymore
      • Amount of usage(hit count) is small
      • ???
  • Bulk Unload multiple pages at once???

NOTE: The storage allocator must, as Adam already pointed out, deliver an interface to make above points possible. But not implement them in anyway.

Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


Above is going top down by requirements/features but code-wise I will try to handle in my next e-mail. 

Thanks for your time,

Kind regards,

Henry Roeland


On 26 nov. 2014, at 22:12, Adam Wulkiewicz <[hidden email]> wrote:

Hi Henry,

Henry Roeland wrote:

On 26 nov. 2014, at 17:48, Adam Wulkiewicz <[hidden email]> wrote:

Henry Roeland wrote:
Dear all,

First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
Technics that we use are C/C++, Sqlite, spatialite and of corse boost.

Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.

Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?
Certainly not the only one. People are asking about it from time to time, using Interprocess' mapped file as a replacement. You're the only one who is willing to help with the implementation. I greately appreciate it.


As I mentioned before I was thinking about a stateful allocator/storage/manager handling file or files/pages, cacheing etc. This allocator would return pointers which when dereferenced would force the allocator to load and cache data. Then, there would be a mechanism to tell the allocator that some pointers/data aren't used anymore. It could be done explicitly by calling some function on allocator like:

persistent_allocator alloc(...);
rtree<..., persistent_allocator> rt(..., alloc);

rt.insert(...);
rt.insert(...);
rt.insert(...);
alloc.flush();

I don’t know about an rtree but from my homework:-) concerning b-trees I understood that pages can be handled as nodes themselves. Pages can be split and contain multiple nodes depending on size. If we could create a stable interface for Page(Nodes) then this interface can become public leaving the Node interface hidden???
It can also have other advantages to alloc a group of nodes (Page) together in terms of memory alignment(Your don’t need a smart pointer for every node only for the page?), Also in terms of (de)serialisation and bulk load handling a group of nodes would IMHO be better (if its possible).

Of course you're right, nodes could (or even should) be grouped in pages, loaded together etc. Also we should probably group close nodes together in one page (e.g. parents with children), allocator hint could be used for this. But I was thinking about leaving the desicion how to store nodes to the allocator/storage. As you said one could use big pages, one could store 1 node per file, or a database etc. And still somehow a page or some other storage or Archive must know what data should be saved/loaded per node. In my scenario this would just be implemented in a Serialization-like way. Higher level aspects like pages would be closed in allocator/storage. Of course assuming that this could be done this way but I feel that it could.

I think that a pointer still would have to store some ID of a node inside a page to identify an exact placement of the data. Or at least some global ID somehow indexed to access a specific node in a specific page.


The next step could be adding of some hooks or traits for allocators which would be called by the rtree internally on the end of some operation, e.g. insert. So this wouldn't be a C++ allocator anymore since it'd have additional methods.

To avoid storing a reference to allocator/storage in those smart pointers the rtree could notify the allocator/storage when a pointer is dereferenced and data is required. So the responsibility for the notification of allocator/storage would be moved from a pointer to the rtree.

The Page Node can administrate its size and its free size and have a smart_pointer telling when it was last used and dependency_checker /unload checker with responsibility for notification. Maybe we can use Boost Signals2 for notification?


Together with a Page/Storage manager I’m thinking about the following points:
- Policy for when to load/unload. traits as you point it out.
- Size of rtree must be known at all times. If some limit is reached start unloading (allocator can tell the size?) to bottom pages that are less used?
- 25%/50%/75% percent of the top of the tree must remain in memory for speed???
- define 3 actions/states for a page: loaded, unloaded, unloaded but cached as memory map.????

So what you're describing here is how a paging allocator/storage could work. Sure if the storage was notified by the rtree about the performed actions it'd be able to do whatever it needs, cache pages, store some kind of usage counter to know which pages are used most often, support various policies, etc.

I was describing more low level stuff that would have to be implemented on the rtree side. Basically 2 things notifications for actions and an interface for serialization of nodes and rtree header. Or would you prefer if I handled it, that you could focus on the implementation of a specific allocator/storage?

There is also another thing that is mentioned in another thread (started by Tim). There should be some way of creation non-rtree, temporary pages, e.g. for data stored in a vector or some other container designed specifically for this purpose. But this could be handled later.


Finally the rtree could notify the allocator/storage each time the data owned by a pointer is no longer needed but this probably wouldn't change anything. Furthermore modifying and releasing the nodes before an operation is finished wouldn't be safe in case when an exception was thrown. In fact, this allocator should keep a copy of nodes in memory during a whole operation and then at the end of the operation save the data into a persistent storage.


The next thing is how to serialize data from and to node. We should ask ourselves should we allow the users to implement other persistent storage variants on their own. If the answer was no, then we could just internally in the persistent allocator use the internals of the rtree. But if we wanted to allow it (which IMO should be done) the most straightforward way would be to expose the interface used by the rtree internally to access node's elements. Exposing it would mean that we couldn't change it in the future, so I'd prefer to avoid it. Instead, we could mimic or directly use Boost.Serialization interface for Nodes. In this scenario a specific Node would know what data must be saved and loaded and it'd be able to save or load to/from some generic Archive. Depending on the needs this could be some temporary 1-node Archive gathering data for later or a target Archive capable to serialize data on the fly, it'd depend on an allocator/storage.

This way we could also support versioning on a node level, the same way how it's done in Serialization. So changes would have to be done on a nodes level not in the user-defined allocator. An example could be an implementation of a rtree variant storing additional data in nodes (hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary user-defined Values would be serialized the same way (using Serialization or familiar Serialization-like interface).

And this way we'd also support Serialization in one go.


We'd probably need some file header with the parameters of an rtree both in persistent storage and in serialization support. Similar as with nodes, an rtree could know how to load/save the header from/to some Archive. The rtree should e.g. check if it's possible to use persistent storage and load data, e.g. if it wasn't created using incompatible parameters, etc.


How is it with backwards compatibility? I can imagine that Storage work can break something already running?

I think that this won't be problematic.

If I’m talking rubbish just tell me I can handle it:-)


All ideas are valuable :)

Thanks for beneath info to getting me started. I will have to read it a few times to understand it all.

If you have any questions just ask. If I handled it on the rtree side and you was working on the allocator/storage, you probably wouldn't even be forced to know all of this internal stuff. Not that I have something against it :)

Regards,
Adam

I have a request that the allocator be the sole source of allocation.  One of my problems with the current implementation is that one of the constructors creates a temporary vector of centroids as a source for a packing algorithm.  My understanding is that the allocator needs to be generic enough to offer up a temporary buffer (something similar to std::get_temporary_buffer) so that the packing algorithm can take advantage of it without going to the heap.

Best,
Tim

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Storage allocator questions

Henry Roeland
Hi Tim,


On 29 nov. 2014, at 20:03, Tim Finer <[hidden email]> wrote:

Hi Henry,

On 11/29/14, 10:48 AM, Henry Roeland wrote:
Hi Adam,

It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.

I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


3. One problem/challenge is how to get and keep region(box) coordinates “divided” over the page nodes. At least thats my understanding of an (r)tree. You need the region coordinates as fast lookup ID to know if a query should “dive” into the page or its sub pages or not. 
Crazy idea: Maybe the allocator should manage an rtree of pages with references/pointers to the “real” tree nodes?

Questions: 
  • Is my understanding of data lookup right? So coordinates determine the location of the data inside the RTree? 
  • If this is true does this then mean that all Pages (Loaded or unloaded) must be known including its region coordinates? 
  • This in order to get/keep the tree balanced and know which page must be loaded (by coordinate query)?
 

Lets discuss the different actions an storage allocator can/must handle:
  • Load in Page data on request by RTree
    • “Load" can mean anything: From File, From DB, From MessageQueue,…
    • “Request" means query hits coordinate region that matches the Page coordinates
  • Bulk Load multiple pages at once???

  • Unload Page data on request by Storage Manager itself(?)
    • “Unload” can mean: 
      • To /dev/null :-) Only useful when read-only rtree
      • To File: Memory mapped or otherwise
      • To DB: Save data to (spatial) database 
      • To MessageQueue
      • ???
    • “Request” can mean:
      • Smart_pointer(s) say that nodes inside page are not used anymore
      • Smart_pointer(s) say that page is not used anymore
      • Amount of usage(hit count) is small
      • ???
  • Bulk Unload multiple pages at once???

NOTE: The storage allocator must, as Adam already pointed out, deliver an interface to make above points possible. But not implement them in anyway.

Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


Above is going top down by requirements/features but code-wise I will try to handle in my next e-mail. 

Thanks for your time,

Kind regards,

Henry Roeland


On 26 nov. 2014, at 22:12, Adam Wulkiewicz <[hidden email]> wrote:

Hi Henry,

Henry Roeland wrote:

On 26 nov. 2014, at 17:48, Adam Wulkiewicz <[hidden email]> wrote:

Henry Roeland wrote:
Dear all,

First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
Technics that we use are C/C++, Sqlite, spatialite and of corse boost.

Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.

Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?
Certainly not the only one. People are asking about it from time to time, using Interprocess' mapped file as a replacement. You're the only one who is willing to help with the implementation. I greately appreciate it.


As I mentioned before I was thinking about a stateful allocator/storage/manager handling file or files/pages, cacheing etc. This allocator would return pointers which when dereferenced would force the allocator to load and cache data. Then, there would be a mechanism to tell the allocator that some pointers/data aren't used anymore. It could be done explicitly by calling some function on allocator like:

persistent_allocator alloc(...);
rtree<..., persistent_allocator> rt(..., alloc);

rt.insert(...);
rt.insert(...);
rt.insert(...);
alloc.flush();

I don’t know about an rtree but from my homework:-) concerning b-trees I understood that pages can be handled as nodes themselves. Pages can be split and contain multiple nodes depending on size. If we could create a stable interface for Page(Nodes) then this interface can become public leaving the Node interface hidden???
It can also have other advantages to alloc a group of nodes (Page) together in terms of memory alignment(Your don’t need a smart pointer for every node only for the page?), Also in terms of (de)serialisation and bulk load handling a group of nodes would IMHO be better (if its possible).

Of course you're right, nodes could (or even should) be grouped in pages, loaded together etc. Also we should probably group close nodes together in one page (e.g. parents with children), allocator hint could be used for this. But I was thinking about leaving the desicion how to store nodes to the allocator/storage. As you said one could use big pages, one could store 1 node per file, or a database etc. And still somehow a page or some other storage or Archive must know what data should be saved/loaded per node. In my scenario this would just be implemented in a Serialization-like way. Higher level aspects like pages would be closed in allocator/storage. Of course assuming that this could be done this way but I feel that it could.

I think that a pointer still would have to store some ID of a node inside a page to identify an exact placement of the data. Or at least some global ID somehow indexed to access a specific node in a specific page.


The next step could be adding of some hooks or traits for allocators which would be called by the rtree internally on the end of some operation, e.g. insert. So this wouldn't be a C++ allocator anymore since it'd have additional methods.

To avoid storing a reference to allocator/storage in those smart pointers the rtree could notify the allocator/storage when a pointer is dereferenced and data is required. So the responsibility for the notification of allocator/storage would be moved from a pointer to the rtree.

The Page Node can administrate its size and its free size and have a smart_pointer telling when it was last used and dependency_checker /unload checker with responsibility for notification. Maybe we can use Boost Signals2 for notification?


Together with a Page/Storage manager I’m thinking about the following points:
- Policy for when to load/unload. traits as you point it out.
- Size of rtree must be known at all times. If some limit is reached start unloading (allocator can tell the size?) to bottom pages that are less used?
- 25%/50%/75% percent of the top of the tree must remain in memory for speed???
- define 3 actions/states for a page: loaded, unloaded, unloaded but cached as memory map.????

So what you're describing here is how a paging allocator/storage could work. Sure if the storage was notified by the rtree about the performed actions it'd be able to do whatever it needs, cache pages, store some kind of usage counter to know which pages are used most often, support various policies, etc.

I was describing more low level stuff that would have to be implemented on the rtree side. Basically 2 things notifications for actions and an interface for serialization of nodes and rtree header. Or would you prefer if I handled it, that you could focus on the implementation of a specific allocator/storage?

There is also another thing that is mentioned in another thread (started by Tim). There should be some way of creation non-rtree, temporary pages, e.g. for data stored in a vector or some other container designed specifically for this purpose. But this could be handled later.


Finally the rtree could notify the allocator/storage each time the data owned by a pointer is no longer needed but this probably wouldn't change anything. Furthermore modifying and releasing the nodes before an operation is finished wouldn't be safe in case when an exception was thrown. In fact, this allocator should keep a copy of nodes in memory during a whole operation and then at the end of the operation save the data into a persistent storage.


The next thing is how to serialize data from and to node. We should ask ourselves should we allow the users to implement other persistent storage variants on their own. If the answer was no, then we could just internally in the persistent allocator use the internals of the rtree. But if we wanted to allow it (which IMO should be done) the most straightforward way would be to expose the interface used by the rtree internally to access node's elements. Exposing it would mean that we couldn't change it in the future, so I'd prefer to avoid it. Instead, we could mimic or directly use Boost.Serialization interface for Nodes. In this scenario a specific Node would know what data must be saved and loaded and it'd be able to save or load to/from some generic Archive. Depending on the needs this could be some temporary 1-node Archive gathering data for later or a target Archive capable to serialize data on the fly, it'd depend on an allocator/storage.

This way we could also support versioning on a node level, the same way how it's done in Serialization. So changes would have to be done on a nodes level not in the user-defined allocator. An example could be an implementation of a rtree variant storing additional data in nodes (hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary user-defined Values would be serialized the same way (using Serialization or familiar Serialization-like interface).

And this way we'd also support Serialization in one go.


We'd probably need some file header with the parameters of an rtree both in persistent storage and in serialization support. Similar as with nodes, an rtree could know how to load/save the header from/to some Archive. The rtree should e.g. check if it's possible to use persistent storage and load data, e.g. if it wasn't created using incompatible parameters, etc.


How is it with backwards compatibility? I can imagine that Storage work can break something already running?

I think that this won't be problematic.

If I’m talking rubbish just tell me I can handle it:-)


All ideas are valuable :)

Thanks for beneath info to getting me started. I will have to read it a few times to understand it all.

If you have any questions just ask. If I handled it on the rtree side and you was working on the allocator/storage, you probably wouldn't even be forced to know all of this internal stuff. Not that I have something against it :)

Regards,
Adam

I have a request that the allocator be the sole source of allocation.  One of my problems with the current implementation is that one of the constructors creates a temporary vector of centroids as a source for a packing algorithm.  My understanding is that the allocator needs to be generic enough to offer up a temporary buffer (something similar to std::get_temporary_buffer) so that the packing algorithm can take advantage of it without going to the heap.

Best,
Tim
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry

Do you mean that the storage allocator must also support creating temporary buffer(s)? And what about storing this temporary buffers? 

I will take this into account. Thanks for pointing this out.

Kind regards,

Henry


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Storage allocator questions

Adam Wulkiewicz
In reply to this post by Henry Roeland
Hi Henry,

Henry Roeland wrote:
It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.


Ok, let's do it this way. FYI, I won't have time this week to work on this. But at least I'll answer on this email briefly.

I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


For the first version I think that our design should be as generic as possible:
- the Storage concept should be an abstraction of a persistent storage, just like an Allocator is an abstraction of "directly-accessible" memory.
- the rtree would request something or notify the storage about some events but it wouldn't have to know anything about the internal structure, techniques, heuristics, etc. E.g. the rtree wouldn't know anything about paging.
- the interfaces should probably be as compact as possible, e.g. as I said before, I think that in addition to the Allocator's interface we'd need 2 functions:
  * storage_traits<StorageOrAllocator>::get_pointer(ptr)     - notify the Storage that data pointed by a pointer is needed
  * storage_traits<StorageOrAllocator>::release_pointers() - notify the Storage that pointers aren't used anymore
- the storage probably shouldn't write the data directly to persistent part, instead it should keep the modified data cached and then write them all in one go. It could depend on some heuristic, the Storage could write changes when release_pointers() was called or it could define flush() method which could be called explicitly by the user. Still, the changes of persistent data should be done only if all pointers was released. At all times the storage could choose to keep the data cached. It could choose to unload all of the nodes, or only some less frequently used ones.

Is it reasonable?


2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


It could be done either in run- or in compile-time.
The most obvious way would be to throw a run-time exception in the get_pointer() after some flag (rtree created) was set.
Another possibility could be to disable rtree mutable methods like insert() and remove() in compile-time if some specific parameters or Storage/Allocator was passed.
Do you have some specific use-case in mind?


3. One problem/challenge is how to get and keep region(box) coordinates “divided” over the page nodes. At least thats my understanding of an (r)tree. You need the region coordinates as fast lookup ID to know if a query should “dive” into the page or its sub pages or not. 
Crazy idea: Maybe the allocator should manage an rtree of pages with references/pointers to the “real” tree nodes?

Questions: 
  • Is my understanding of data lookup right? So coordinates determine the location of the data inside the RTree? 
  • If this is true does this then mean that all Pages (Loaded or unloaded) must be known including its region coordinates? 
  • This in order to get/keep the tree balanced and know which page must be loaded (by coordinate query)?
 

Each node internal node (in a classic R-tree) is a container of (Box, Ptr) pairs. During any traversal (insert, query, etc.) the rtree probably should load the whole node to have access to all of the Boxes of children regions. AFAIR all traversing algorithms access all Boxes. Then it chooses which children should be traversed and it traverses into the next level using the corresponding pointers.


Lets discuss the different actions an storage allocator can/must handle:
  • Load in Page data on request by RTree
    • “Load" can mean anything: From File, From DB, From MessageQueue,…
So the load would potentially be done in the call to get_pointer(). AFAIU the Storage would have to check if the underlying data (identified by Ptr storing some ID or handle) isn't already loaded/cached. If it was it'd just return the Ptr, if not then it'd:
- load a file,
- (maybe somewhere cache the raw file),
- create a node from loaded data and keep it,
- return the pointer.

So in my scenario (release_pointers()) it'd have to keep some container of returned pointers and when release_pointers(), flush() was called then just iterate over them and save changed nodes or create new pages, etc. It should probably also be possible to search the specific pointer in this container, so it should probably be map or unordered_map, or sorted vector, etc. Or do you have other ideas?

E.g. when construct() was called the Storage probably shouldn't create a Page or other persistent data, it should be created later, e.g. on flush(). The same with destroy().

Btw, the Storage/Allocator would have to support rebinding and creation of various types, not only nodes. It'd have to know if a type is a node or some other type. So some types would be handled differently than other types. I'm not sure if this is a good or a bad thing. I guess it's possible even with std::allocator<> to specialize it for some specific type.

    • “Request" means query hits coordinate region that matches the Page coordinates
This isn't needed, the rtree would know exactly which child node must be accessed becaused the Ptr is bound with the region in the internal node.
  • Bulk Load multiple pages at once???

Hmm, it could be done and potentially usable. Though it'd complicate the storage_traits (1 additional methog/overload). But do you think that we could gain anything from that? I mean, various pages would have to be loaded and cached anyway. This could be added later if needed.
  • Unload Page data on request by Storage Manager itself(?)
    • “Unload” can mean: 
      • To /dev/null :-) Only useful when read-only rtree
      • To File: Memory mapped or otherwise
      • To DB: Save data to (spatial) database 
      • To MessageQueue
      • ???
    • “Request” can mean:
      • Smart_pointer(s) say that nodes inside page are not used anymore
      • Smart_pointer(s) say that page is not used anymore
      • Amount of usage(hit count) is small
      • ???
  • Bulk Unload multiple pages at once???


So in my scenario the rtree would notify the Storage that all pointers are released and that the Storage may do whatever it like with them. So it could save changed nodes, unload some of them, etc.

Hmm, the Storage also probably should know that a node was modified, or that pointers are released after mutable operation like insert() or remove(). So e.g. in the first version after an insert() the Storage could just save all nodes.

NOTE: The storage allocator must, as Adam already pointed out, deliver an interface to make above points possible. But not implement them in anyway.


So as mentioned above I was thinking about storage_traits which could be specialized for some StorageOrAllocator. Some member functions would be empty for Allocators but for Storage would have to be implemented. They would probably call some Storage's methods which could be called whatever the implementor liked.

Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


I see 2 ways how the ownership could be implemented:
1. Similar to Boost.Interprocess where there is a memory Manager owning the data and AFAIU an Allocator only wrapping some reference to this Manager. So AFAIU (I didn't check the code) the Allocator is only an interface between a Container and a Manager. This simplifies the copying of an Allocator, rebinding, etc.
2. An Allocator which is in the same time a Manager or rather owns a Manager, so also owns the data. This'd probably require that the Allocator would have to store a shared_pointer to some Manager internally. An instance of a Manager would be created under the hood and a pointer to it would be copied automatically.

1 is more clear but it'd require to design not only the interface for a Storage/Allocator but also for a Manager, like in the case of Interprocess. 2 would only require to design the interface of an Allocator/Storage. If we choose 1 we could divide the logic into 2 Concepts, the Allocator could implement the in-memory-part (pointers, memory allocation, construction, destruction, etc.) and the Manager the persistent part (some file IDs or handles, saving and loading files, nodes serialization, etc.). So as in the case of Interprocess we could have 1 Allocator and could have many Managers.

In general I propose to work iteratively, that is at the beginning support minimal set of required operations in the Storage/Allocator to be able to e.g. insert() values and perform a query() in the most simple, probably inefficient way, but with a simple and elegant interface. And then optimize operations and extend the interface. Or would you prefer to design the whole Storage/Allocator theoreticaly?

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Storage allocator questions

Henry Roeland
Hi Adam,

On 1 dec. 2014, at 17:20, Adam Wulkiewicz <[hidden email]> wrote:

Hi Henry,

Henry Roeland wrote:
It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.


Ok, let's do it this way. FYI, I won't have time this week to work on this. But at least I'll answer on this email briefly.


No problem.

I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


For the first version I think that our design should be as generic as possible:

One vote for generic! :-)

- the Storage concept should be an abstraction of a persistent storage, just like an Allocator is an abstraction of "directly-accessible" memory.
- the rtree would request something or notify the storage about some events but it wouldn't have to know anything about the internal structure, techniques, heuristics, etc. E.g. the rtree wouldn't know anything about paging.
OK. But what about the other way around? Should the Storage/Allocator not know how the container is build up? This in order to choose efficiently how to store its data?

- the interfaces should probably be as compact as possible, e.g. as I said before, I think that in addition to the Allocator's interface we'd need 2 functions:
  * storage_traits<StorageOrAllocator>::get_pointer(ptr)     - notify the Storage that data pointed by a pointer is needed
  * storage_traits<StorageOrAllocator>::release_pointers() - notify the Storage that pointers aren't used anymore
- the storage probably shouldn't write the data directly to persistent part, instead it should keep the modified data cached and then write them all in one go. It could depend on some heuristic, the Storage could write changes when release_pointers() was called or it could define flush() method which could be called explicitly by the user. Still, the changes of persistent data should be done only if all pointers was released. At all times the storage could choose to keep the data cached. It could choose to unload all of the nodes, or only some less frequently used ones.

Is it reasonable?


Yes it sounds reasonable. But are we not creating a generic storage allocator which also can be used on e.g. std::vector or other stl containers? I have no problem with this but I cannot imagine that we are the first…

Other questions/points that popup:
  • What about relative pointers e.g. node vs. storage/page?
  • What about memory management issues:
    • Fragmentation
    • Alloc ahead
    • Garbage collection (We are going to do it already using smart pointer(s) but on low level?)
      • malloc libs like jemalloc and tcmalloc can do this on low level…
    • Move pages/blocks: Pointers become invalid?
    • Object/Page/Node pooling???
    • Does STL have any guidelines restrictions on memory management?
  • Is rtree thread safe? Should our Storage/Allocator be thread-safe?




2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


It could be done either in run- or in compile-time.
The most obvious way would be to throw a run-time exception in the get_pointer() after some flag (rtree created) was set.
Another possibility could be to disable rtree mutable methods like insert() and remove() in compile-time if some specific parameters or Storage/Allocator was passed.
Do you have some specific use-case in mind?

At the company I’m working for :-) They use readonly maps a lot. So in the future(hopefully)rtree with storage/allocator as cache.




3. One problem/challenge is how to get and keep region(box) coordinates “divided” over the page nodes. At least thats my understanding of an (r)tree. You need the region coordinates as fast lookup ID to know if a query should “dive” into the page or its sub pages or not. 
Crazy idea: Maybe the allocator should manage an rtree of pages with references/pointers to the “real” tree nodes?

Questions: 
  • Is my understanding of data lookup right? So coordinates determine the location of the data inside the RTree? 
  • If this is true does this then mean that all Pages (Loaded or unloaded) must be known including its region coordinates? 
  • This in order to get/keep the tree balanced and know which page must be loaded (by coordinate query)?
 

Each node internal node (in a classic R-tree) is a container of (Box, Ptr) pairs. During any traversal (insert, query, etc.) the rtree probably should load the whole node to have access to all of the Boxes of children regions. AFAIR all traversing algorithms access all Boxes. Then it chooses which children should be traversed and it traverses into the next level using the corresponding pointers.


Lets discuss the different actions an storage allocator can/must handle:
  • Load in Page data on request by RTree
    • “Load" can mean anything: From File, From DB, From MessageQueue,…
So the load would potentially be done in the call to get_pointer(). AFAIU the Storage would have to check if the underlying data (identified by Ptr storing some ID or handle) isn't already loaded/cached. If it was it'd just return the Ptr, if not then it'd:
- load a file,
- (maybe somewhere cache the raw file),
- create a node from loaded data and keep it,
- return the pointer.

So in my scenario (release_pointers()) it'd have to keep some container of returned pointers and when release_pointers(), flush() was called then just iterate over them and save changed nodes or create new pages, etc. It should probably also be possible to search the specific pointer in this container, so it should probably be map or unordered_map, or sorted vector, etc. Or do you have other ideas?

E.g. when construct() was called the Storage probably shouldn't create a Page or other persistent data, it should be created later, e.g. on flush(). The same with destroy().

This means lots of housekeeping. Kind of transaction…
At this time I have no other idea’s.

Btw, the Storage/Allocator would have to support rebinding and creation of various types, not only nodes. It'd have to know if a type is a node or some other type. So some types would be handled differently than other types. I'm not sure if this is a good or a bad thing. I guess it's possible even with std::allocator<> to specialize it for some specific type.

    • “Request" means query hits coordinate region that matches the Page coordinates
This isn't needed, the rtree would know exactly which child node must be accessed becaused the Ptr is bound with the region in the internal node.
  • Bulk Load multiple pages at once???

Hmm, it could be done and potentially usable. Though it'd complicate the storage_traits (1 additional methog/overload). But do you think that we could gain anything from that? I mean, various pages would have to be loaded and cached anyway. This could be added later if needed.
  • Unload Page data on request by Storage Manager itself(?)
    • “Unload” can mean: 
      • To /dev/null :-) Only useful when read-only rtree
      • To File: Memory mapped or otherwise
      • To DB: Save data to (spatial) database 
      • To MessageQueue
      • ???
    • “Request” can mean:
      • Smart_pointer(s) say that nodes inside page are not used anymore
      • Smart_pointer(s) say that page is not used anymore
      • Amount of usage(hit count) is small
      • ???
  • Bulk Unload multiple pages at once???


So in my scenario the rtree would notify the Storage that all pointers are released and that the Storage may do whatever it like with them. So it could save changed nodes, unload some of them, etc.

Hmm, the Storage also probably should know that a node was modified, or that pointers are released after mutable operation like insert() or remove(). So e.g. in the first version after an insert() the Storage could just save all nodes.

NOTE: The storage allocator must, as Adam already pointed out, deliver an interface to make above points possible. But not implement them in anyway.


So as mentioned above I was thinking about storage_traits which could be specialized for some StorageOrAllocator. Some member functions would be empty for Allocators but for Storage would have to be implemented. They would probably call some Storage's methods which could be called whatever the implementor liked.

Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


I see 2 ways how the ownership could be implemented:
1. Similar to Boost.Interprocess where there is a memory Manager owning the data and AFAIU an Allocator only wrapping some reference to this Manager. So AFAIU (I didn't check the code) the Allocator is only an interface between a Container and a Manager. This simplifies the copying of an Allocator, rebinding, etc.
2. An Allocator which is in the same time a Manager or rather owns a Manager, so also owns the data. This'd probably require that the Allocator would have to store a shared_pointer to some Manager internally. An instance of a Manager would be created under the hood and a pointer to it would be copied automatically.

1 is more clear but it'd require to design not only the interface for a Storage/Allocator but also for a Manager, like in the case of Interprocess. 2 would only require to design the interface of an Allocator/Storage. If we choose 1 we could divide the logic into 2 Concepts, the Allocator could implement the in-memory-part (pointers, memory allocation, construction, destruction, etc.) and the Manager the persistent part (some file IDs or handles, saving and loading files, nodes serialization, etc.). So as in the case of Interprocess we could have 1 Allocator and could have many Managers.

In general I propose to work iteratively, that is at the beginning support minimal set of required operations in the Storage/Allocator to be able to e.g. insert() values and perform a query() in the most simple, probably inefficient way, but with a simple and elegant interface. And then optimize operations and extend the interface. Or would you prefer to design the whole Storage/Allocator theoreticaly?

I prefer both :-) I will try to keep an UML overview with our suggestions to keep an theoretical overview.  Next to that its probably already time to dive into code to see what is usable and what not.

Pratical question: Should I fork BoostGeometry on github or do you prefer a branch?


Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry


Thanks and Kind regards,

Henry





_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Storage allocator questions

Adam Wulkiewicz
Hi Henry,

Henry Roeland wrote:
On 1 dec. 2014, at 17:20, Adam Wulkiewicz <[hidden email]> wrote:
Henry Roeland wrote:
It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.


Ok, let's do it this way. FYI, I won't have time this week to work on this. But at least I'll answer on this email briefly.


No problem.
Ok, we could get back to this if you wanted.


I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


For the first version I think that our design should be as generic as possible:

One vote for generic! :-)

- the Storage concept should be an abstraction of a persistent storage, just like an Allocator is an abstraction of "directly-accessible" memory.
- the rtree would request something or notify the storage about some events but it wouldn't have to know anything about the internal structure, techniques, heuristics, etc. E.g. the rtree wouldn't know anything about paging.
OK. But what about the other way around? Should the Storage/Allocator not know how the container is build up? This in order to choose efficiently how to store its data?

What things should be known by the allocator/storage? I'm guessing that if at some point some additional info was needed then it would be easy to add another function or definition to the storage_traits, so the storage would know everything it needs.


- the interfaces should probably be as compact as possible, e.g. as I said before, I think that in addition to the Allocator's interface we'd need 2 functions:
  * storage_traits<StorageOrAllocator>::get_pointer(ptr)     - notify the Storage that data pointed by a pointer is needed
  * storage_traits<StorageOrAllocator>::release_pointers() - notify the Storage that pointers aren't used anymore
- the storage probably shouldn't write the data directly to persistent part, instead it should keep the modified data cached and then write them all in one go. It could depend on some heuristic, the Storage could write changes when release_pointers() was called or it could define flush() method which could be called explicitly by the user. Still, the changes of persistent data should be done only if all pointers was released. At all times the storage could choose to keep the data cached. It could choose to unload all of the nodes, or only some less frequently used ones.

Is it reasonable?


Yes it sounds reasonable. But are we not creating a generic storage allocator which also can be used on e.g. std::vector or other stl containers? I have no problem with this but I cannot imagine that we are the first…


To be honest, I haven't done an extensive research regarding the existing solutions. However every solutions I saw are tightly bound with the container itself. I think that in most cases the container that is implemented is either in-memory or persistent. But the fact that I haven't seen a persistent storage abstraction doesn't mean that it doesn't exist. Though I'm not sure if it would be reasonable in all cases. So sure, additional research could be done.

For the rtree we only need a storage capable to allocate the memory for nodes. And we also need a way to create a big temporary storage, probably divided into several files.

Though it would probably be possible to implement a persistent allocator for std::vector it wouldn't be very efficient because a vector reallocates the memory by creating another block and copying all of the elements. Handling this problem could allow us to implement a heap-like kd-tree storing the nodes in a contigeous storage. But then we'd be forced to deal with additional problems like cacheing chunks of this big persistent memory block, prefetching, etc.

Other questions/points that popup:
  • What about relative pointers e.g. node vs. storage/page?

The pointer stored in the node in memory should uniquely define the location of the data. It could store an ID of a file and an ID of a node stored in that file or whatever. E.g. Boost.Interprocess uses offset pointers containing an offset from the beginning of a memory block owned by the allocator or memory manager, because next time the memory block could be placed in memory at a different place.
  • What about memory management issues:
    • Fragmentation
It probably depends on the allocator/storge itself. I'm guessing that in some cases it wouldn't be a problem, at least with the rtree. If one file or page or whatever you'd like to call it contained only one node then an allocator/storage would have to just find a unique name for this file. So the fragmentation problem would be moved from the program to the OS/Filesystem.
    • Alloc ahead
Do you have in mind a mechanism to create more persistent memory than is needed now for the future? This could be used e.g. in the packing algorithm. I'm guessing that it wouldn't be problematic to add this in the future.
    • Garbage collection (We are going to do it already using smart pointer(s) but on low level?)
      • malloc libs like jemalloc and tcmalloc can do this on low level…
GC for what purpose? The rtree explicitly notifies an allocator if it wants to distroy a node. So it would also notify a storage. So the storage could store the request for node deletion/page removal and when flush() was called perform the action together with other actions, preferably in a transactional manner for safety.
    • Move pages/blocks: Pointers become invalid?
Pointers shouldn't be invalidated. An ID of a node would be stored in parent node and it could be stored in memory in pointers contained in the rtree or in the cache of the storage, etc. So I'm guessing that you're asking about the in-memory part. If the allocator/storage wanted to move pages then it could do it but e.g. in a way preserving the IDs. Or would you like to also modify IDs in the pages containing parents?
    • Object/Page/Node pooling???
What do you mean?
    • Does STL have any guidelines restrictions on memory management?
The STL uses the Allocator concept for this, in C++11 stateful allocators and std::allocator_traits<>. Temporary buffers are created with std::get_temporary_buffer() or new (nothrow).
But this wouldn't be enough for us since some of the objects (nodes) would have to be somehow serialized into disk. So I thought about extending it with
- storage traits (functional) - for notification of the storage/allocator about some additional actions
- storage traits (definitions) - e.g. for defining what must be serialized (nodes), and what shouldn't
- serialization interface - implementing how nodes should be saved and loaded using specific storage

Storage traits could be implemented as one struct as in the case of std::allocator_traits<> or divided into separate structs as in the case of Boost.Geometry traits (tag<>, dimension<>, etc.).

  • Is rtree thread safe? Should our Storage/Allocator be thread-safe?

The rtree is thread-safe more or less the same way as STL containers are. Mutable actions aren't thread-safe (e.g. creation, insert, remove), non-mutable should be (e.g. query).

So it should probably be possible to read data safely. If a storage/allocator was fully thread-safe it could allow us in the future to e.g. parallelize the persistent rtree creation. But let's keep it simple for now :)



2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


It could be done either in run- or in compile-time.
The most obvious way would be to throw a run-time exception in the get_pointer() after some flag (rtree created) was set.
Another possibility could be to disable rtree mutable methods like insert() and remove() in compile-time if some specific parameters or Storage/Allocator was passed.
Do you have some specific use-case in mind?

At the company I’m working for :-) They use readonly maps a lot. So in the future(hopefully)rtree with storage/allocator as cache.


Ok, as I said, it could be done. This mechanism could be implemented in the allocator itself, or the behavior could be defined in compile-time in storage traits, etc.

<snip>



Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


I see 2 ways how the ownership could be implemented:
1. Similar to Boost.Interprocess where there is a memory Manager owning the data and AFAIU an Allocator only wrapping some reference to this Manager. So AFAIU (I didn't check the code) the Allocator is only an interface between a Container and a Manager. This simplifies the copying of an Allocator, rebinding, etc.
2. An Allocator which is in the same time a Manager or rather owns a Manager, so also owns the data. This'd probably require that the Allocator would have to store a shared_pointer to some Manager internally. An instance of a Manager would be created under the hood and a pointer to it would be copied automatically.

1 is more clear but it'd require to design not only the interface for a Storage/Allocator but also for a Manager, like in the case of Interprocess. 2 would only require to design the interface of an Allocator/Storage. If we choose 1 we could divide the logic into 2 Concepts, the Allocator could implement the in-memory-part (pointers, memory allocation, construction, destruction, etc.) and the Manager the persistent part (some file IDs or handles, saving and loading files, nodes serialization, etc.). So as in the case of Interprocess we could have 1 Allocator and could have many Managers.

In general I propose to work iteratively, that is at the beginning support minimal set of required operations in the Storage/Allocator to be able to e.g. insert() values and perform a query() in the most simple, probably inefficient way, but with a simple and elegant interface. And then optimize operations and extend the interface. Or would you prefer to design the whole Storage/Allocator theoreticaly?

I prefer both :-) I will try to keep an UML overview with our suggestions to keep an theoretical overview.  Next to that its probably already time to dive into code to see what is usable and what not.

Pratical question: Should I fork BoostGeometry on github or do you prefer a branch?

You should create your fork of Boost.Geometry on GitHub and create a branch originating in develop in your fork.
Here is a tutorial:
https://github.com/boostorg/geometry/wiki/Contribution-Tutorial

At Boost we're using GitFlow branching model so your branch should originate in develop and be called feature/xxx, e.g. feature/persistency or feature/persistent_allocator, or something like that. So you'd be able to add your code in this branch and if something on the rtree side was needed I'd add it to develop (or you could create a PullRequest) and you'd be able to synchronize your branch (you could just add it in your branch). At the end you'd prepare a PullRequest against the develop branch. Then we could review your code and merge it. Or you could create a few smaller PRs in the process.

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Storage allocator questions

Henry Roeland
Hi Adam,

Followed your suggestions and created a ‘feature/persistency’ branch in a forked geometry github repository: 

Added you (awulkiew) as collaborator to this repository.

I try to make some time this week starting up.

NOTE: Tried the boost geometry tests and got lots of gcc 4.8 compiler crashes. Is this a known? Or should I create an issue?

Kind regards,

Henry Roeland

On 8 dec. 2014, at 16:12, Adam Wulkiewicz <[hidden email]> wrote:

Hi Henry,

Henry Roeland wrote:
On 1 dec. 2014, at 17:20, Adam Wulkiewicz <[hidden email]> wrote:
Henry Roeland wrote:
It would be greet if I can focus on storage allocator and leave the rtree part to you or somebody else.


Ok, let's do it this way. FYI, I won't have time this week to work on this. But at least I'll answer on this email briefly.


No problem.
Ok, we could get back to this if you wanted.


I hope I can still ask you guys questions concerning this storage allocator? I have some global Idea’s and probably need help both on design level and implementation.


Questions so far:

1. Is paging the only mechanism to get memory growth under control for an rtree? Its the only one I can think of, but maybe there are other ways to explore?


For the first version I think that our design should be as generic as possible:

One vote for generic! :-)

- the Storage concept should be an abstraction of a persistent storage, just like an Allocator is an abstraction of "directly-accessible" memory.
- the rtree would request something or notify the storage about some events but it wouldn't have to know anything about the internal structure, techniques, heuristics, etc. E.g. the rtree wouldn't know anything about paging.
OK. But what about the other way around? Should the Storage/Allocator not know how the container is build up? This in order to choose efficiently how to store its data?

What things should be known by the allocator/storage? I'm guessing that if at some point some additional info was needed then it would be easy to add another function or definition to the storage_traits, so the storage would know everything it needs.


- the interfaces should probably be as compact as possible, e.g. as I said before, I think that in addition to the Allocator's interface we'd need 2 functions:
  * storage_traits<StorageOrAllocator>::get_pointer(ptr)     - notify the Storage that data pointed by a pointer is needed
  * storage_traits<StorageOrAllocator>::release_pointers() - notify the Storage that pointers aren't used anymore
- the storage probably shouldn't write the data directly to persistent part, instead it should keep the modified data cached and then write them all in one go. It could depend on some heuristic, the Storage could write changes when release_pointers() was called or it could define flush() method which could be called explicitly by the user. Still, the changes of persistent data should be done only if all pointers was released. At all times the storage could choose to keep the data cached. It could choose to unload all of the nodes, or only some less frequently used ones.

Is it reasonable?


Yes it sounds reasonable. But are we not creating a generic storage allocator which also can be used on e.g. std::vector or other stl containers? I have no problem with this but I cannot imagine that we are the first…


To be honest, I haven't done an extensive research regarding the existing solutions. However every solutions I saw are tightly bound with the container itself. I think that in most cases the container that is implemented is either in-memory or persistent. But the fact that I haven't seen a persistent storage abstraction doesn't mean that it doesn't exist. Though I'm not sure if it would be reasonable in all cases. So sure, additional research could be done.

For the rtree we only need a storage capable to allocate the memory for nodes. And we also need a way to create a big temporary storage, probably divided into several files.

Though it would probably be possible to implement a persistent allocator for std::vector it wouldn't be very efficient because a vector reallocates the memory by creating another block and copying all of the elements. Handling this problem could allow us to implement a heap-like kd-tree storing the nodes in a contigeous storage. But then we'd be forced to deal with additional problems like cacheing chunks of this big persistent memory block, prefetching, etc.

Other questions/points that popup:
  • What about relative pointers e.g. node vs. storage/page?

The pointer stored in the node in memory should uniquely define the location of the data. It could store an ID of a file and an ID of a node stored in that file or whatever. E.g. Boost.Interprocess uses offset pointers containing an offset from the beginning of a memory block owned by the allocator or memory manager, because next time the memory block could be placed in memory at a different place.
  • What about memory management issues:
    • Fragmentation
It probably depends on the allocator/storge itself. I'm guessing that in some cases it wouldn't be a problem, at least with the rtree. If one file or page or whatever you'd like to call it contained only one node then an allocator/storage would have to just find a unique name for this file. So the fragmentation problem would be moved from the program to the OS/Filesystem.
    • Alloc ahead
Do you have in mind a mechanism to create more persistent memory than is needed now for the future? This could be used e.g. in the packing algorithm. I'm guessing that it wouldn't be problematic to add this in the future.
    • Garbage collection (We are going to do it already using smart pointer(s) but on low level?)
      • malloc libs like jemalloc and tcmalloc can do this on low level…
GC for what purpose? The rtree explicitly notifies an allocator if it wants to distroy a node. So it would also notify a storage. So the storage could store the request for node deletion/page removal and when flush() was called perform the action together with other actions, preferably in a transactional manner for safety.
    • Move pages/blocks: Pointers become invalid?
Pointers shouldn't be invalidated. An ID of a node would be stored in parent node and it could be stored in memory in pointers contained in the rtree or in the cache of the storage, etc. So I'm guessing that you're asking about the in-memory part. If the allocator/storage wanted to move pages then it could do it but e.g. in a way preserving the IDs. Or would you like to also modify IDs in the pages containing parents?
    • Object/Page/Node pooling???
What do you mean?
    • Does STL have any guidelines restrictions on memory management?
The STL uses the Allocator concept for this, in C++11 stateful allocators and std::allocator_traits<>. Temporary buffers are created with std::get_temporary_buffer() or new (nothrow).
But this wouldn't be enough for us since some of the objects (nodes) would have to be somehow serialized into disk. So I thought about extending it with
- storage traits (functional) - for notification of the storage/allocator about some additional actions
- storage traits (definitions) - e.g. for defining what must be serialized (nodes), and what shouldn't
- serialization interface - implementing how nodes should be saved and loaded using specific storage

Storage traits could be implemented as one struct as in the case of std::allocator_traits<> or divided into separate structs as in the case of Boost.Geometry traits (tag<>, dimension<>, etc.).

  • Is rtree thread safe? Should our Storage/Allocator be thread-safe?

The rtree is thread-safe more or less the same way as STL containers are. Mutable actions aren't thread-safe (e.g. creation, insert, remove), non-mutable should be (e.g. query).

So it should probably be possible to read data safely. If a storage/allocator was fully thread-safe it could allow us in the future to e.g. parallelize the persistent rtree creation. But let's keep it simple for now :)



2. Is there any way an rtree can be read-only? Or set read-only after (bulk)insert?


It could be done either in run- or in compile-time.
The most obvious way would be to throw a run-time exception in the get_pointer() after some flag (rtree created) was set.
Another possibility could be to disable rtree mutable methods like insert() and remove() in compile-time if some specific parameters or Storage/Allocator was passed.
Do you have some specific use-case in mind?

At the company I’m working for :-) They use readonly maps a lot. So in the future(hopefully)rtree with storage/allocator as cache.


Ok, as I said, it could be done. This mechanism could be implemented in the allocator itself, or the behavior could be defined in compile-time in storage traits, etc.

<snip>



Other questions/points that come to my mind are: 
  • Must the storage allocator always store? What about transactions between in memory (RTree) and Persistence storage?
  • State-full allocator: Who owns the nodes and the data inside it? I always thought of an allocator like a factory that does not own the data…
  • When data is owned by another STL(like) container then IMHO a storage allocator has no use. This because the storage allocator then does not own the data and has no means to free/unload it.


I see 2 ways how the ownership could be implemented:
1. Similar to Boost.Interprocess where there is a memory Manager owning the data and AFAIU an Allocator only wrapping some reference to this Manager. So AFAIU (I didn't check the code) the Allocator is only an interface between a Container and a Manager. This simplifies the copying of an Allocator, rebinding, etc.
2. An Allocator which is in the same time a Manager or rather owns a Manager, so also owns the data. This'd probably require that the Allocator would have to store a shared_pointer to some Manager internally. An instance of a Manager would be created under the hood and a pointer to it would be copied automatically.

1 is more clear but it'd require to design not only the interface for a Storage/Allocator but also for a Manager, like in the case of Interprocess. 2 would only require to design the interface of an Allocator/Storage. If we choose 1 we could divide the logic into 2 Concepts, the Allocator could implement the in-memory-part (pointers, memory allocation, construction, destruction, etc.) and the Manager the persistent part (some file IDs or handles, saving and loading files, nodes serialization, etc.). So as in the case of Interprocess we could have 1 Allocator and could have many Managers.

In general I propose to work iteratively, that is at the beginning support minimal set of required operations in the Storage/Allocator to be able to e.g. insert() values and perform a query() in the most simple, probably inefficient way, but with a simple and elegant interface. And then optimize operations and extend the interface. Or would you prefer to design the whole Storage/Allocator theoreticaly?

I prefer both :-) I will try to keep an UML overview with our suggestions to keep an theoretical overview.  Next to that its probably already time to dive into code to see what is usable and what not.

Pratical question: Should I fork BoostGeometry on github or do you prefer a branch?

You should create your fork of Boost.Geometry on GitHub and create a branch originating in develop in your fork.
Here is a tutorial:
https://github.com/boostorg/geometry/wiki/Contribution-Tutorial

At Boost we're using GitFlow branching model so your branch should originate in develop and be called feature/xxx, e.g. feature/persistency or feature/persistent_allocator, or something like that. So you'd be able to add your code in this branch and if something on the rtree side was needed I'd add it to develop (or you could create a PullRequest) and you'd be able to synchronize your branch (you could just add it in your branch). At the end you'd prepare a PullRequest against the develop branch. Then we could review your code and merge it. Or you could create a few smaller PRs in the process.

Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests (was: Storage allocator questions)

Adam Wulkiewicz
Hi Henry,

Henry Roeland wrote:
Hi Adam,

Followed your suggestions and created a ‘feature/persistency’ branch in a forked geometry github repository: 

Added you (awulkiew) as collaborator to this repository.


Hmm, I think there is no need for that since I might fork it and create a PR if needed, but ok, we'll see, maybe it'd be useful.

I try to make some time this week starting up.

NOTE: Tried the boost geometry tests and got lots of gcc 4.8 compiler crashes. Is this a known? Or should I create an issue?


It should compile properly. What do you mean by compiler crashes?

Here everything looks right here (besides MSVC tests broken by Boost.Variant):
http://www.boost.org/development/tests/develop/developer/geometry.html

How do the failures look like? Is the problem located in Boost.Geometry?
If the problem was located in Variant, then you could checkout some older commit, e.g.:

cd libs/variant
git checkout 6db941f3

And run the Geometry tests again.

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests (was: Storage allocator questions)

Henry Roeland
Hi Adam,

On 8 dec. 2014, at 22:26, Adam Wulkiewicz <[hidden email]> wrote:

git checkout 6db941f3

This checkout of boostorg/variant github:

Variant report of ‘b2 test':
...failed updating 1 target…    <— after adding symbolic link to include this failure disapeared
...skipped 3 targets...
...updated 141 targets...


Geometry report of ‘b2 test’ (With above variant checkout):

One of the errors:
...
gcc.compile.c++ ../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug/append.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

    "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"../.." -I"../../boost/geometry/extensions/contrib/ttmath" -I"test" -c -o "../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug/append.o" "test/algorithms/append.cpp"

...failed gcc.compile.c++ ../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug/append.o...
...skipped <p../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug>append for lack of <p../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug>append.o...
...skipped <p../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug>append.run for lack of <p../../bin.v2/libs/geometry/test/algorithms/append.test/gcc-4.8/debug>append...
...


System & Compiler info:

henry@ubuntu:~$ uname -a
Linux ubuntu 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

henry@ubuntu:~$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.2-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)



How do I test?:
Used source of boost 1.57.0 and added:
suggested checkout of boost variant with symbolic link from libs/variant/include/boost/variant* to boost/variant*
checkout of boost geometry develop (feature) branch with symbolic link from libs/geometry/include/boost/geometry* to boost/geometry*
NOTE: Renamed both original variant* and geometry* files and dirs and added checkouts

run ‘b2 test’ in both libs/variant and libs/geometry

b2 is from the 1.57.0 build itself and added to PATH.

kind regards,

Henry Roeland

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests (was: Storage allocator questions)

Adam Wulkiewicz
Hi Henry,

2014-12-09 2:00 GMT+01:00 Henry Roeland <[hidden email]>:

How do I test?:
Used source of boost 1.57.0 and added:
suggested checkout of boost variant with symbolic link from libs/variant/include/boost/variant* to boost/variant*
checkout of boost geometry develop (feature) branch with symbolic link from libs/geometry/include/boost/geometry* to boost/geometry*
NOTE: Renamed both original variant* and geometry* files and dirs and added checkouts


The errors you pasted aren't related to Variant.
You shouldn't rename anything or create links manually.
Have you done the steps mentioned here: https://svn.boost.org/trac/boost/wiki/TryModBoost ?

In particular:
git clone --recursive [hidden email]:boostorg/boost.git modular-boost
cd modular-boost
.\bootstrap
.\b2 headers
So cloned Boost with --recursive flag and run b2 headers. Then run the tests, e.g.:

b2 libs/geometry/test

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests (was: Storage allocator questions)

Henry Roeland
Hi Adam,

Sorry missed the Modular part.

After checkout of modular boost tree and setting libs/variant back to old(er) commit I still get errors:

gcc.compile.c++ ../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

    "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"

...failed gcc.compile.c++ ../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o...
...skipped <p../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug>boost_polygon for lack of <p../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug>boost_polygon.o...
...skipped <p../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug>boost_polygon.run for lack of <p../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug>boost_polygon...

Note: I didn’t set libs/geometry to my feature/persistency branch/fork yet.


When running above g++ command line directly strange things happen: 2 times without error last time with error:

130 henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫     "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"
...
henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫ g++  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geomet
ry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"                           henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫ 
...
henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫ g++  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

Am I the only one building & running geometry tests with g++ 4.8 on Ubuntu 14.04.1 LTS (Inside virtual machine)?

Kind regards,

Henry


On 9 dec. 2014, at 02:19, Adam Wulkiewicz <[hidden email]> wrote:

Hi Henry,

2014-12-09 2:00 GMT+01:00 Henry Roeland <[hidden email]>:

How do I test?:
Used source of boost 1.57.0 and added:
suggested checkout of boost variant with symbolic link from libs/variant/include/boost/variant* to boost/variant*
checkout of boost geometry develop (feature) branch with symbolic link from libs/geometry/include/boost/geometry* to boost/geometry*
NOTE: Renamed both original variant* and geometry* files and dirs and added checkouts


The errors you pasted aren't related to Variant.
You shouldn't rename anything or create links manually.
Have you done the steps mentioned here: https://svn.boost.org/trac/boost/wiki/TryModBoost ?

In particular:
git clone --recursive [hidden email]:boostorg/boost.git modular-boost
cd modular-boost
.\bootstrap
.\b2 headers
So cloned Boost with --recursive flag and run b2 headers. Then run the tests, e.g.:

b2 libs/geometry/test

Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests

Barend
Hi Henry,

Henry Roeland wrote On 9-12-2014 7:20:
henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫ g++  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"
g++: internal compiler error: Killed (program cc1plus)


Can you check another testfile too? Because boost_polygon is specific because it depends on two Boost libraries (Boost.Geometry, Boost.Polygon). Please check if a simpler case (e.g. arithmetic\dot_product.cpp) works.


Am I the only one building & running geometry tests with g++ 4.8 on Ubuntu 14.04.1 LTS (Inside virtual machine)?

I don't have that configuration.

Regards, Barend


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests

Adam Wulkiewicz
Hi,

2014-12-09 8:13 GMT+01:00 Barend Gehrels <[hidden email]>:
Hi Henry,

Henry Roeland wrote On 9-12-2014 7:20:
henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫ g++  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"
g++: internal compiler error: Killed (program cc1plus)


Can you check another testfile too? Because boost_polygon is specific because it depends on two Boost libraries (Boost.Geometry, Boost.Polygon). Please check if a simpler case (e.g. arithmetic\dot_product.cpp) works.


Still, it shouldn't result in an internal compiler error.
 

Am I the only one building & running geometry tests with g++ 4.8 on Ubuntu 14.04.1 LTS (Inside virtual machine)?

I don't have that configuration.

I have Mint 17 based on Ubuntu 14.04 and GCC 4.8.2. I just tried to get a fresh clone, run the tests and it works for me. Is your compiler setup ok? Which standard library implementation are you using?

You could try to compile one of the simpler tests manually, something like this should work:

g++ -IPATH_TO_BOOST -IPATH_TO_BOOST/libs/geometry/test PATH_TO_BOOST/libs/geometry/test/arithmetic/dot_product.cpp

Regards,
Adam

_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests

Henry Roeland
Hi Barend and Adam,

Tried the tests at work on both Ubuntu 12.04 LTS/G++ 4.6  and Ubuntu 14.04 LTS/G++ 4.8.2 and both worked OK.

So it seems like a problem locally in my Virtual Machine. I’m using Parallels 10 on my macbook pro.

I’m testing the specs of the Virtual machine and found out that different memory/cpu specs matter:

SMALL: Running Ubuntu 14.04.1 LTS / G++ 4.8.2 with 1GB memory and 2 x 2.6 CPU’s ( My original configuration where trouble started):

...failed updating 5 targets...
...skipped 15 targets...
...updated 667 targets…


Few of the error message(s):
gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug/comparable_distance.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

    "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug/comparable_distance.o" "algorithms/comparable_distance.cpp"

...failed gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug/comparable_distance.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance for lack of <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance.run for lack of <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance…


gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug/distance.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

    "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug/distance.o" "algorithms/distance.cpp"

...failed gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug/distance.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug>distance for lack of <p../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug>distance.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug>distance.run for lack of <p../../../bin.v2/libs/geometry/test/algorithms/distance.test/gcc-4.8/debug>distance…


gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug/distance_linear_areal.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

    "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug/distance_linear_areal.o" "algorithms/distance_linear_areal.cpp"

...failed gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug/distance_linear_areal.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug>distance_linear_areal for lack of <p../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug>distance_linear_areal.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug>distance_linear_areal.run for lack of <p../../../bin.v2/libs/geometry/test/algorithms/distance_linear_areal.test/gcc-4.8/debug>distance_linear_areal...


MEDIUM: Running Ubuntu 14.04.1 LTS / G++ 4.8.2 with 1GB memory and 4 x 2.6 CPU’s:

...failed updating 1 target...
...skipped 3 targets...
...updated 683 targets...


The error message:
gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug/comparable_distance.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

    "g++"  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug/comparable_distance.o" "algorithms/comparable_distance.cpp"

...failed gcc.compile.c++ ../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug/comparable_distance.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance for lack of <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance.o...
...skipped <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance.run for lack of <p../../../bin.v2/libs/geometry/test/algorithms/comparable_distance.test/gcc-4.8/debug>comparable_distance...





LARGE: Running Ubuntu 14.04.1 LTS / G++ 4.8.2 with 2GB memory and 4 x 2.6 CPU’s:

...updated 687 targets...

So everything seems to work with this configuration!




Kind regards,

Henry Roeland




On 9 dec. 2014, at 14:08, Adam Wulkiewicz <[hidden email]> wrote:

Hi,

2014-12-09 8:13 GMT+01:00 Barend Gehrels <[hidden email]>:
Hi Henry,

Henry Roeland wrote On 9-12-2014 7:20:
henry@ubuntu:~/Develop/boost/mod/modular-boost/libs/geometry/test⟫ g++  -ftemplate-depth-128 -O0 -fno-inline -Wall -g -fPIC  -DBOOST_ALL_NO_LIB=1  -I"." -I"../../.." -I"../../../boost/geometry/extensions/contrib/ttmath" -c -o "../../../bin.v2/libs/geometry/test/geometries/boost_polygon.test/gcc-4.8/debug/boost_polygon.o" "geometries/boost_polygon.cpp"
g++: internal compiler error: Killed (program cc1plus)


Can you check another testfile too? Because boost_polygon is specific because it depends on two Boost libraries (Boost.Geometry, Boost.Polygon). Please check if a simpler case (e.g. arithmetic\dot_product.cpp) works.


Still, it shouldn't result in an internal compiler error.
 

Am I the only one building & running geometry tests with g++ 4.8 on Ubuntu 14.04.1 LTS (Inside virtual machine)?

I don't have that configuration.

I have Mint 17 based on Ubuntu 14.04 and GCC 4.8.2. I just tried to get a fresh clone, run the tests and it works for me. Is your compiler setup ok? Which standard library implementation are you using?

You could try to compile one of the simpler tests manually, something like this should work:

g++ -IPATH_TO_BOOST -IPATH_TO_BOOST/libs/geometry/test PATH_TO_BOOST/libs/geometry/test/arithmetic/dot_product.cpp

Regards,
Adam
_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry
Reply | Threaded
Open this post in threaded view
|

Re: Forking and compiling tests

Tim Finer
Hello Henry,

On 12/9/14, 1:04 PM, Henry Roeland wrote:
Hi Barend and Adam,

Tried the tests at work on both Ubuntu 12.04 LTS/G++ 4.6  and Ubuntu 14.04 LTS/G++ 4.8.2 and both worked OK.

So it seems like a problem locally in my Virtual Machine. I’m using Parallels 10 on my macbook pro.

I’m testing the specs of the Virtual machine and found out that different memory/cpu specs matter:

SMALL: Running Ubuntu 14.04.1 LTS / G++ 4.8.2 with 1GB memory and 2 x 2.6 CPU’s ( My original configuration where trouble started):

...failed updating 5 targets...
...skipped 15 targets...
...updated 667 targets…

<snipped for brevity>
LARGE: Running Ubuntu 14.04.1 LTS / G++ 4.8.2 with 2GB memory and 4 x 2.6 CPU’s:

...updated 687 targets...

So everything seems to work with this configuration!

It sounds like gcc is running out of memory. Template heavy code eats up a lot of memory & CPU.  If you are interested you will probably see a large spike in memory usage right before the internal errors start happening.

Best,
Tim


_______________________________________________
Geometry mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/geometry