sharing large objects with workers

#1

Hi,

I’m very new to Crossbar and Python and quite amazed at what can be done with it. Apologies if I’m asking a question that has been asked before, but I tried to use Google as much as I could and did not find an answer.

Here is my setup: I have a large read-only object that is used as the basis for the calculations my workers perform. I would like to share that object with all of them. There is no need to synchronize or do anything multi-core with it, since all workers use it read-only - they just need to wait until it is fully loaded. I don’t want to keep my blob in a database because that is too time consuming, and I have plenty of RAM to accommodate it. I’m using Linux for my development.

So how can I make this large in-memory “blob” available to the workers?

  1. Run all workers as “guests” spawned by a separate process: load the large blob into memory, fork, and then register each worker with Crossbar. As long as the blob is truly read-only, Linux should be able to share it with everyone without copying it into each process. I will have to coordinate launching the guests and the node myself (a minimal sketch of this idea follows below).

  2. Use memory-mapped files, but work with more basic data structures instead of full Python objects for the “blob”. All workers could be native Crossbar.io workers, and data sharing would be handled by the memory-mapped files.
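
For reference, here is a minimal sketch of what I mean in option 1: load the blob before forking and rely on Linux copy-on-write so the read-only data is not duplicated. The Crossbar guest registration is left out entirely, and multiprocessing just stands in for however the guests would actually be spawned:

```python
# Minimal sketch of option 1: load once, fork, read from the children.
# On Linux, fork() gives each child a copy-on-write view of the parent's
# memory, so a read-only blob is not physically duplicated.
import multiprocessing as mp

# Load the large read-only object once, before any worker is forked.
BLOB = [i * i for i in range(1_000_000)]  # stand-in for the real blob

def worker(worker_id):
    # The child reads BLOB through copy-on-write shared pages.
    # Caveat: CPython reference counting writes to object headers, so some
    # pages will still get copied over time.
    print(worker_id, BLOB[worker_id] + BLOB[-1])

if __name__ == "__main__":
    mp.set_start_method("fork")  # the default on Linux
    procs = [mp.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```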

I’m wondering if anyone has had any experience with this, or if there are some examples that I have missed.

Best Regards,

Basil


#2

Hi,

> Hi,
>
> I'm very new to Crossbar and Python and quite amazed at what can be done
> with it. Apologies if I'm asking a question that has been asked before,
> but I tried to use Google as much as I could and did not find an answer.

> Here is my setup: I have a large read-only object that is used as the basis

Is this a native Python object, or more like a serialized matrix or binary data?

> for the calculations my workers perform. I would like to share that object
> with all of them. There is no need to synchronize or do anything
> multi-core with it, since all workers use it read-only - they just need to
> wait until it is fully loaded. I don't want to keep my blob in a database
> because that is too time consuming, and I have plenty of RAM to
> accommodate it. I'm using Linux for my development.

> So how can I make this large in-memory "blob" available to the workers?

> 1. Run all workers as "guests" spawned by a separate process: load the
> large blob into memory, fork, and then register each worker with Crossbar.
> As long as the blob is truly read-only, Linux should be able to share it
> with everyone without copying it into each process. I will have to
> coordinate launching the guests and the node myself.

yeah, complex;)

> 2. Use memory-mapped files, but work with more basic data structures
> instead of full Python objects for the "blob". All workers could be native
> Crossbar.io workers, and data sharing would be handled by the
> memory-mapped files.

yep, that is what I'd do probably.

you can share stuff via mmap, or share via an embedded mmap'ed DB like LMDB.
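
For the LMDB route, a minimal sketch, assuming the lmdb Python binding; the file name blob.lmdb and the key layout are made up, and the data has to be stored as bytes rather than as Python objects:

```python
# Sketch: share read-only data through an mmap'ed LMDB database.
# A writer populates the database once; every worker then opens it
# read-only, and the OS page cache is shared between all of them.
import lmdb

# --- one-off writer ---------------------------------------------------
env = lmdb.open("blob.lmdb", map_size=2 * 1024**3)  # allow up to 2 GiB
with env.begin(write=True) as txn:
    txn.put(b"chunk:0", b"\x00" * 1024)  # binary chunks, not Python objects
env.close()

# --- in each worker ---------------------------------------------------
env = lmdb.open("blob.lmdb", readonly=True, lock=False)
with env.begin(buffers=True) as txn:
    chunk = txn.get(b"chunk:0")  # a buffer into the mmap'ed file, no copy
    print(len(chunk))
```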

The main problem is: you can't have native Python objects in there.

In fact, I don't know of any way of sharing native Python objects between different Python processes / interpreters.
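
To illustrate the "more basic data structures" idea with plain mmap: a flat file of fixed-size records that every worker maps read-only and indexes into directly, instead of unpickling a big Python object graph. The file name and record layout below are made up for the example:

```python
# Sketch: a flat file of fixed-size records, shared read-only via mmap.
import mmap
import struct

RECORD = struct.Struct("<dd")  # two little-endian float64 per record

# --- one-off writer ---------------------------------------------------
with open("blob.bin", "wb") as f:
    for i in range(1000):
        f.write(RECORD.pack(float(i), float(i) * 0.5))

# --- in each worker ---------------------------------------------------
with open("blob.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def get_record(index):
    # Unpack one record on demand; the mapped pages are shared between
    # all processes that map the same file.
    return RECORD.unpack_from(mm, index * RECORD.size)

print(get_record(42))
```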

So you might need something like protobuf, or better:

https://capnproto.org/

and this

http://capnpy.readthedocs.io/en/latest/

(which runs fast on PyPy)

Cheers,
/Tobias


On 27.02.2017 at 21:40, bpup...@gmail.com wrote:

> I'm wondering if anyone has had any experience with this, or if there are
> some examples that I have missed.
>
> Best Regards,
> Basil
