I want to run tasks in the background using a worker pool. See a previous response I made to another poster about this:
To do this today, I’m using beanstalkd as a task queue, and I inject jobs into this queue in various ways, most often through crontab. The worker processes are managed by supervisor in a pool of about 10 workers. Each worker waits for jobs on the task queue, pops one off, processes it, then exits. A new/fresh worker then gets spawned by supervisor, and that new worker waits for the next job. I’ve been having workers die and respawn in order to ensure that memory is cleared and any new code is re-read from the filesystem. This may no longer be necessary, as these workers appear to be quite stable these days.
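The process-one-job-then-exit lifecycle described above can be sketched as follows. This is only an analogy: a stdlib `queue.Queue` stands in for the beanstalkd tube, and the job shape (`function` + `args`) is hypothetical.

```python
import json
import queue

# Stand-in for the beanstalkd tube; in production this would be a
# beanstalkd client library, not an in-process queue.
task_queue = queue.Queue()

def run_one_job():
    """Process exactly one job, then return so the process can exit.

    Supervisor's autorestart then spawns a fresh worker, which clears
    memory and picks up any newly deployed code from disk.
    """
    payload = task_queue.get()   # blocks until a job is available
    job = json.loads(payload)    # jobs travel as JSON structs
    # ... dispatch to the named function here ...
    task_queue.task_done()
    return job

# A cron-style producer injects a job, then one worker cycle runs:
task_queue.put(json.dumps({"function": "generate_report", "args": [42]}))
job = run_one_job()
```

After `run_one_job()` returns, the real worker would simply exit and let supervisor respawn it.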
I’m torn between wanting one powerful app vs. several smaller apps that work together. Although the combo of cron + crossbar + supervisord + beanstalkd is all “working” right now, I’d like a cleaner and simpler design if possible. As my architecture and APIs grow, I want fewer moving parts and less complexity.
So, I came across this documentation on scaling micro services with crossbar:
It looks like crossbar will allow me to register multiple clients for the same RPC calls and I can configure:
These features sound like possible replacements for beanstalkd?! So, here are some of my thoughts:
If I write a client in PHP using Thruway, I’m not sure whether it will allow concurrent RPC calls to the same client, meaning I don’t know if calls can run in parallel or if PHP being single-threaded will end up serializing them internally. I think that because Thruway is built on ReactPHP, it SHOULD allow parallel calls to the same client, but I have to test and experiment with this; if any of you have experience, you might save me a lot of time and trouble. For instance, suppose I have an RPC call named “generate_complex_report” that takes 100 seconds to run (mostly I/O wait). If I call this function through crossbar 3 times in rapid succession, will it take 300 seconds to complete all 3 calls, or will they run in parallel and take only 100 seconds?
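The single-threaded-but-parallel question above is really about whether the handler yields during I/O. The demo below shows the general event-loop behavior using Python’s stdlib asyncio as an analogy (not Thruway/ReactPHP itself): three concurrent calls to the same I/O-bound handler finish in roughly the time of one, because the waits overlap. The 100-second report is scaled down to 0.1 s so it runs quickly.

```python
import asyncio
import time

async def generate_complex_report(report_id):
    # Stand-in for the 100-second, I/O-bound report; scaled to 0.1 s.
    # During this await the event loop is free to service other calls.
    await asyncio.sleep(0.1)
    return f"report-{report_id}"

async def main():
    start = time.monotonic()
    # Three calls issued concurrently against one single-threaded loop:
    results = await asyncio.gather(
        *(generate_complex_report(i) for i in range(3))
    )
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
```

If the handler blocked instead of awaiting (e.g. a synchronous sleep), the same three calls would serialize and take three times as long; that is the distinction to test for in Thruway.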
Let’s say I want to run up to 10 workers in parallel: should I connect 10 separate clients with shared registrations and concurrency=1, or 1 client without shared registrations and concurrency=10?
When jobs run through my generic supervisord workers, I have to pass the function name and parameters into the worker via the beanstalkd queue in order to call specific functions. This isn’t difficult: I populate a JSON struct and insert it into the queue, and the worker pops it off and unpacks the same struct. But as I do this work, I feel very silly, because ‘function name’ + args is essentially the WAMP protocol; it feels like reinventing something that already exists inside crossbar! Additionally, I end up with a split architecture, with my “worker API” separated from my “crossbar API”, and I’d like to marry these 2 back under the same single umbrella.
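To make the “reinventing WAMP” point concrete, here is a sketch of the homemade envelope and dispatch step, with hypothetical handler names. The routing it performs by hand is exactly what a WAMP CALL message carries: a procedure name plus arguments.

```python
import json

# Hypothetical handler; under WAMP this would instead be a registered
# procedure, e.g. session.register(import_data, "import_data").
def import_data(location_id):
    return f"imported {location_id}"

HANDLERS = {"import_data": import_data}

def dispatch(payload):
    """Unpack the 'function name + args' JSON struct popped off beanstalkd
    and route it to the named handler; this duplicates WAMP call routing."""
    job = json.loads(payload)
    return HANDLERS[job["function"]](*job["args"])

result = dispatch(json.dumps({"function": "import_data", "args": [7]}))
```

With crossbar, the `HANDLERS` table and the JSON envelope both disappear: the router owns the name-to-callee mapping.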
If I just want a worker to process something in the background and I don’t need to wait for the results, how can I detach from a call with crossbar? In my crontab scenario, I have some CLI tools that ‘inject’ jobs into the queue at scheduled intervals. These injections could be replaced by WAMP RPC calls that are detached from and simply queued for execution.
So, before I run off and try to build all of this under crossbar using the micro services feature described in the link above, can some of you comment on my thoughts and tell me if I have the right idea, or if this isn’t really what this feature is intended for?
Some background jobs I’d be running are something like …
- import data from 500+ remote locations every 5 minutes
- run data aggregation for reporting
- export data and send to 3rd party at given intervals
- test remote location connectivity
For importing data, let’s imagine we have a crossbar RPC call like:
and a wrapper rpc call that might do all the locations at once:
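Since the snippets themselves aren’t reproduced here, a minimal sketch of what those two endpoints might look like (the names `import_data` and `import_all_locations` come from the post; the bodies and `fetch_location_ids` helper are hypothetical):

```python
# Hypothetical endpoint bodies; in a real deployment each would be
# registered with the crossbar router under its procedure name.
def import_data(location_id):
    """Fetch and store the latest data for a single remote location."""
    # ... remote fetch + database insert would go here ...
    return {"location_id": location_id, "status": "ok"}

def fetch_location_ids():
    # Hypothetical database lookup; hard-coded for illustration.
    return [101, 102, 103]

def import_all_locations():
    """Wrapper call: fan out one import_data call per known location."""
    return [import_data(loc) for loc in fetch_location_ids()]

results = import_all_locations()
```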
Now, in cron, we connect to crossbar and call ‘import_all_locations’; that function fetches the list of locations from the database and calls import_data(location_id) for each one. Suddenly, I’ve injected 500+ API calls into the crossbar system. I don’t really need to wait for the output of these functions, because each one can publish a message to a given topic once it completes or fails. Is this “detaching” from a background process possible in crossbar? Can I use it this way?
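The detach-and-publish pattern being asked about can be sketched with stdlib asyncio (again only an analogy for the WAMP setup, not Autobahn/Thruway code): the caller schedules every call without awaiting any individual result, and each worker “publishes” on completion. The topic is simulated here by a plain list.

```python
import asyncio

# Stand-in for a WAMP topic such as "com.example.import.done", to which
# each call would publish its completion or failure event.
completed = []

async def import_data(location_id):
    await asyncio.sleep(0)           # simulated I/O work
    completed.append(location_id)    # i.e. publish(topic, location_id)

async def import_all_locations():
    # Fire-and-forget: schedule all 500+ calls without awaiting any
    # individual return value; results flow back via the topic instead.
    tasks = [asyncio.create_task(import_data(loc)) for loc in range(500)]
    # The cron-side caller could return here; we wait only so this demo
    # process doesn't exit before the scheduled tasks have run.
    await asyncio.gather(*tasks)

asyncio.run(import_all_locations())
```

The key property is that the caller never holds 500 pending result futures it cares about; completion is observed through the pub/sub side instead.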