crossbar.io production readiness

#1

Hi,

We are considering building our system on WAMP (migrating from the good old HTTP stack). While we are excited about the technology, we need to go live in a few months, so I am wondering about production readiness. A few questions below:

The crossbar.io router is obviously a key piece in a distributed system. For it not to be a SPOF (single point of failure), there should be a cross-host cluster that provides auto-recovery/failover, so that when one instance dies, another takes over. I see clustering is indeed mentioned in the architecture document, but I can’t find any example of it, and it seems to be marked as a “planned feature”. Is it available today? If not, is there a date for this feature?

If we need to deploy before such cluster technology is available, what is the suggested alternative architecture? Can we utilize an existing load balancer, such as Amazon’s Elastic Load Balancer? Basically, deploy many “local crossbar.io/containers” and let the load balancer route the traffic on top? Any information on this front will greatly help us plan the production deployment strategy, short-term and long-term.

Is there any performance information for the router, such as throughput? As far as you know, is there any mid-to-large-scale production system using crossbar.io (or any other WAMP router) today?

Thanks.


#2

Hi,

regarding “production readiness”:

You are not alone … there are multiple people/companies currently investigating/building things on top of WAMP / Crossbar.io in private. Some of them don’t even want to disclose or talk about it yet. I guess the current situation (people unsure about the viability of this tech) is unavoidable … but transient.

Regarding routing performance: the only hard numbers we currently have are here: http://tavendo.com/blog/post/autobahn-pi-benchmark/
In general, I’d expect a single instance of Crossbar.io (running a router on a single thread) to scale to 100-200k concurrent connections.
What is a “mid-to-large-scale production system”? What volume of connections/messages?
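To put that figure in rough context, here is a back-of-envelope sketch in Python (the 100-200k per-instance number is the estimate above, used at its conservative end; actual capacity depends on message rates, payload sizes, TLS, and hardware):

```python
def routers_needed(concurrent_connections, per_instance_capacity=100_000):
    """Minimum number of single-threaded router instances for a given
    connection count, using ceiling division (via the negation trick)."""
    return -(-concurrent_connections // per_instance_capacity)

# 50k connections fit in one instance; 500k would need several -
# which is exactly where router clustering becomes necessary.
print(routers_needed(50_000))   # 1
print(routers_needed(500_000))  # 5
```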

Note that by using worker processes for your WAMP app components (all connected to a single router), you can already scale up/out the app logic today. What you cannot scale up/out yet is the routing core of Crossbar.io itself.

This feature (router-to-router clustering/federation) will come to Crossbar.io - it seems a lot of people are waiting for this. It’ll arrive “in the coming months”. I am sorry I can’t be more specific.

From a practical point of view, here is what you can do today for HA:

  • Have a hot-standby Crossbar.io instance (one that is already running, but with no clients connected).
  • When the primary fails, fail over to the standby (either using a LB, or by having the standby take over the primary’s IP).
  • When the primary goes down, all clients (both frontend and backend components) will lose their connections.
  • All clients will (should) automatically reconnect (as e.g. AutobahnJS does), and hence connect to the standby.

Note that for the above to work, your backend components will need to connect via the LB, just like the frontend components, so that the LB can forward the connection to the standby upon reconnection. An issue with this might be frontend components reconnecting faster than backend components (which then won’t yet be callable from frontends).
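The auto-reconnect behavior can be sketched as a retry loop with exponential backoff. This is plain Python illustrating the pattern only; `connect` and the delay values are hypothetical, not the AutobahnJS or Autobahn|Python API:

```python
import time

def reconnect_with_backoff(connect, initial_delay=1.0, max_delay=32.0,
                           sleep=time.sleep):
    """Keep trying `connect()` until it succeeds, doubling the wait
    between attempts up to `max_delay`. `connect` should return a
    session object on success and raise ConnectionError on failure."""
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
            delay = min(delay * 2, max_delay)

# Simulated failover: the first two attempts hit the dead primary,
# the third lands on the standby (e.g. after the LB switched over).
attempts = []
def fake_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("primary down")
    return "session-on-standby"

print(reconnect_with_backoff(fake_connect, sleep=lambda d: None))
# session-on-standby
```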

Another note: we are not planning to fail over an established WAMP session from Crossbar.io instance 1 to 2. This would be really complex for various reasons. Instead, we will rely on WAMP clients auto-reconnecting - which at least frontend clients will need to do anyway (think mobile networks with intermittent connectivity).

Please ask again if the above is insufficient info for you to go forward. It’s exciting to see more and more people joining in … I wish we already had a “full story” for all the valid requests people have. Not quite there yet ;)

Cheers,
/Tobias

In reply to paradox7’s post of Tuesday, December 23, 2014, 22:21 UTC+1.

#3

Thanks, Tobias, for the detailed reply.

… Note that by using worker processes for your WAMP app components (all connected to a single router), you can already scale up/out the app logic today.

Here is a more concrete ask:

  1. Scale-out: app logic containers need to be able to run collapsed (same host as the router) as well as on separate hosts. Can crossbar.io manage workers on different hosts?
  2. Scale-up: we need multiple instances of each app container, for availability as well as throughput scaling.
  • How do we do this with RPC registration? The router will not allow redundant endpoint registration, will it?
  • How do we do this with subscriptions, so that only one process within this “redundant process group” gets to process a given message?
  • Can crossbar.io hold a connection to the container through a container-side proxy (i.e. a load balancer fronting the containers)? Or maybe crossbar.io has a concept of a “worker group”, where only one instance of such workers is selected to process an incoming message?

From a practical point of view, here is what you can do today for HA:

  • Have a hot-standby Crossbar.io instance (one that is already running, but with no clients connected).
  • When the primary fails, fail over to the standby (either using a LB, or by having the standby take over the primary’s IP).
  • When the primary goes down, all clients (both frontend and backend components) will lose their connections.
  • All clients will (should) automatically reconnect (as e.g. AutobahnJS does), and hence connect to the standby.

For the short term, this is acceptable, provided it doesn’t happen often :wink:

Another note: we are not planning to fail over an established WAMP session from Crossbar.io instance 1 to 2. This would be really complex for various reasons. Instead, we will rely on WAMP clients auto-reconnecting - which at least frontend clients will need to do anyway (think mobile networks with intermittent connectivity).

Agreed.

In reply to Tobias Oberstein’s post of Tuesday, December 30, 2014, 8:00 AM UTC-5.

#4

Hi,

Here is a more concrete ask:

1. Scale-out: app logic containers need to be able to run collapsed (same host as the router) as well as on separate hosts. Can crossbar.io manage workers on different hosts?

A Crossbar.io instance can only start workers locally.

This allows you to scale up your app logic (that is, run the logic on multiple cores of the local machine) by starting more workers managed by Crossbar.io.

However, a Crossbar.io instance can (obviously) accept incoming WAMP connections from app components from anywhere. This is what allows you to scale out your app logic today - run the components on other hosts.

Those connecting clients won't be managed/monitored by Crossbar.io as workers then though.

Here is where we want to go:

Have a cluster/federation of Crossbar.io instances where you can start workers transparently on any of the nodes. Or have Crossbar.io automatically make placement decisions (like firing up a worker on the least loaded host). Or have Crossbar.io fire up a worker in an OS container (think Docker). Etc.

This latter stuff points to an exciting perspective: Crossbar.io as a complete microservice platform.

It would be interesting to me to hear where you actually want to go with Crossbar.io in your app/project ...

2. Scale-up: we need multiple instances of each app container, for availability as well as throughput scaling.
      * How do we do this with RPC registration? The router will not allow redundant endpoint registration, will it?

Again, in the pipeline: the WAMP "Advanced Profile" talks about this under the term "distributed/partitioned RPC/PubSub".

Regarding HA-Callees: https://github.com/tavendo/WAMP/issues/89

      * How do we do this with subscriptions, so that only one process within this "redundant process group" gets to process a given message?

Not sure what you mean here .. can you expand on the behavior you envision?

      * Can crossbar.io hold a connection to the container through a container-side proxy (i.e. a load balancer fronting the containers)? Or maybe crossbar.io has a concept of a "worker group", where only one instance of such workers is selected to process an incoming message?

Kind of the latter. Crossbar.io will implement different policies, like round-robin, random, ..., for directing e.g. a specific call to the respective endpoints (callees). There is no LB involved: Crossbar.io is itself a WAMP-level LB.
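The idea of such invocation policies can be sketched in a few lines of plain Python (illustrative only; `RoundRobin`, `RandomPick`, and the worker names are made up for the sketch, not Crossbar.io API):

```python
import itertools
import random

class RoundRobin:
    """Cycle through the callees registered for one procedure."""
    def __init__(self, callees):
        self._cycle = itertools.cycle(callees)

    def pick(self):
        return next(self._cycle)

class RandomPick:
    """Pick one of the registered callees uniformly at random."""
    def __init__(self, callees, rng=random):
        self._callees = list(callees)
        self._rng = rng

    def pick(self):
        return self._rng.choice(self._callees)

# Two workers registered the same procedure; the router-level
# policy decides which one handles each incoming call.
policy = RoundRobin(["worker-1", "worker-2"])
print([policy.pick() for _ in range(4)])
# ['worker-1', 'worker-2', 'worker-1', 'worker-2']
```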

From a practical point of view, here is what you can do today for HA:

  • Have a hot-standby Crossbar.io instance (one that is already running, but with _no_ clients connected).
  • When the primary fails, fail over to the standby (either using a LB, or by having the standby take over the primary's IP).
  • When the primary goes down, all clients (both frontend and backend components) will lose their connections.
  • All clients will (should) automatically reconnect (as e.g. AutobahnJS does), and hence connect to the standby.

For the short term, this is acceptable, provided it doesn't happen often :wink:

FWIW, I haven't seen our public Crossbar.io instance (which runs on EC2) go down even once. It gets restarted after Crossbar.io or OS upgrades. That's it.

Cheers,
/Tobias


#5

Yes, it looks like “distributed/partitioned RPC/PubSub” could solve the scalability and availability issues we have. However, the spec is not yet stable/finalized, and I agree the router needs to tackle its own availability and scalability first (clustering, for example) before scaling the workers according to the spec above… I wonder if there is a quick win in leveraging existing, proven infrastructure to scale workers; it would not only ease early adopters’ concerns but also buy precious time/experience to take crossbar.io to the next level…

It would be really cool, and would immediately make crossbar.io production-ready for us, if it could communicate with the workers through an LB proxy fronting the app workers. For example, Amazon Elastic Load Balancer (ELB) seems to support WebSocket via its TCP protocol mode (http://blog.flux7.com/web-apps-websockets-with-aws-elastic-load-balancing). If we have 2 app workers behind ELB, all registered for the same endpoint, crossbar.io will probably reject the 2nd registration attempt, but as long as it routes messages to ELB based on the first registration, we will be OK, since ELB will take care of the load balancing, distributed process management, etc. It seems that crossbar.io should be able to do this today - am I missing something?
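The catch here is the router-side registration check. The toy registry below illustrates the "reject the 2nd registration attempt" behavior in plain Python; it is a simplification for the sketch, not Crossbar.io's actual implementation, and the URI and worker names are made up:

```python
class ProcedureAlreadyExists(Exception):
    """Raised when a procedure URI is already registered."""
    pass

class ToyRegistry:
    """Single-callee-per-procedure registry, mimicking a router
    that does not (yet) support shared registrations."""
    def __init__(self):
        self._procs = {}

    def register(self, uri, callee):
        if uri in self._procs:
            # A second worker behind the ELB registering the same
            # URI would be rejected here, defeating the LB idea.
            raise ProcedureAlreadyExists(uri)
        self._procs[uri] = callee

    def lookup(self, uri):
        return self._procs[uri]

reg = ToyRegistry()
reg.register("com.example.compute", "worker-1")
try:
    reg.register("com.example.compute", "worker-2")
except ProcedureAlreadyExists as e:
    print("rejected:", e)
# rejected: com.example.compute
```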

To sum up: we don’t mind managing worker scalability ourselves via existing LB technology; we don’t mind occasional manual failover with crossbar.io; and we can live with the single-crossbar.io throughput limit for the next few months. If crossbar.io can function under these assumptions, then we have a winner!

In reply to Tobias Oberstein’s post of Tuesday, December 30, 2014, 10:30 AM UTC-5.

#6

Hi,

Yes, it looks like "distributed/partitioned RPC/PubSub" could solve the scalability and availability issues we have. However, the spec is not yet stable/finalized, and I agree the router needs to tackle its own availability and scalability first (clustering, for example) before scaling the workers according to the spec above... I wonder if there

We need both, yes.

But they are orthogonal, and the distributed/partitioned stuff is definitely the easier part.

Allowing Crossbar.io to maintain multiple endpoints for a given procedure, and then randomly selecting one when a call comes in, is straightforward. I have avoided adding this point feature as a quick hack because I wanted to get the "big picture" conceptually right .. which then includes things like "partitioned RPC" etc.

If we have 2 app workers behind ELB, all registered for the same endpoint, crossbar.io will probably reject the 2nd registration attempt, but as long as it routes messages to ELB based on the first registration, we will be OK since ELB will take care of the load balancing, distributed process management, etc. It seems that crossbar.io should be able to do this today - am I missing something?

Unfortunately, yes.

Workers are not _listening_ for WebSocket connections coming in from a WAMP router (and possibly distributed by a LB), but _connecting_ over a WebSocket connection to a router. And since we don't yet have router-to-router clustering, all those workers will need to connect to one router. And that router does not yet allow 2 workers to register the same procedure.

To sum up, we don't mind managing worker scalability ourselves via existing LB technology; we don't mind occasional manual failover with crossbar.io; we can live with the single-crossbar.io throughput limit for the next few months. If crossbar.io can function under these assumptions, then we have a winner!

If this is a deal breaker for you ("lack of worker load-balancing" for workers registering the same procedure), we might implement that point feature sooner - you just need to convince me that your project will be awesome and push Crossbar.io forward, so I reprioritize my endless work queue :wink:

Cheers,
/Tobias

In reply to paradox7’s post of December 30, 2014, 22:38.


#7

Sorry for the long delay…

If this is a deal breaker for you (“lack of worker load-balancing” for workers registering the same procedure), we might implement that point feature quicker…

Yes, it will be a deal breaker when we go to production. The infrastructure has to be capable of scaling beyond single-host-per-component. I looked at the new crossbar roadmap; all the scalability-related features are currently scheduled for release 3 (Aug 2015), which is too late for us. It would be really great if crossbar could take a more incremental approach and implement this low-hanging fruit earlier, so more early adopters can scale up to become great real-world WAMP systems.

In reply to Tobias Oberstein’s post of Wednesday, December 31, 2014, 6:59 PM UTC+8.
