Multiple node routing using prefixes

Hello, I can see router linking gets discussed a lot here, but it’s usually about performance or HA. I’m hoping this is a different discussion, as neither is an issue in my situation.

The wider networks that I’m running clients on are often unstable, so I’d like to reduce the number of connections we need to monitor by creating a single-level star network of WAMP routers. Many clients would connect to a local node, which maintains a single connection with the central node to pass messages back and forth. The routing logic would be very simple, just using an identifier of each local node as a URI prefix.

I’m currently looking at implementing this by writing a router component to run on each local node which registers prefix matches on both the local and central router. It would then repeat the calls and publish messages on the other node.
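To make the idea concrete, here is a minimal sketch of the routing decision such a bridge component would make. It is plain Python with no WAMP library involved; the names `BridgeRule`, `bridge_rules`, `forward_target` and the example node id `site1` are all hypothetical, and the assumption is that Central's own URIs live under a `central.` prefix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BridgeRule:
    prefix: str       # URI prefix to match, e.g. "site1."
    register_on: str  # router where the bridge registers the prefix
    forward_to: str   # router where the bridge repeats the call/publish


def bridge_rules(node_id: str, central_id: str = "central") -> List[BridgeRule]:
    """Rules for one local node (assumed scheme, not an existing API)."""
    return [
        # Calls arriving on Central for this node's prefix go down to Local.
        BridgeRule(f"{node_id}.", register_on="central", forward_to="local"),
        # Calls arriving on Local for Central's prefix go up to Central.
        BridgeRule(f"{central_id}.", register_on="local", forward_to="central"),
    ]


def forward_target(uri: str, received_on: str,
                   rules: List[BridgeRule]) -> Optional[str]:
    """Return the router to repeat the message on, or None to keep it local."""
    for rule in rules:
        if rule.register_on == received_on and uri.startswith(rule.prefix):
            return rule.forward_to
    return None
```

For example, with `rules = bridge_rules("site1")`, a call to `site1.door.open` received on Central would be repeated on Local, while the same URI received on Local would stay where it is.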

Is creating this client to bridge the nodes the best way to do this kind of prefix-routing right now? Is anyone aware of any prior work that may have been done on this problem?

Thanks!

Hi there,

Many clients would connect to a local node which maintains a single connection with the central node to pass messages back & forth.

that’s exactly what rlinks (“router-to-router links”) do. well, plus more.

rlinks form the basis to create “clusters”, which then are just a set of nodes with rlinks in a defined topology, and with consistent authorization rules (permissions in the WAMP URI space).

That orchestration and automatic management is a feature of our commercial version Crossbar.io FX, but the underlying basis at the WAMP transport level (that is rlinks) is OSS.


finally, rlinks are specifically designed to work with WAN topologies, eg where some nodes sit behind NATs or such, and might only be able to do outgoing network connections to their upstream node in eg the cloud, forming a tree topology.

rlinks are quite newish, but I would call it “beta” now (we did a bunch of alpha iterations already), and we’d love to get user feedback and more exposure and testing!

Cheers,
/Tobias

Thanks for the information and helpful links Tobias!

I’ve tried implementing rlinks, but I’m not sure which of the issues I’m facing are bugs and which are expected behaviour. Could you help me out? The only showstopper is my final point, where registered methods are not recoverable after an outage.

I’m testing with a two node, one client setup each running in its own docker container:

Central CB <-- Local CB <-- Client AB

I have a “Central” crossbar node, which accepts an rlink from a “Local” crossbar node, plus an autobahn python “Client” connected to Local.

When Local starts up:
1. A number of wamp.* calls are made from Local to Central which can get blocked by custom authorizers. Is there a list of what should be accepted by custom authorizers here?

2. Local tries to re-register Central’s custom authenticators and authorizers on Central (and fails). Why? :confused:

When the Client registers a method via Local:
3. Local appears to register the method twice and fails the second time due to conflict, possibly it’s replicating its own registration back from Central?

When Central stops/dies:
4. Client remains connected to Local and does not see any connection issues, is that expected?

When Central recovers:
5. Local reconnects but does NOT re-register registrations made by Client previously. How should the system recover after an outage of Central?
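For point 1 above, this is roughly the shape of authorizer I have in mind for the `local` role. It is only a sketch: it assumes the usual dynamic authorizer arguments `(session, uri, action, options)` with the role available as `session["authrole"]`, it allows everything under `wamp.` for call/subscribe because I don't know the exact set of meta procedures the rlink uses, and the `local.` prefix for the node's own URIs is hypothetical. Registering it under `central.authorize.local` is left out.

```python
def authorize_local(session, uri, action, options=None):
    """Decide whether a session with role "local" may perform `action` on `uri`.

    Returns True to allow and False to deny.
    """
    # Only sessions authenticated with the rlink's role are handled here.
    if session.get("authrole") != "local":
        return False

    # ASSUMPTION: the rlink needs to call and subscribe to WAMP meta API
    # URIs. The whole "wamp." prefix is allowed because the exact list of
    # URIs is not known; this is broader than strictly necessary.
    if uri.startswith("wamp.") and action in ("call", "subscribe"):
        return True

    # ASSUMPTION: the local node owns everything under its own prefix.
    if uri.startswith("local."):
        return action in ("call", "register", "publish", "subscribe")

    return False
```

Logging every denied `(uri, action)` pair from a function like this would be one way to build the list empirically.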

Crossbar configs for reference:

Central config.json
{
    "version": 2,
    "controller": {
        "id": "central"
    },
    "workers": [
        {
            "type": "router",

            "options": {
                "pythonpath": [
                    ".."
                ]
            },
            "components": [
                {
                    "type": "class",
                    "classname": "authenticate.Authenticator",
                    "realm": "office",
                    "role": "central"
                },
                {
                    "type": "class",
                    "classname": "authorize.Authorizer",
                    "realm": "office",
                    "role": "central"
                }
            ],
            "realms": [
                {
                    "name": "office",
                    "roles": [
                        {
                            "name": "central",
                            "permissions": [
                                {
                                    "uri": "central.",
                                    "match": "prefix",
                                    "allow": {
                                        "register": true
                                    }
                                }
                            ]
                        },
                        {
                            "name": "local",
                            "authorizer": "central.authorize.local"
                        },
                        {
                            "name": "client",
                            "authorizer": "central.authorize.client"
                        }
                    ]
                }
            ],
            "transports": [
                {
                    "type": "web",
                    "endpoint": {
                        "type": "tcp",
                        "port": 8070
                    },
                    "paths": {
                        "rlink": {
                            "type": "websocket",
                            "auth": {
                                "cryptosign": {
                                    "type": "dynamic",
                                    "authenticator": "central.authenticate.rlink",
                                    "authenticator-realm": "office"
                                }
                            }
                        },
                        "client": {
                            "type": "websocket",
                            "auth": {
                                "anonymous": {
                                    "type": "dynamic",
                                    "authenticator": "central.authenticate.client",
                                    "authenticator-realm": "office"
                                }
                            }
                        }
                    }
                }
            ]
        }
    ]
}
Local config.json
{
    "version": 2,
    "controller": {
        "id": "local"
    },
    "workers": [
        {
            "type": "router",
            "realms": [
                {
                    "name": "office",
                    "rlinks": [
                        {
                            "id": "rlink_local_central",
                            "authid": "local",
                            "realm": "office",
                            "transport": {
                                "type": "websocket",
                                "endpoint": {
                                    "type": "tcp",
                                    "host": "central",
                                    "port": 80
                                },
                                "url": "ws://central/rlink"
                            }
                        }
                    ],
                    "roles": [
                        {
                            "name": "client",
                            "authorizer": "central.authorize.client"
                        }
                    ]
                }
            ],
            "transports": [
                {
                    "type": "websocket",
                    "endpoint": {
                        "type": "unix",
                        "path": "/host/socket"
                    },
                    "auth": {
                        "anonymous": {
                            "type": "static",
                            "role": "client"
                        }
                    }
                }
            ]
        }
    ]
}
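One thing worth double-checking in the configs above is whether the rlink's endpoint port matches a port Central actually listens on. Below is a small sketch that compares the two, using trimmed copies of the configs; the helper names are hypothetical. A mismatch is not necessarily a bug, since a port mapping or proxy between the containers would make it intentional.

```python
def listening_ports(config):
    """Collect the TCP ports that a node's router transports listen on."""
    ports = set()
    for worker in config.get("workers", []):
        for transport in worker.get("transports", []):
            endpoint = transport.get("endpoint", {})
            if endpoint.get("type") == "tcp":
                ports.add(endpoint["port"])
    return ports


def rlink_ports(config):
    """Map each rlink id to the TCP port it connects to."""
    result = {}
    for worker in config.get("workers", []):
        for realm in worker.get("realms", []):
            for rlink in realm.get("rlinks", []):
                endpoint = rlink["transport"]["endpoint"]
                result[rlink["id"]] = endpoint.get("port")
    return result


def mismatched_rlinks(local_config, central_config):
    """Return rlinks whose target port is not a listening port on Central."""
    open_ports = listening_ports(central_config)
    return {rid: port for rid, port in rlink_ports(local_config).items()
            if port not in open_ports}


# Trimmed copies of the two configs, keeping only the fields that are checked.
CENTRAL = {"workers": [{"transports": [
    {"type": "web", "endpoint": {"type": "tcp", "port": 8070}}]}]}
LOCAL = {"workers": [{"realms": [{"name": "office", "rlinks": [
    {"id": "rlink_local_central",
     "transport": {"type": "websocket",
                   "endpoint": {"type": "tcp", "host": "central", "port": 80},
                   "url": "ws://central/rlink"}}]}]}]}
```

With these two fragments, `mismatched_rlinks(LOCAL, CENTRAL)` reports `rlink_local_central`, because the rlink targets port 80 while Central's transport listens on 8070.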

@oberstet I wanted to check again on my last point: are clients connected to an rlinked node able to recover their registered methods when the upstream node is rebooted? If not, I can’t see much practical use for rlinks. Or maybe I’ve misinterpreted how rlinks should be configured?

Thanks again for your help so far

Hi @dacrsh,

Clients have to be able to handle re-connections. We do not provide any mechanism to keep TCP connections alive across machine-reboots etc and anyway TCP connections can fail for many other reasons too. Upon re-connection, you have to re-establish subscriptions and registrations.

So if the node you’re connected to goes down (reboots, whatever) your client re-connects and will reach a different node. It will then re-establish its registrations. So, no, the registrations aren’t “reserved” or kept around and Calls to them are not buffered or kept.
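As a rough illustration of "re-establish on every join", here is a library-agnostic sketch. The `ReRegisteringClient` and `FakeSession` classes are hypothetical stand-ins; the only thing borrowed from Autobahn is the shape of the call, `session.register(endpoint, procedure)`, and the idea that a join hook runs each time a new session is established.

```python
import asyncio


class ReRegisteringClient:
    """Keeps a table of procedures and registers all of them on every join."""

    def __init__(self):
        self._procedures = {}

    def add_procedure(self, uri, handler):
        # Remember the procedure so it can be registered again later.
        self._procedures[uri] = handler

    async def on_join(self, session):
        # Called once per (re)connection: nothing from the previous
        # session is assumed to have survived on the router.
        for uri, handler in self._procedures.items():
            await session.register(handler, uri)


class FakeSession:
    """Stand-in for a WAMP session; records what was registered on it."""

    def __init__(self):
        self.registered = {}

    async def register(self, handler, uri):
        self.registered[uri] = handler


async def demo():
    client = ReRegisteringClient()
    client.add_procedure("site1.door.open", lambda: "opened")

    first = FakeSession()    # initial connection
    await client.on_join(first)

    second = FakeSession()   # connection after the router came back
    await client.on_join(second)
    return first, second


if __name__ == "__main__":
    asyncio.run(demo())
```

The point of the design is that the client never tries to find out what the router still remembers; it simply replays its full table on each new session.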

The main point of rlinks is availability and scaling; multiple computers/processes can handle a single realm and so unless ALL the computers go down that realm will be “available”. If a single computer can’t handle all the clients, rlinked nodes can “scale” that realm.

Are there different ways to use rlinks here? I’m a bit confused because what you’re describing is a horizontal scaling cloud type of situation whereas I’m going for a hierarchy up to a central node (not designed to be high availability). The examples repo has both:
Hierarchical: https://github.com/crossbario/crossbar-examples/blob/master/rlinks/_work
High Availability: https://github.com/crossbario/crossbar-examples/tree/master/rlinks/ha_setup

I assumed hierarchical was supported from Tobias’ message:

In my case I don’t have an rlink definition pointing from Central back to Local. When a router upstream of the one a client is connected to goes down and then rejoins, it seems the client has no way of knowing that this has occurred (and therefore no way of knowing to re-register).

Presumably, when set up with symmetrical rlink definitions between all nodes, all existing registrations are downloaded when a node rejoins? This is not the case in a one-way hierarchical setup.

Sorry, I think I misunderstood your question. When an actual rlink connection is established (including “re-established”) then (depending on options) all “local” registrations are mirrored on the “other” side of the rlink. Clients don’t do anything in this case.

@dacrsh expanding on what Meejah already pointed out, there are two types of connections involved (rlinks and uplinks):

  • when a router loses one of its rlinks to another router, and that connection is reestablished, the registrations/subscriptions that exist on the reconnecting node are reestablished over the rlink. (the same can happen in the other direction when the rlink was configured bidirectionally).
  • when a client loses its uplink connection to a router, and that connection is reestablished, it’s the client’s business to reestablish its own registrations/subscriptions. from the router’s point of view, the client just disappeared without saying goodbye, rather than closing the session in an orderly way and tearing down the transport
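The first case can be pictured with a toy model. This is not Crossbar code and says nothing about what a given version actually does; it only encodes the behaviour described above, where the reconnecting node replays its registrations over the rlink. The `Router` and `RLink` classes and the URI are hypothetical.

```python
class Router:
    """Toy router: tracks registrations by where they came from."""

    def __init__(self, name):
        self.name = name
        self.local = set()     # made by directly attached clients
        self.mirrored = set()  # received over rlinks from other routers

    def restart(self):
        # A restarted router comes back with empty tables.
        self.local.clear()
        self.mirrored.clear()


class RLink:
    """Toy one-way rlink initiated by `edge` towards `upstream`."""

    def __init__(self, edge, upstream):
        self.edge = edge
        self.upstream = upstream
        self.connected = False

    def connect(self):
        self.connected = True
        # On every (re)connect, replay the edge's existing registrations.
        self.upstream.mirrored |= self.edge.local

    def disconnect(self):
        self.connected = False

    def client_register(self, uri):
        # A client attached to the edge registers a procedure.
        self.edge.local.add(uri)
        if self.connected:
            self.upstream.mirrored.add(uri)


edge, central = Router("local"), Router("central")
link = RLink(edge, central)
link.connect()
link.client_register("site1.door.open")

central.restart()   # upstream goes down: its tables are gone
link.disconnect()
link.connect()      # rlink comes back: registrations are replayed
```

In this model the client attached to the edge never notices the outage and never has to act, because the replay happens at the router level inside `connect()`.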

maybe that also helps?

Based on what’s been discussed here, I think I understand the technical reasons, but I’m still seeing a fundamental issue when using this as an uplink rather than as bidirectional rlinks: existing registrations are only mirrored one way on connection (yet new registrations propagate upstream as well).

Taking this as a new example: https://crossbario.com/static/img/gen/cb_commercial_paths.svg
and assuming the edge routers are behind NATs, therefore can only connect unidirectionally to master.

If the master router/cluster goes down and then recovers while edge routers stay up we have an unrecoverable situation:

The good:

  • Devices connected directly to master will reconnect and re-register
  • Edge routers will reconnect to master and mirror those registrations from master

The bad:

  • Since master does not connect back to edge routers it will not mirror existing edge registrations
  • Devices behind edge routers were never notified of any issues so cannot know to re-register

couple of quick notes:

  • the master node is not required in mirroring subscriptions/registrations
  • the management uplink to the master node is different from router-to-router links
  • router-to-router links are established unidirectionally (someone has to initiate opening a TCP connection to somewhere), but once established can work bidirectionally (mirror registrations/subscriptions in both directions)
  • edge nodes keep running (with their current configuration state) even when losing the management uplink to master
  • edge nodes will just keep retrying to reconnect to master, and once master comes back, the master node will then verify the actually running configuration state of edge nodes vs the expected state

but I’m still seeing a fundamental issue

we can have a look when you can provide us with a test case so we can reproduce any issue you see …