Throttling new connections (prevent router overload)

The router handles lots of concurrent (WebSocket) connections without any issues once they are established and have registered/subscribed (especially with batched pings).
But when the router is restarted, or all clients reconnect at once (e.g. after an update), the router chokes when the new connection rate exceeds 500/s.

Once a connection is established, the router and client exchange a series of messages (Hello, Challenge, Authenticate, Register, Subscribe)…

The reactor gets clogged, CPU core utilization goes to 100%, and clients time out and start dropping out, which clogs the reactor further. By the time the router gets to process the next message from a client, that client has already disconnected.

(WAMP-CRA adds insult to injury with its CPU-intensive SHA-256 calls.)
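To get a feel for the per-handshake cost, here is a rough stdlib-only sketch of the WAMP-CRA arithmetic: a PBKDF2-HMAC-SHA256 key derivation followed by an HMAC-SHA256 signature over the challenge. The function names, the default iteration count and key length, and the sample secret and challenge are assumptions for illustration. Whether the router actually runs the PBKDF2 step on every connection depends on how secrets are stored, so treat this as a worst case.

```python
import base64
import hashlib
import hmac
import time


def derive_key(secret, salt, iterations=1000, keylen=32):
    # Salted variant: stretch the secret with PBKDF2-HMAC-SHA256.
    raw = hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, keylen)
    return base64.b64encode(raw)


def sign_challenge(key, challenge):
    # Signature is the base64-encoded HMAC-SHA256 of the challenge string.
    digest = hmac.new(key, challenge, hashlib.sha256).digest()
    return base64.b64encode(digest)


def seconds_per_handshake(rounds=200):
    # Average wall-clock time for one derive + sign on this machine.
    start = time.perf_counter()
    for _ in range(rounds):
        key = derive_key(b"secret", b"salt")
        sign_challenge(key, b'{"nonce": "example"}')
    return (time.perf_counter() - start) / rounds
```

Multiplying `seconds_per_handshake()` by the expected reconnect rate gives a rough idea of how much of one core the crypto alone would consume during a reconnect storm.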

I tried limiting new connections with iptables. It has a positive effect, but does not solve the problem completely:

iptables -I INPUT -p tcp --dport 8080 -m state --state NEW -m limit --limit 500/s -j ACCEPT

A more graceful solution would be to return 503 to all new connections when the router is already too busy. I’m struggling to find the best metric for deciding whether to accept a new connection or reject it with

raise ConnectionDeny(ConnectionDeny.SERVICE_UNAVAILABLE, u'Server is busy. Retry in {}s'.format(possibleAvailabilityETA))

somewhere in WampWebSocketServerProtocol.onConnect()

Variants are:

  • New connection rate. When new connections arrive at a rate exceeding a configured per-second quota, all further new connections get the 503 boot.
  • Number of pending sessions. When the number of protocol instances that have not yet established a session (WebSocket connected but not authenticated, session = None) exceeds a configured threshold, new connections are rejected.
  • CPU utilization. When the worker’s CPU utilization exceeds a configured limit, new connections are rejected to prevent ruining it for everyone.

With a 503, clients can then distinguish “server busy” from network failures and apply some incremental backoff strategy, or follow the router’s advice to reconnect after the specified timeout.
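
For illustration, the first two variants could be combined into one small admission check. This is only a sketch: `AdmissionGate`, its method names, and the default thresholds are hypothetical, and the retry hint returned when the pending limit is hit is an arbitrary guess, since nothing here knows when the backlog will drain.

```python
import time
from collections import deque


class AdmissionGate:
    """Decides whether to accept a new connection (hypothetical helper)."""

    def __init__(self, max_rate=500, max_pending=1000, clock=time.monotonic):
        self.max_rate = max_rate        # accepted connections per second
        self.max_pending = max_pending  # connected but not yet authenticated
        self.clock = clock
        self.accepted = deque()         # timestamps of recent accepts
        self.pending = 0

    def try_accept(self):
        """Return None to accept, or a suggested retry delay in seconds."""
        now = self.clock()
        # Forget accepts that have left the one-second sliding window.
        while self.accepted and now - self.accepted[0] >= 1.0:
            self.accepted.popleft()
        if self.pending >= self.max_pending:
            return 1.0  # arbitrary hint; the drain time is unknown
        if len(self.accepted) >= self.max_rate:
            # Time until the oldest accept falls out of the window.
            return 1.0 - (now - self.accepted[0])
        self.accepted.append(now)
        self.pending += 1
        return None

    def session_ready(self):
        """Call when a pending connection authenticates or disconnects."""
        self.pending = max(0, self.pending - 1)
```

The idea would be to call `try_accept()` from onConnect() and raise the ConnectionDeny above with the returned delay whenever it is not None, then call `session_ready()` once the session is established or the connection is lost. A CPU-based check could be added as a third condition in the same place.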