Issues with forgotten endpoints?

#1

Are there any known issues with Crossbar forgetting registered URIs?

We are just starting to debug this, but I thought I'd ask whether it's a known issue.

What's happening is that, for some unknown reason and at apparently random times, URIs stop responding.

It started on version 17.2.1; we updated to 17.5.1 and still see the problem.

The current theory is that it has something to do with mismatched versions: we were using an old Autobahn version, 0.10.1.

We are updating all services to the latest 17.5.2, hoping that fixes the problem.


#2

I haven’t run into this issue, although I have had components time out on their connection to the router because a different application on the same system was behaving badly.

Do your Crossbar logs contain anything useful?


#3

Since URIs are probably registered by a client … maybe your clients are getting disconnected from Crossbar for some reason? Have you looked into network issues between the client and Crossbar? Do you see any logs in Crossbar about clients being disconnected? If they are not being fully disconnected, could there be a network routing or firewall problem that simply blocks packets for some reason?

You should get/give more info. After a URI stops working, does it suddenly start working again without intervention from you, or does it stay broken once it breaks? Are you using any of the microservices queuing features?

– Dante


#4

Hi Dante.

No, the clients are still running and they are not receiving any close event from Crossbar. So the client still thinks it’s connected to Crossbar, but Crossbar no longer has the endpoint registered in its list. It’s really weird.
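One way to confirm from the outside whether the router still holds a registration is to ask it through the WAMP registration meta procedures. The following is only a minimal sketch: it assumes `session` is an already joined Autobahn session (or anything with a compatible `call` coroutine), that the router exposes `wamp.registration.lookup` and `wamp.registration.list_callees`, and that the calling session is authorized to use them. The helper name `callee_sessions` and the URI in the usage note are hypothetical.

```python
import asyncio


async def callee_sessions(session, procedure_uri):
    """Return the session IDs currently registered as callees for a URI.

    An empty list means the router has no registration for that URI.
    `session` is assumed to expose `call(procedure, *args)` as a coroutine.
    """
    # lookup returns the registration ID, or None when nothing is registered
    registration_id = await session.call("wamp.registration.lookup", procedure_uri)
    if registration_id is None:
        return []
    # list_callees returns the session IDs attached to that registration
    return await session.call("wamp.registration.list_callees", registration_id)
```

Calling something like `await callee_sessions(session, "com.example.add")` from a health check and comparing the result with the session ID the client believes it holds would show whether the two sides disagree.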

We’ve only just started collecting logs in a data store (there is a lot of noise in the logs due to the health checks we perform on Crossbar). Haven’t had a chance to analyse that yet.

Regards

Andrew Eddie

In reply to Dante Lorenso's post of Friday, 30 June 2017 11:59:50 UTC+10 (post #3).


#5

Hi Greg,

Are there any known issues with crossbar forgetting registered URIs?

I haven't seen such a thing.

CB forgetting without some trigger is highly unlikely - removing a callee from a registration is an _active_ process.

Something must be triggering this.

I'd set up a subscriber component listening on the respective WAMP meta events for callees being added to or removed from registrations, and log that to a separate file.

This is lightweight; you could run it in prod for some time.

http://crossbar.io/docs/Registration-Meta-Events-and-Procedures/#events
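A subscriber along those lines could be sketched as below. This is an illustration under assumptions, not a tested production component: the router URL, realm, and log file name are placeholders, the names `format_event`, `build_logger`, and `main` are hypothetical, and the session must be authorized to subscribe to the `wamp.registration.*` meta topics. It uses Autobahn's asyncio `ApplicationSession` and `ApplicationRunner`, imported inside `main()` so the formatting helper can be used on its own.

```python
import logging

# Registration meta events defined by WAMP.
META_TOPICS = (
    "wamp.registration.on_create",
    "wamp.registration.on_register",
    "wamp.registration.on_unregister",
    "wamp.registration.on_delete",
)


def format_event(topic, session_id, payload):
    """Render one meta event as a single log line."""
    # on_create carries a details dict; the other events carry the registration ID.
    if isinstance(payload, dict):
        return "{} session={} registration={} uri={}".format(
            topic, session_id, payload.get("id"), payload.get("uri"))
    return "{} session={} registration={}".format(topic, session_id, payload)


def build_logger(path):
    """Create a logger that writes timestamped lines to its own file."""
    logger = logging.getLogger("registration-monitor")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


def main(url="ws://127.0.0.1:8080/ws", realm="realm1", path="registrations.log"):
    # Placeholder URL, realm, and path; adjust to the actual deployment.
    from autobahn.asyncio.wamp import ApplicationSession, ApplicationRunner

    logger = build_logger(path)

    class RegistrationMonitor(ApplicationSession):
        async def onJoin(self, details):
            for topic in META_TOPICS:
                # Bind the topic name so each log line says which event fired.
                def on_event(session_id, payload, _topic=topic):
                    logger.info(format_event(_topic, session_id, payload))
                await self.subscribe(on_event, topic)

    ApplicationRunner(url, realm).run(RegistrationMonitor)
```

Running `main()` next to the router and checking the log around the time a URI stops responding should show whether an `on_unregister` or `on_delete` event was recorded, and for which session.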

On 30.05.2017 at 20:18, Greg Keys wrote:

Are you running auto-ping/pong from the CB side?

AB JS in the browser doesn't have AB-initiated heartbeating (lack of API in browsers), but AB JS on Node _could_ have that ability.
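For reference, router-side auto-ping is set in the options of a WebSocket transport in the Crossbar node configuration. The snippet below only builds and prints such a fragment as JSON; it is a sketch that assumes the `auto_ping_interval` and `auto_ping_timeout` options (both in milliseconds), and the port and values shown are illustrative rather than recommendations.

```python
import json

# Illustrative WebSocket transport fragment for a Crossbar node config.
websocket_transport = {
    "type": "websocket",
    "endpoint": {"type": "tcp", "port": 8080},  # placeholder port
    "options": {
        "auto_ping_interval": 10000,  # router sends a ping every 10 s
        "auto_ping_timeout": 5000,    # drop the peer if no pong arrives within 5 s
    },
}

print(json.dumps(websocket_transport, indent=2))
```

With settings like these, a peer that stops answering pings is disconnected by the router, which in turn removes that peer's registrations.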

We are just starting to debug this, but I thought I'd ask whether it's a known issue.

What's happening is that, for some unknown reason and at apparently random times, URIs stop responding.

The callees are not AB JS (in browser), but AB Py components in the data center, I assume?

And the callers are AB JS (in browser), right?

It started on version 17.2.1; we updated to 17.5.1 and still see the problem.

The current theory is that it has something to do with mismatched versions: we were using an old Autobahn version, 0.10

We are updating all services to the latest 17.5.2, hoping that fixes the problem.

In general, it's a good idea to run on synchronized versions, e.g. CB and all ABs on the same version.

We have moved the version numbering of all ABs and CB to calendar versioning now, so it is easy to see how far apart versions are in release time.

Cheers,
/Tobias


#6

Update: Someone I work with had something that sounds like this issue but it was quite odd…

He told me that he could call the given end-point from our interactive REPLComponent tool (something I hope to clean up, improve and release as OSS because it’s very useful!), but that our front-end JS code was failing on the same end-point.

Anyway, what we think happened is that his code was entering a giant for-loop in which he’d forgotten to make any asyncio.sleep calls. That starved the event loop and caused the WS ping from the component to the router to time out, which disconnected the component and led to the “No callee registered” error.

I’m not 100% certain this is what happened (the REPLComponent’s ability to call makes no sense, unless he was mistaken about that), but it seems at least possible…
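The starvation effect itself is easy to reproduce with plain asyncio, independent of Autobahn. In this sketch, `heartbeat` is a hypothetical stand-in for the WebSocket ping task and `time.sleep` stands in for CPU-bound work inside the loop; the only difference between the two runs is whether the loop yields with `await asyncio.sleep(0)`.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.01):
    """Stand-in for a ping task: records a tick every `interval` seconds."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def long_job(iterations, cooperative):
    """Simulated heavy loop; time.sleep() stands in for CPU-bound work."""
    for _ in range(iterations):
        time.sleep(0.002)           # blocks the event loop, like real computation
        if cooperative:
            await asyncio.sleep(0)  # hand control back so other tasks can run


async def count_heartbeats(cooperative):
    """Return how many heartbeats fired while the long job was running."""
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await long_job(50, cooperative)
    seen = len(ticks)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return seen


def demo():
    blocking = asyncio.run(count_heartbeats(cooperative=False))
    cooperative = asyncio.run(count_heartbeats(cooperative=True))
    return blocking, cooperative
```

In the blocking run the heartbeat task never gets a chance to execute while the job is running, which is the situation in which a ping would go unanswered; with the `asyncio.sleep(0)` call in place the heartbeat fires repeatedly during the same job.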
