Dead connection detection issue?

#1

Good day and happy new year to everyone.

I have a very simple WAMP program with on_join doing the following:

@component.on_join

async def on_join(session, details):

    logger.info("Entering on_join")

    options = PublishOptions(exclude_me=False, acknowledge=True)

    logger.info("Subscribed")

    await session.subscribe(printer, "news")

    while True:

        logger.info("Sleeping")

        await asyncio.sleep(60)

        logger.info("Publishing")

        await session.publish("news", "Fake news", options=options)

My crossbar runs behind load balancer (GCP) that destroys WebSocket connections after 30 seconds, so I would expect Autobahn to detect connection termination and reconnect - actually, as per my LB configuration the program should never reach the last line.

This is what happens, but only once. Is this a bug here? I’m using autobahn 18.12.1 and crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

2019-01-05T00:00:44.912: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:00:46.075: main INFO ## Entering on_join

2019-01-05T00:00:46.076: main INFO ## Subscribed

2019-01-05T00:00:46.321: main INFO ## Sleeping

2019-01-05T00:01:15.640: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:01:15.642: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:01:17.806: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:01:19.173: main INFO ## Entering on_join

2019-01-05T00:01:19.174: main INFO ## Subscribed

2019-01-05T00:01:19.393: main INFO ## Sleeping

Note the expected 30 second between “Sleeping” and session leaving. And then connection is re-established and my on_join() is re-entered as the last log line shows - all good for the first time, however…

However then the program misbehaves - the asyncio.sleep(60) line is somehow interrupted after ~20-30 seconds and program goes on to the next line to publish message where it hits autobahn.wamp.exception.TransportLost:

2019-01-05T00:01:19.173: main INFO ## Entering on_join

2019-01-05T00:01:19.174: main INFO ## Subscribed

2019-01-05T00:01:19.393: main INFO ## Sleeping

2019-01-05T00:01:46.329: main INFO ## Publishing # Why does it break from sleep? And why after just 27 seconds?

2019-01-05T00:01:46.331: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘join’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:46.332: autobahn.wamp.protocol.BaseSession ERROR ## While firing onJoin: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:46.333: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘ready’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

[Previous line repeated 2 more times]

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:48.665: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:01:48.666: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:01:51.736: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:01:52.877: main INFO ## Entering on_join 5

2019-01-05T00:01:52.878: main INFO ## Subscribed

2019-01-05T00:01:53.094: main INFO ## Sleeping

2019-01-05T00:02:19.402: main INFO ## Publishing # Another break from sleep. Same tracebacks.

And this how it continues to run. Sometimes the errors order is a bit different:

2019-01-05T00:07:15.221: main INFO ## Entering on_join

2019-01-05T00:07:15.222: main INFO ## Subscribed

2019-01-05T00:07:15.528: main INFO ## Sleeping

2019-01-05T00:07:44.712: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:07:44.714: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:08:15.561: main INFO ## Publishing

2019-01-05T00:08:15.563: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘join’: Traceback (most recent call last):

… (Same tracebacks as above)

0 Likes

#2

yeah, network intermediaries (like load balancers, NATs, reverse proxies, …) will often time out idle websocket connections. idle = not sending/receiving anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

that config will make crossbar do regular heartbeating (every 10s), and that should prevent the timeout of the connection

···

note: more details come into consideration when on mobile networks. eg above settings will prevent timeout there too, but the mobile endpoint might go into sleep states still, which will introduce a delay eg receiving pubsub events. if you want to keep the mobile endpoint in “snappy state”, you might need to increase the

auto_ping_size

in my experiments, I need a couple of kb/s to keep the endpoint in snappy state. which of course has the tradeoff of energy consumption.

anyways: this is now really advanced stuff … if you care about this (too), check out:

also: crossbar has a quite advanced websocket ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

reason: it is non-trivial to do that on 100k’s of connections. and mozilla being a user of autobahn websocket for notifications, we had to solve that;) they run north 100m connections. anyways, this is advanced stuff, hope it doesn’t confuse too much


note: this is server (crossbar) initiated ping/pong, should be enough for your case. however, for quick detection of loss of connection (TCP over WAN), more is needed: client initiated ping pong as well. we have that too (in autobahn python), but the status of the feature depends:

autobahn python runs on any combination of: Twisted vs asyncio and “old API” (ApplicationRunner) vs “new API” (Component based)

App Runner on Twisted has client initiated ping/pong with currently hard-coded (but practical/production) parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

the other 3 variants: lagging behind (I think). I mostly deal with the twisted/apprunner still these days (as that is what crossbar uses internally … historically that variant was the first)

I figure you want asyncio + component: which is definitely “the way to go” forward for a WAMP only Py3 app component (no need for Twisted when you dont need all the bells and whistles that Twisted has vs asyncio)

Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar Hai:

Good day and happy new year to everyone.

I have a very simple WAMP program with on_join doing the following:

@component.on_join
async def on_join(session, details):
    [logger.info](http://logger.info)("Entering on_join")
    options = PublishOptions(exclude_me=False, acknowledge=True)
    [logger.info](http://logger.info)("Subscribed")
    await session.subscribe(printer, "news")
    while True:
        [logger.info](http://logger.info)("Sleeping")
        await asyncio.sleep(60)
        [logger.info](http://logger.info)("Publishing")
        await session.publish("news", "Fake news", options=options)

My crossbar runs behind load balancer (GCP) that destroys WebSocket connections after 30 seconds, so I would expect Autobahn to detect connection termination and reconnect - actually, as per my LB configuration the program should never reach the last line.

This is what happens, but only once. Is this a bug here? I’m using autobahn 18.12.1 and crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

2019-01-05T00:00:44.912: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:00:46.075: main INFO ## Entering on_join

2019-01-05T00:00:46.076: main INFO ## Subscribed

2019-01-05T00:00:46.321: main INFO ## Sleeping

2019-01-05T00:01:15.640: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:01:15.642: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:01:17.806: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:01:19.173: main INFO ## Entering on_join

2019-01-05T00:01:19.174: main INFO ## Subscribed

2019-01-05T00:01:19.393: main INFO ## Sleeping

Note the expected 30 second between “Sleeping” and session leaving. And then connection is re-established and my on_join() is re-entered as the last log line shows - all good for the first time, however…

However then the program misbehaves - the asyncio.sleep(60) line is somehow interrupted after ~20-30 seconds and program goes on to the next line to publish message where it hits autobahn.wamp.exception.TransportLost:

2019-01-05T00:01:19.173: main INFO ## Entering on_join

2019-01-05T00:01:19.174: main INFO ## Subscribed

2019-01-05T00:01:19.393: main INFO ## Sleeping

2019-01-05T00:01:46.329: main INFO ## Publishing # Why does it break from sleep? And why after just 27 seconds?

2019-01-05T00:01:46.331: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘join’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:46.332: autobahn.wamp.protocol.BaseSession ERROR ## While firing onJoin: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:46.333: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘ready’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

[Previous line repeated 2 more times]

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:48.665: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:01:48.666: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:01:51.736: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:01:52.877: main INFO ## Entering on_join 5

2019-01-05T00:01:52.878: main INFO ## Subscribed

2019-01-05T00:01:53.094: main INFO ## Sleeping

2019-01-05T00:02:19.402: main INFO ## Publishing # Another break from sleep. Same tracebacks.

And this how it continues to run. Sometimes the errors order is a bit different:

2019-01-05T00:07:15.221: main INFO ## Entering on_join

2019-01-05T00:07:15.222: main INFO ## Subscribed

2019-01-05T00:07:15.528: main INFO ## Sleeping

2019-01-05T00:07:44.712: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:07:44.714: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:08:15.561: main INFO ## Publishing

2019-01-05T00:08:15.563: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘join’: Traceback (most recent call last):

… (Same tracebacks as above)

0 Likes

#3

Hi Tobias,

Thanks for all the details.

I already had production-settings in place, with ping pong enabled as you advise. The thing with GCP Load Balancer is that they kill websockets connections regularly regardless of activity (yes, sounds weird, but it is true). Every 30s by default - it’s configurable but for now I prefer to keep it short to ease debugging.

The problem that I experience is a bit different (IMHO) - autobahn detects broken connection on the background, but only for the first time. Something breaks after reconnect - on 2nd/3rd/etc. iteration asyncio.sleep(60) just miraculously finishes early after just 20-30 seconds (the lines are highlighted in original post).

I happens very consistently - first one works fine, but other don’t.

Do you see a problem here?

Again, I’ll sum it up in human words:

What happens on first connection (good):

  1. on_join is paused on asyncio.sleep(60)

  2. network issue detected

  3. on_join gets rightfully cancelled

  4. network connection re-established

  5. on_join called to start over
    After the above reconnect, the following happens (strange)

  6. on_join is paused on asyncio.sleep(60)

  7. asyncio.sleep() finishes prematurely

  8. session.publish is called (next line) and raises since connection is dead

  9. connection is reestablished

  10. on_join is rerun
    While it works, logs noise aside, I think there is a bug here with on_join cancellation. Without looking at the code, I can speculate that you cancel it properly first time, but not the 2nd/3rd/etc. time - somehow causing it to continue. Does it make sense to you?

Client side pings are surely interesting and looking at your link, I see that asyncio/wamp has similar hard-coded transport settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/asyncio/wamp.py#L223. Is there a quick way to check whether those are enabled indeed (on client)?

···

On Friday, 11 January 2019 19:36:26 UTC+11, tobia...@gmail.com wrote:

yeah, network intermediaries (like load balancers, NATs, reverse proxies, …) will often time out idle websocket connections. idle = not sending/receiving anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

that config will make crossbar do regular heartbeating (every 10s), and that should prevent the timeout of the connection


note: more details come into consideration when on mobile networks. eg above settings will prevent timeout there too, but the mobile endpoint might go into sleep states still, which will introduce a delay eg receiving pubsub events. if you want to keep the mobile endpoint in “snappy state”, you might need to increase the

auto_ping_size

in my experiments, I need a couple of kb/s to keep the endpoint in snappy state. which of course has the tradeoff of energy consumption.

anyways: this is now really advanced stuff … if you care about this (too), check out:

also: crossbar has a quite advanced websocket ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

reason: it is non-trivial to do that on 100k’s of connections. and mozilla being a user of autobahn websocket for notifications, we had to solve that;) they run north 100m connections. anyways, this is advanced stuff, hope it doesn’t confuse too much


note: this is server (crossbar) initiated ping/pong, should be enough for your case. however, for quick detection of loss of connection (TCP over WAN), more is needed: client initiated ping pong as well. we have that too (in autobahn python), but the status of the feature depends:

autobahn python runs on any combination of: Twisted vs asyncio and “old API” (ApplicationRunner) vs “new API” (Component based)

App Runner on Twisted has client initiated ping/pong with currently hard-coded (but practical/production) parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

the other 3 variants: lagging behind (I think). I mostly deal with the twisted/apprunner still these days (as that is what crossbar uses internally … historically that variant was the first)

I figure you want asyncio + component: which is definitely “the way to go” forward for a WAMP only Py3 app component (no need for Twisted when you dont need all the bells and whistles that Twisted has vs asyncio)

Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar Hai:

Good day and happy new year to everyone.

I have a very simple WAMP program with on_join doing the following:

@component.on_join
async def on_join(session, details):
    [logger.info](http://logger.info)("Entering on_join")
    options = PublishOptions(exclude_me=False, acknowledge=True)
    [logger.info](http://logger.info)("Subscribed")
    await session.subscribe(printer, "news")
    while True:
        [logger.info](http://logger.info)("Sleeping")
        await asyncio.sleep(60)
        [logger.info](http://logger.info)("Publishing")
        await session.publish("news", "Fake news", options=options)

My crossbar runs behind load balancer (GCP) that destroys WebSocket connections after 30 seconds, so I would expect Autobahn to detect connection termination and reconnect - actually, as per my LB configuration the program should never reach the last line.

This is what happens, but only once. Is this a bug here? I’m using autobahn 18.12.1 and crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

2019-01-05T00:00:44.912: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:00:46.075: main INFO ## Entering on_join

2019-01-05T00:00:46.076: main INFO ## Subscribed

2019-01-05T00:00:46.321: main INFO ## Sleeping

2019-01-05T00:01:15.640: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:01:15.642: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:01:17.806: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:01:19.173: main INFO ## Entering on_join

2019-01-05T00:01:19.174: main INFO ## Subscribed

2019-01-05T00:01:19.393: main INFO ## Sleeping

Note the expected 30 second between “Sleeping” and session leaving. And then connection is re-established and my on_join() is re-entered as the last log line shows - all good for the first time, however…

However then the program misbehaves - the asyncio.sleep(60) line is somehow interrupted after ~20-30 seconds and program goes on to the next line to publish message where it hits autobahn.wamp.exception.TransportLost:

2019-01-05T00:01:19.173: main INFO ## Entering on_join

2019-01-05T00:01:19.174: main INFO ## Subscribed

2019-01-05T00:01:19.393: main INFO ## Sleeping

2019-01-05T00:01:46.329: main INFO ## Publishing # Why does it break from sleep? And why after just 27 seconds?

2019-01-05T00:01:46.331: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘join’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:46.332: autobahn.wamp.protocol.BaseSession ERROR ## While firing onJoin: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:46.333: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘ready’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

File “/home/…/lib/python3.7/site-packages/txaio/aio.py”, line 514, in done

res = f.result()

[Previous line repeated 2 more times]

File “/tmp/repro.py”, line 33, in on_join

await session.publish("news", "Fake news", options=options)

File “/home/…/lib/python3.7/site-packages/autobahn/wamp/protocol.py”, line 1278, in publish

raise exception.TransportLost()

autobahn.wamp.exception.TransportLost

2019-01-05T00:01:48.665: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:01:48.666: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:01:51.736: autobahn.asyncio.component.Component INFO ## connecting once using transport type “websocket” over endpoint “tcp”

2019-01-05T00:01:52.877: main INFO ## Entering on_join 5

2019-01-05T00:01:52.878: main INFO ## Subscribed

2019-01-05T00:01:53.094: main INFO ## Sleeping

2019-01-05T00:02:19.402: main INFO ## Publishing # Another break from sleep. Same tracebacks.

And this how it continues to run. Sometimes the errors order is a bit different:

2019-01-05T00:07:15.221: main INFO ## Entering on_join

2019-01-05T00:07:15.222: main INFO ## Subscribed

2019-01-05T00:07:15.528: main INFO ## Sleeping

2019-01-05T00:07:44.712: autobahn.asyncio.component.Component INFO ## session leaving ‘wamp.close.transport_lost’

2019-01-05T00:07:44.714: autobahn.asyncio.component.Component ERROR ## Connection failed: TransportLost: failed to complete connection

2019-01-05T00:08:15.561: main INFO ## Publishing

2019-01-05T00:08:15.563: autobahn.wamp.protocol.BaseSession ERROR ## While notifying ‘join’: Traceback (most recent call last):

… (Same tracebacks as above)

0 Likes

#4

ok, just tested:

crossbar with ping/pong settings:

autobahn (twisted/apprunner):

  I have stopped and restarted crossbar twice while leaving the

client run, and it automatically reconnects correctly (and also
starts the ping/png again)

in the client log, you’ll notice the

  2019-01-11T10:48:23+0100 WebSocketProtocol.onPing(payload=<4

)

lines. this is the client responding to crossbar ping requests

the lines

  2019-01-11T10:48:24+0100 Auto ping/pong: sending ping

auto-ping/pong

  2019-01-11T10:48:24+0100 Expecting ping in 5.0 seconds for

auto-ping/pong

  2019-01-11T10:48:24+0100 Auto ping/pong: received pending pong for

auto-ping/pong

  2019-01-11T10:48:24+0100 WebSocketProtocol.onPong(payload=<4

)

  are the client initiating the ping, and receiving the crossbar

response.

···

https://gist.github.com/oberstet/da507a03c26743f7738ebeb59ece5f7dhttps://gist.github.com/oberstet/396325ebfbd4e08d1542dd278c5fb679

Am 11.01.19 um 10:35 schrieb Zaar Hai:

Hi Tobias,

Thanks for all the details.

      I already had production-settings in place, with ping pong

enabled as you advise. The thing with GCP Load Balancer is
that they kill websockets connections regularly ** regardless
of activity** (yes, sounds weird, but it is true). Every
30s by default - it’s configurable but for now I prefer to
keep it short to ease debugging.

      The problem that I experience is a bit different (IMHO) -

autobahn detects broken connection on the background, but only
for the first time. Something breaks after reconnect - on
2nd/3rd/etc. iteration asyncio.sleep(60) just miraculously
finishes early after just 20-30 seconds (the lines are
highlighted in original post).

      I happens very consistently - first one works fine, but

other don’t.

Do you see a problem here?

Again, I’ll sum it up in human words:

What happens on first connection (good):

  1. on_join is paused on asyncio.sleep(60)
  2. network issue detected
  3. on_join gets rightfully cancelled
  4. network connection re-established
  5. on_join called to start over
    After the above reconnect, the following happens
    (strange)
  1. on_join is paused on asyncio.sleep(60)
  2. asyncio.sleep() finishes prematurely
  3.           session.publish is called (next line) and raises since
    

connection is dead

  1. connection is reestablished
  2. on_join is rerun
    While it works, logs noise aside, I think there is a bug
    here with on_join cancellation. Without looking at the code,
    I can speculate that you cancel it properly first time, but
    not the 2nd/3rd/etc. time - somehow causing it to continue.
    Does it make sense to you?
      Client side pings are surely interesting and looking at

your link, I see that asyncio/wamp has similar hard-coded
transport
settings: .
Is there a quick way to check whether those are enabled indeed
(on client)?

      On Friday, 11 January 2019 19:36:26 UTC+11,

wrote:

  You received this message because you are subscribed to the Google

Groups “Autobahn” group.

  To unsubscribe from this group and stop receiving emails from it,

send an email to autobahnws+...@googlegroups.com.

  To post to this group, send email to autob...@googlegroups.com.

  To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com?utm_medium=email&utm_source=footer).

  For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).

https://github.com/crossbario/autobahn-python/blob/master/autobahn/asyncio/wamp.py#L223tobia...@gmail.com

            yeah, network intermediaries (like load balancers,

NATs, reverse proxies, …) will often time out idle
websocket connections. idle = not sending/receiving
anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

            that config will make crossbar do regular

heartbeating (every 10s), and that should prevent the
timeout of the connection


            note: more details come into consideration when on

mobile networks. eg above settings will prevent timeout
there too, but the mobile endpoint might go into sleep
states still, which will introduce a delay eg receiving
pubsub events. if you want to keep the mobile endpoint
in “snappy state”, you might need to increase the

auto_ping_size
            in my experiments, I need a couple of kb/s to keep

the endpoint in snappy state. which of course has the
tradeoff of energy consumption.

            anyways: this is now really advanced stuff .. if you

care about this (too), check out:

            * [https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html](https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html)
            also: crossbar has a quite advanced websocket

ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

            reason: it is non-trivial to do that on 100k's of

connections. and mozilla being a user of autobahn
websocket for notifications, we had to solve that;) they
run north 100m connections. anyways, this is advanced
stuff, hope it doesn’t confuse too much


            note: this is server (crossbar) initiated ping/pong,

should be enough for your case. however, for quick
detection of loss of connection (TCP over WAN), more is
needed: client initiated ping pong as well. we have that
too (in autobahn python), but the status of the feature
depends:

            autobahn python runs on any combination of: Twisted

vs asyncio and “old API” (ApplicationRunner) vs “new
API” (Component based)

            App Runner on Twisted has client initiated ping/pong

with currently hard-coded (but practical/production)
parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

            the other 3 variants: lagging behind (I think). I

mostly deal with the twisted/apprunner still these days
(as that is what crossbar uses internally …
historically that variant was the first)

            I figure you want asyncio + component: which is

definitely “the way to go” forward for a WAMP only Py3
app component (no need for Twisted when you dont need
all the bells and whistles that Twisted has vs asyncio)

          Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar

Hai:

Good day and happy new year to everyone.

                I have a very simple WAMP program with on_join

doing the following:

@component.on_join

                        async

def on_join(session, details):

    [logger.info](http://logger.info)                        ("Entering

on_join")

options = PublishOptions(exclude_me= False,
acknowledge=True)

    [logger.info](http://logger.info)("Subscribed")

await session.subscribe(printer, “news”)

while True:

logger.info(“Sleeping”)

await asyncio.sleep(60)

logger.info(“Publishing”)

await session.publish(“news”, “Fake news”,
options=options)

                  My crossbar runs behind load balancer (GCP)

that destroys WebSocket connections after 30
seconds, so I would expect Autobahn to detect
connection termination and reconnect - actually,
as per my LB configuration the program should
never reach the last line.

                  This is what happens, but only once. Is this a

bug here? I’m using autobahn 18.12.1 and crossbar
docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

                      2019-01-05T00:00:44.912:

autobahn.asyncio.component. Component
INFO ## connecting once using transport type
“websocket” over endpoint “tcp”

                      2019-01-05T00:00:46.075:

main INFO ## Entering on_join

                      2019-01-05T00:00:46.076:

main INFO ## Subscribed

2019-01-05T00:00:46 .321:
main INFO ## Sleeping

2019-01-05T00:01:15 .640:
autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:01:15.642:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:01:17.806:

autobahn.asyncio.component. Component
INFO ## connecting once using transport type
“websocket” over endpoint “tcp”

                      2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                      2019-01-05T00:01:19.174:

main INFO ## Subscribed

                      2019-01-05T00:01:19.393:

main INFO ## Sleeping

                  Note the expected 30 second between "Sleeping"

and session leaving. And then connection is
re-established and my on_join() is re-entered as
the last log line shows - all good for the first
time, however…

However then the program misbehaves - ** the
asyncio.sleep(60) line is somehow interrupted
after ~20-30 seconds** and program goes on to
the next line to publish message where it
hits autobahn.wamp.exception.TransportLost:

                    2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                    2019-01-05T00:01:19.174:

main INFO ## Subscribed

2019-01-05T00:01:19 .393:
main INFO ## Sleeping

2019-01-05T00:01:46 .329:
main INFO ## Publishing
#
Why does it break from sleep? And why after
just 27 seconds?

                    2019-01-05T00:01:46.331:

autobahn.wamp.protocol. BaseSession ERROR ##
While notifying ‘join’: Traceback (most recent
call last):

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      File

“/tmp/repro.py”, line 33, in on_join

                        await

session.publish(“news”, “Fake news”,
options=options)

                      File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                        raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                    2019-01-05T00:01:46.332:

autobahn.wamp.protocol. BaseSession ERROR ##
While firing onJoin: Traceback (most recent call
last):

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      File

“/tmp/repro.py”, line 33, in on_join

                        await

session.publish(“news”, “Fake news”,
options=options)

                      File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                        raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                    2019-01-05T00:01:46.333:

autobahn.wamp.protocol. BaseSession ERROR ##
While notifying ‘ready’: Traceback (most recent
call last):

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                        res =

f.result()

                      [Previous

line repeated 2 more times]

                      File

“/tmp/repro.py”, line 33, in on_join

                        await

session.publish(“news”, “Fake news”,
options=options)

                      File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                        raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                    2019-01-05T00:01:48.665:

autobahn.asyncio.component. Component INFO

session leaving ‘wamp.close.transport_lost’

                    2019-01-05T00:01:48.666:

autobahn.asyncio.component. Component ERROR

Connection failed: TransportLost: failed to

complete connection

                    2019-01-05T00:01:51.736:

autobahn.asyncio.component. Component INFO

connecting once using transport type

“websocket” over endpoint “tcp”

                    2019-01-05T00:01:52.877:

main INFO ## Entering on_join
5

                    2019-01-05T00:01:52.878:

main INFO ## Subscribed

2019-01-05T00:01:53 .094:
main INFO ## Sleeping

2019-01-05T00:02:19 .402:
main INFO ## Publishing # Another break from sleep.
Same tracebacks.

                And this how it continues to run. Sometimes the

errors order is a bit different:

                    2019-01-05T00:07:15.221:

main INFO ## Entering on_join

                    2019-01-05T00:07:15.222:

main INFO ## Subscribed

                    2019-01-05T00:07:15.528:

main INFO ## Sleeping

                    2019-01-05T00:07:44.712:

autobahn.asyncio.component. Component INFO

session leaving ‘wamp.close.transport_lost’

                    2019-01-05T00:07:44.714:

autobahn.asyncio.component. Component ERROR

Connection failed: TransportLost: failed to

complete connection

                    2019-01-05T00:08:15.561:

main INFO ## Publishing

                    2019-01-05T00:08:15.563:

autobahn.wamp.protocol. BaseSession ERROR ##
While notifying ‘join’: Traceback (most recent
call last):

                  ... (Same

tracebacks as above)

0 Likes

#5

one more hint: to enable an insane amount of logging on the
crossbar side:

./crossbarfx edge start --loglevel trace

  here you see the incoming client initiated pings, replies, and

also the crossbar initiated pings, and receive replies:

  2019-01-11T11:06:15+0100 [Router      19176

crossbar.router.service.RouterServiceAgent]
RouterServiceAgent.publish(“wamp.session.on_join”) ->
“wamp.session.on_join” on “realm1”

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:25+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol]
WebSocketProtocol.onPing(payload=<4 bytes>)

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol.WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

···

Am 11.01.19 um 11:03 schrieb Tobias
Oberstein:

ok, just tested:

crossbar with ping/pong settings:

autobahn (twisted/apprunner):

    I have stopped and restarted crossbar twice while leaving the

client run, and it automatically reconnects correctly (and also
starts the ping/png again)

in the client log, you’ll notice the

    2019-01-11T10:48:23+0100 WebSocketProtocol.onPing(payload=<4

bytes>)

lines. this is the client responding to crossbar ping requests

the lines

    2019-01-11T10:48:24+0100 Auto ping/pong: sending ping

auto-ping/pong

    2019-01-11T10:48:24+0100 Expecting ping in 5.0 seconds for

auto-ping/pong

    2019-01-11T10:48:24+0100 Auto ping/pong: received pending pong

for auto-ping/pong

    2019-01-11T10:48:24+0100 WebSocketProtocol.onPong(payload=<4

bytes>)

    are the client initiating the ping, and receiving the crossbar

response.


    above is using autobahn/twisted/apprunner: pip install

autobahn[twisted]

    (yeah, I know you want asyncio, but this is to track the issues

to their root cause first - I positively know above works;)


could you maybe try that in your setup and dump logs here?

    also: are you using TLS (secure websocket), and if so

terminating TLS at GCP or at crossbar?

could you test the latter? wss and tls termination at crossbar?

    reason: otherwise the ping/pong messages might not even be

forwarded by GCP!! a websocket intermediary is allowed to do
that (only for websocket ping/pong messages, not regular
websocket messages). it can completely drop ping/pongs - or only
forward some - or create its own

this could be one issue.

    when you terminate TLS at crossbar, and consequently use GCP

only as a layer 4 LB, GCP simply cannot look inside and should
be sad (stop its weird interference)

    if you terminate TLS at GCP, and use GCP at layer 7, the next

thing I would try is doing a manual ping/pong using regular
websocket messages. gcp is NOT allowed to drop those;) that
would suck nevertheless, but we could think about what we can to
mitigate.

    in any case: if you don't have a hard reason for L7 LB, go for

L4 - that is what we do and recommend usually (also for
“security reasons”: you are in control of your TLS keys/certs,
and your traffic is kept confidental right into your VMs).

    anyways: above tests will reveal more information .. pls let me

know the results, and I can comment again

https://gist.github.com/oberstet/da507a03c26743f7738ebeb59ece5f7dhttps://gist.github.com/oberstet/396325ebfbd4e08d1542dd278c5fb679

    Am 11.01.19 um 10:35 schrieb Zaar

Hai:

Hi Tobias,

Thanks for all the details.

        I already had production-settings in place, with ping

pong enabled as you advise. The thing with GCP Load Balancer
is that they kill websockets connections regularly ** regardless
of activity** (yes, sounds weird, but it is true).
Every 30s by default - it’s configurable but for now I
prefer to keep it short to ease debugging.

        The problem that I experience is a bit different (IMHO) -

autobahn detects broken connection on the background, but
only for the first time. Something breaks after reconnect -
on 2nd/3rd/etc. iteration asyncio.sleep(60) just
miraculously finishes early after just 20-30 seconds (the
lines are highlighted in original post).

        I happens very consistently - first one works fine, but

other don’t.

Do you see a problem here?

Again, I’ll sum it up in human words:

What happens on first connection (good):

  1. on_join is paused on asyncio.sleep(60)
  2. network issue detected
  3. on_join gets rightfully cancelled
  4. network connection re-established
  5. on_join called to start over
    After the above reconnect, the following happens
    (strange)
  1. on_join is paused on asyncio.sleep(60)
  2. asyncio.sleep() finishes prematurely
  3.             session.publish is called (next line) and raises since
    

connection is dead

  1. connection is reestablished
  2. on_join is rerun
    While it works, logs noise aside, I think there is a
    bug here with on_join cancellation. Without looking at the
    code, I can speculate that you cancel it properly first
    time, but not the 2nd/3rd/etc. time - somehow causing it
    to continue. Does it make sense to you?
        Client side pings are surely interesting and looking at

your link, I see that asyncio/wamp has similar hard-coded
transport
settings: .
Is there a quick way to check whether those are enabled
indeed (on client)?

        On Friday, 11 January 2019 19:36:26 UTC+11, wrote:

    You received this message because you are subscribed to the

Google Groups “Autobahn” group.

    To unsubscribe from this group and stop receiving emails from

it, send an email to autobahnws+...@googlegroups.com.

    To post to this group, send email to autob...@googlegroups.com.

    To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com?utm_medium=email&utm_source=footer).

    For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).

https://github.com/crossbario/autobahn-python/blob/master/autobahn/asyncio/wamp.py#L223tobia...@gmail.com

              yeah, network intermediaries (like load balancers,

NATs, reverse proxies, …) will often time out idle
websocket connections. idle = not sending/receiving
anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

              that config will make crossbar do regular

heartbeating (every 10s), and that should prevent the
timeout of the connection


              note: more details come into consideration when on

mobile networks. eg above settings will prevent
timeout there too, but the mobile endpoint might go
into sleep states still, which will introduce a delay
eg receiving pubsub events. if you want to keep the
mobile endpoint in “snappy state”, you might need to
increase the

auto_ping_size
              in my experiments, I need a couple of kb/s to keep

the endpoint in snappy state. which of course has the
tradeoff of energy consumption.

              anyways: this is now really advanced stuff .. if

you care about this (too), check out:

              * [https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html](https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html)
              also: crossbar has a quite advanced websocket

ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

              reason: it is non-trivial to do that on 100k's of

connections. and mozilla being a user of autobahn
websocket for notifications, we had to solve that;)
they run north 100m connections. anyways, this is
advanced stuff, hope it doesn’t confuse too much


              note: this is server (crossbar) initiated

ping/pong, should be enough for your case. however,
for quick detection of loss of connection (TCP over
WAN), more is needed: client initiated ping pong as
well. we have that too (in autobahn python), but the
status of the feature depends:

              autobahn python runs on any combination of: Twisted

vs asyncio and “old API” (ApplicationRunner) vs “new
API” (Component based)

              App Runner on Twisted has client initiated

ping/pong with currently hard-coded (but
practical/production) parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

              the other 3 variants: lagging behind (I think). I

mostly deal with the twisted/apprunner still these
days (as that is what crossbar uses internally …
historically that variant was the first)

              I figure you want asyncio + component: which is

definitely “the way to go” forward for a WAMP only Py3
app component (no need for Twisted when you dont need
all the bells and whistles that Twisted has vs
asyncio)

            Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar

Hai:

                Good day and happy new year to

everyone.

                  I have a very simple WAMP program with on_join

doing the following:

@component.on_join

                          async

def on_join(session, details):

    [logger.info](http://logger.info)                          ("Entering

on_join")

options = PublishOptions(exclude_me= False,
acknowledge=True)

    [logger.info](http://logger.info)("Subscribed")

await session.subscribe(printer, “news”)

while True:

logger.info(“Sleeping”)

await asyncio.sleep(60)

logger.info(“Publishing”)

await session.publish(“news”, “Fake news”,
options=options)

                    My crossbar runs behind load balancer (GCP)

that destroys WebSocket connections after 30
seconds, so I would expect Autobahn to detect
connection termination and reconnect - actually,
as per my LB configuration the program should
never reach the last line.

                    This is what happens, but only once. Is this

a bug here? I’m using autobahn 18.12.1 and
crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

                        2019-01-05T00:00:44.912:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:00:46.075:

main INFO ## Entering on_join

                        2019-01-05T00:00:46.076:

main INFO ## Subscribed

2019-01-05T00:00:46 .321:
main INFO ## Sleeping

2019-01-05T00:01:15 .640:
autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                        2019-01-05T00:01:15.642:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                        2019-01-05T00:01:17.806:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                        2019-01-05T00:01:19.174:

main INFO ## Subscribed

                        2019-01-05T00:01:19.393:

main INFO ## Sleeping

                    Note the expected 30 second between

“Sleeping” and session leaving. And then
connection is re-established and my on_join() is
re-entered as the last log line shows - all good
for the first time, however…

However then the program misbehaves - ** the
asyncio.sleep(60) line is somehow interrupted
after ~20-30 seconds** and program goes on
to the next line to publish message where it
hits autobahn.wamp.exception.TransportLost:

                      2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                      2019-01-05T00:01:19.174:

main INFO ## Subscribed

2019-01-05T00:01:19 .393:
main INFO ## Sleeping

2019-01-05T00:01:46 .329:
main INFO ## Publishing
#
Why does it break from sleep? And why after
just 27 seconds?

                      2019-01-05T00:01:46.331:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.332:

autobahn.wamp.protocol. BaseSession ERROR

While firing onJoin: Traceback (most recent

call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.333:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘ready’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

[Previous line repeated 2 more times]

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:48.665:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:01:48.666:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:01:51.736:

autobahn.asyncio.component. Component
INFO ## connecting once using transport type
“websocket” over endpoint “tcp”

                      2019-01-05T00:01:52.877:

main INFO ## Entering on_join
5

                      2019-01-05T00:01:52.878:

main INFO ## Subscribed

2019-01-05T00:01:53 .094:
main INFO ## Sleeping

2019-01-05T00:02:19 .402:
main INFO ## Publishing # Another break from
sleep. Same tracebacks.

                  And this how it continues to run. Sometimes the

errors order is a bit different:

                      2019-01-05T00:07:15.221:

main INFO ## Entering on_join

                      2019-01-05T00:07:15.222:

main INFO ## Subscribed

                      2019-01-05T00:07:15.528:

main INFO ## Sleeping

                      2019-01-05T00:07:44.712:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:07:44.714:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:08:15.561:

main INFO ## Publishing

                      2019-01-05T00:08:15.563:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                    ... (Same

tracebacks as above)

0 Likes

#6

Hi Tobias,

Sorry for the delay - our last discussion ran deep into the night here in Down Under.

I did the tests as you suggested (L7 LB). Outcomes:

Here is relevant excerpt from twisted version:

2019-01-14T23:52:35+1100 openHandshakeTimeoutCall.cancel

2019-01-14T23:52:36+1100 Entering on_join

2019-01-14T23:52:36+1100 Subscribed

2019-01-14T23:52:36+1100 Sleeping <- Didn’t continue beyond sleeping - GOOD!

2019-01-14T23:52:45+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:52:55+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:05+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:05+1100 Connection to/from tcp4:ip.is.hidden:443 was lost in a non-clean fashion: Connection to the other side was lost in a non-clean fashion: Connection lost.

2019-01-14T23:53:05+1100 _connectionLost: [Failure instance: Traceback (failure with no frames): <class ‘twisted.internet.error.ConnectionLost’>: Connection to the other side was lost in a non-clean fashion: Connection lost.

]

2019-01-14T23:53:05+1100 WAMP-over-WebSocket transport lost: wasClean=False, code=1006, reason=“connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake)”

2019-01-14T23:53:05+1100 session leaving ‘wamp.close.transport_lost’

2019-01-14T23:53:05+1100 component failed: ApplicationError: ApplicationError(error=<wamp.close.transport_lost>, args=[], kwargs={}, enc_algo=None)

2019-01-14T23:53:05+1100 Traceback (most recent call last):

Failure: autobahn.wamp.exception.ApplicationError: ApplicationError(error=<wamp.close.transport_lost>, args=[], kwargs={}, enc_algo=None)

2019-01-14T23:53:05+1100 wamp.close.transport_lost:

2019-01-14T23:53:05+1100 session on_disconnect: was_clean=False

2019-01-14T23:53:05+1100 Stopping factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f77224e09b0>

2019-01-14T23:53:05+1100 Entering re-connect loop

2019-01-14T23:53:05+1100 trying transport 0 using connect delay 2.3187428648299617

2019-01-14T23:53:07+1100 connecting once using transport type “websocket” over endpoint “tcp”

2019-01-14T23:53:07+1100 Starting factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f77224f20b8>

2019-01-14T23:53:08+1100 openHandshakeTimeoutCall.cancel

2019-01-14T23:53:08+1100 Entering on_join

2019-01-14T23:53:08+1100 Subscribed

2019-01-14T23:53:09+1100 Sleeping

2019-01-14T23:53:18+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:28+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:36+1100 Publishing <- Continued beyond sleeping

2019-01-14T23:53:36+1100 While notifying ‘join’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/twisted/internet/defer.py”, line 654, in _runCallbacks

current.result = callback(current.result, *args, **kw)
···

On Friday, 11 January 2019 21:08:26 UTC+11, Tobias Oberstein wrote:

  one more hint: to enable an insane amount of logging on the

crossbar side:

./crossbarfx edge start --loglevel trace

  here you see the incoming client initiated pings, replies, and

also the crossbar initiated pings, and receive replies:

  2019-01-11T11:06:15+0100 [Router      19176

crossbar.router.service. RouterServiceAgent]
RouterServiceAgent.publish(" wamp.session.on_join") ->
“wamp.session.on_join” on “realm1”

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:25+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPing(payload=<4 bytes>)

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  Am 11.01.19 um 11:03 schrieb Tobias > Oberstein:

ok, just tested:

crossbar with ping/pong settings: https://gist.github.com/oberstet/da507a03c26743f7738ebeb59ece5f7d

autobahn (twisted/apprunner): https://gist.github.com/oberstet/396325ebfbd4e08d1542dd278c5fb679

    I have stopped and restarted crossbar twice while leaving the

client run, and it automatically reconnects correctly (and also
starts the ping/png again)

in the client log, you’ll notice the

2019-01-11T10:48:23+0100 WebSocketProtocol.onPing( payload=<4

)

lines. this is the client responding to crossbar ping requests

the lines

    2019-01-11T10:48:24+0100 Auto ping/pong: sending ping

auto-ping/pong

    2019-01-11T10:48:24+0100 Expecting ping in 5.0 seconds for

auto-ping/pong

    2019-01-11T10:48:24+0100 Auto ping/pong: received pending pong

for auto-ping/pong

    2019-01-11T10:48:24+0100 WebSocketProtocol.onPong(        payload=<4

)

    are the client initiating the ping, and receiving the crossbar

response.


    above is using autobahn/twisted/apprunner: pip install

autobahn[twisted]

    (yeah, I know you want asyncio, but this is to track the issues

to their root cause first - I positively know above works;)


could you maybe try that in your setup and dump logs here?

    also: are you using TLS (secure websocket), and if so

terminating TLS at GCP or at crossbar?

could you test the latter? wss and tls termination at crossbar?

    reason: otherwise the ping/pong messages might not even be

forwarded by GCP!! a websocket intermediary is allowed to do
that (only for websocket ping/pong messages, not regular
websocket messages). it can completely drop ping/pongs - or only
forward some - or create its own

this could be one issue.

    when you terminate TLS at crossbar, and consequently use GCP

only as a layer 4 LB, GCP simply cannot look inside and should
be sad (stop its weird interference)

    if you terminate TLS at GCP, and use GCP at layer 7, the next

thing I would try is doing a manual ping/pong using regular
websocket messages. gcp is NOT allowed to drop those;) that
would suck nevertheless, but we could think about what we can to
mitigate.

    in any case: if you don't have a hard reason for L7 LB, go for

L4 - that is what we do and recommend usually (also for
“security reasons”: you are in control of your TLS keys/certs,
and your traffic is kept confidental right into your VMs).

    anyways: above tests will reveal more information .. pls let me

know the results, and I can comment again

    Am 11.01.19 um 10:35 schrieb Zaar > > Hai:

Hi Tobias,

Thanks for all the details.

        I already had production-settings in place, with ping

pong enabled as you advise. The thing with GCP Load Balancer
is that they kill websockets connections regularly ** regardless
of activity** (yes, sounds weird, but it is true).
Every 30s by default - it’s configurable but for now I
prefer to keep it short to ease debugging.

        The problem that I experience is a bit different (IMHO) -

autobahn detects broken connection on the background, but
only for the first time. Something breaks after reconnect -
on 2nd/3rd/etc. iteration asyncio.sleep(60) just
miraculously finishes early after just 20-30 seconds (the
lines are highlighted in original post).

        I happens very consistently - first one works fine, but

other don’t.

Do you see a problem here?

Again, I’ll sum it up in human words:

What happens on first connection (good):

  1. on_join is paused on asyncio.sleep(60)
  2. network issue detected
  3. on_join gets rightfully cancelled
  4. network connection re-established
  5. on_join called to start over
    After the above reconnect, the following happens
    (strange)
  1. on_join is paused on asyncio.sleep(60)
  2. asyncio.sleep() finishes prematurely
  3.             session.publish is called (next line) and raises since
    

connection is dead

  1. connection is reestablished
  2. on_join is rerun
    While it works, logs noise aside, I think there is a
    bug here with on_join cancellation. Without looking at the
    code, I can speculate that you cancel it properly first
    time, but not the 2nd/3rd/etc. time - somehow causing it
    to continue. Does it make sense to you?
        Client side pings are surely interesting and looking at

your link, I see that asyncio/wamp has similar hard-coded
transport
settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/asyncio/wamp.py#L223 .
Is there a quick way to check whether those are enabled
indeed (on client)?

        On Friday, 11 January 2019 19:36:26 UTC+11, tobia...@gmail.com > > >             wrote:
              yeah, network intermediaries (like load balancers,

NATs, reverse proxies, …) will often time out idle
websocket connections. idle = not sending/receiving
anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

              that config will make crossbar do regular

heartbeating (every 10s), and that should prevent the
timeout of the connection


              note: more details come into consideration when on

mobile networks. eg above settings will prevent
timeout there too, but the mobile endpoint might go
into sleep states still, which will introduce a delay
eg receiving pubsub events. if you want to keep the
mobile endpoint in “snappy state”, you might need to
increase the

auto_ping_size
              in my experiments, I need a couple of kb/s to keep

the endpoint in snappy state. which of course has the
tradeoff of energy consumption.

              anyways: this is now really advanced stuff .. if

you care about this (too), check out:

              * [https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html](https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html)
              also: crossbar has a quite advanced websocket

ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

              reason: it is non-trivial to do that on 100k's of

connections. and mozilla being a user of autobahn
websocket for notifications, we had to solve that;)
they run north 100m connections. anyways, this is
advanced stuff, hope it doesn’t confuse too much


              note: this is server (crossbar) initiated

ping/pong, should be enough for your case. however,
for quick detection of loss of connection (TCP over
WAN), more is needed: client initiated ping pong as
well. we have that too (in autobahn python), but the
status of the feature depends:

              autobahn python runs on any combination of: Twisted

vs asyncio and “old API” (ApplicationRunner) vs “new
API” (Component based)

              App Runner on Twisted has client initiated

ping/pong with currently hard-coded (but
practical/production) parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

              the other 3 variants: lagging behind (I think). I

mostly deal with the twisted/apprunner still these
days (as that is what crossbar uses internally …
historically that variant was the first)

              I figure you want asyncio + component: which is

definitely “the way to go” forward for a WAMP only Py3
app component (no need for Twisted when you dont need
all the bells and whistles that Twisted has vs
asyncio)

            Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar > > > > Hai:
                Good day and happy new year to

everyone.

                  I have a very simple WAMP program with on_join

doing the following:

@component.on_join

                          async

def on_join(session, details):

    [logger.info](http://logger.info)                          ("Entering

on_join")

options = PublishOptions(exclude_me= False,
acknowledge=True)

    [logger.info](http://logger.info)("Subscribed")

await session.subscribe(printer, “news”)

while True:

logger.info(“Sleeping”)

await asyncio.sleep(60)

logger.info(“Publishing”)

await session.publish(“news”, “Fake news”,
options=options)

                    My crossbar runs behind load balancer (GCP)

that destroys WebSocket connections after 30
seconds, so I would expect Autobahn to detect
connection termination and reconnect - actually,
as per my LB configuration the program should
never reach the last line.

                    This is what happens, but only once. Is this

a bug here? I’m using autobahn 18.12.1 and
crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

                        2019-01-05T00:00:44.912:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:00:46.075:

main INFO ## Entering on_join

                        2019-01-05T00:00:46.076:

main INFO ## Subscribed

2019-01-05T00:00:46 .321:
main INFO ## Sleeping

2019-01-05T00:01:15 .640:
autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                        2019-01-05T00:01:15.642:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                        2019-01-05T00:01:17.806:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                        2019-01-05T00:01:19.174:

main INFO ## Subscribed

                        2019-01-05T00:01:19.393:

main INFO ## Sleeping

                    Note the expected 30 second between

“Sleeping” and session leaving. And then
connection is re-established and my on_join() is
re-entered as the last log line shows - all good
for the first time, however…

However then the program misbehaves - ** the
asyncio.sleep(60) line is somehow interrupted
after ~20-30 seconds** and program goes on
to the next line to publish message where it
hits autobahn.wamp.exception.TransportLost:

                      2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                      2019-01-05T00:01:19.174:

main INFO ## Subscribed

2019-01-05T00:01:19 .393:
main INFO ## Sleeping

2019-01-05T00:01:46 .329:
main INFO ## Publishing
#
Why does it break from sleep? And why after
just 27 seconds?

                      2019-01-05T00:01:46.331:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.332:

autobahn.wamp.protocol. BaseSession ERROR

While firing onJoin: Traceback (most recent

call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.333:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘ready’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

[Previous line repeated 2 more times]

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:48.665:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:01:48.666:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:01:51.736:

autobahn.asyncio.component. Component
INFO ## connecting once using transport type
“websocket” over endpoint “tcp”

                      2019-01-05T00:01:52.877:

main INFO ## Entering on_join
5

                      2019-01-05T00:01:52.878:

main INFO ## Subscribed

2019-01-05T00:01:53 .094:
main INFO ## Sleeping

2019-01-05T00:02:19 .402:
main INFO ## Publishing # Another break from
sleep. Same tracebacks.

                  And this how it continues to run. Sometimes the

errors order is a bit different:

                      2019-01-05T00:07:15.221:

main INFO ## Entering on_join

                      2019-01-05T00:07:15.222:

main INFO ## Subscribed

                      2019-01-05T00:07:15.528:

main INFO ## Sleeping

                      2019-01-05T00:07:44.712:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:07:44.714:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:08:15.561:

main INFO ## Publishing

                      2019-01-05T00:08:15.563:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                    ... (Same

tracebacks as above)

    You received this message because you are subscribed to the

Google Groups “Autobahn” group.

    To unsubscribe from this group and stop receiving emails from

it, send an email to autobahnws+...@googlegroups.com.

    To post to this group, send email to auto...@googlegroups.com.

    To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com?utm_medium=email&utm_source=footer).

    For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).
0 Likes

#7

P.S. Can’t easily test with L4 LB unfortunately.
Today’s K8s tooling made it all too convenient - K8s provisions GCP LB, then kube-lego takes care to provision LetsEncrypt certificates and updates the LB with them regularly. Couple of lines of k8s manifest and you have your HTTPS load balancer.

Btw, do you know of any attempts to integrate Crossbar with LetsEncrypt (e.g. through their certbot)?

Thank you,

Zaar

···

On Tuesday, 15 January 2019 00:22:30 UTC+11, Zaar Hai wrote:

Hi Tobias,

Sorry for the delay - our last discussion ran deep into the night here in Down Under.

I did the tests as you suggested (L7 LB). Outcomes:

Here is relevant excerpt from twisted version:

2019-01-14T23:52:35+1100 openHandshakeTimeoutCall.cancel

2019-01-14T23:52:36+1100 Entering on_join

2019-01-14T23:52:36+1100 Subscribed

2019-01-14T23:52:36+1100 Sleeping <- Didn’t continue beyond sleeping - GOOD!

2019-01-14T23:52:45+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:52:55+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:05+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:05+1100 Connection to/from tcp4:ip.is.hidden:443 was lost in a non-clean fashion: Connection to the other side was lost in a non-clean fashion: Connection lost.

2019-01-14T23:53:05+1100 _connectionLost: [Failure instance: Traceback (failure with no frames): <class ‘twisted.internet.error.ConnectionLost’>: Connection to the other side was lost in a non-clean fashion: Connection lost.

]

2019-01-14T23:53:05+1100 WAMP-over-WebSocket transport lost: wasClean=False, code=1006, reason=“connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake)”

2019-01-14T23:53:05+1100 session leaving ‘wamp.close.transport_lost’

2019-01-14T23:53:05+1100 component failed: ApplicationError: ApplicationError(error=<wamp.close.transport_lost>, args=[], kwargs={}, enc_algo=None)

2019-01-14T23:53:05+1100 Traceback (most recent call last):

Failure: autobahn.wamp.exception.ApplicationError: ApplicationError(error=<wamp.close.transport_lost>, args=[], kwargs={}, enc_algo=None)

2019-01-14T23:53:05+1100 wamp.close.transport_lost:

2019-01-14T23:53:05+1100 session on_disconnect: was_clean=False

2019-01-14T23:53:05+1100 Stopping factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f77224e09b0>

2019-01-14T23:53:05+1100 Entering re-connect loop

2019-01-14T23:53:05+1100 trying transport 0 using connect delay 2.3187428648299617

2019-01-14T23:53:07+1100 connecting once using transport type “websocket” over endpoint “tcp”

2019-01-14T23:53:07+1100 Starting factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f77224f20b8>

2019-01-14T23:53:08+1100 openHandshakeTimeoutCall.cancel

2019-01-14T23:53:08+1100 Entering on_join

2019-01-14T23:53:08+1100 Subscribed

2019-01-14T23:53:09+1100 Sleeping

2019-01-14T23:53:18+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:28+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:36+1100 Publishing <- Continued beyond sleeping

2019-01-14T23:53:36+1100 While notifying ‘join’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/twisted/internet/defer.py”, line 654, in _runCallbacks

current.result = callback(current.result, *args, **kw)

On Friday, 11 January 2019 21:08:26 UTC+11, Tobias Oberstein wrote:

  one more hint: to enable an insane amount of logging on the

crossbar side:

./crossbarfx edge start --loglevel trace

  here you see the incoming client initiated pings, replies, and

also the crossbar initiated pings, and receive replies:

  2019-01-11T11:06:15+0100 [Router      19176

crossbar.router.service. RouterServiceAgent]
RouterServiceAgent.publish(" wamp.session.on_join") ->
“wamp.session.on_join” on “realm1”

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:25+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPing(payload=<4 bytes>)

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  Am 11.01.19 um 11:03 schrieb Tobias > > Oberstein:

ok, just tested:

crossbar with ping/pong settings: https://gist.github.com/oberstet/da507a03c26743f7738ebeb59ece5f7d

autobahn (twisted/apprunner): https://gist.github.com/oberstet/396325ebfbd4e08d1542dd278c5fb679

    I have stopped and restarted crossbar twice while leaving the

client run, and it automatically reconnects correctly (and also
starts the ping/png again)

in the client log, you’ll notice the

2019-01-11T10:48:23+0100 WebSocketProtocol.onPing( payload=<4

)

lines. this is the client responding to crossbar ping requests

the lines

    2019-01-11T10:48:24+0100 Auto ping/pong: sending ping

auto-ping/pong

    2019-01-11T10:48:24+0100 Expecting ping in 5.0 seconds for

auto-ping/pong

    2019-01-11T10:48:24+0100 Auto ping/pong: received pending pong

for auto-ping/pong

    2019-01-11T10:48:24+0100 WebSocketProtocol.onPong(        payload=<4

)

    are the client initiating the ping, and receiving the crossbar

response.


    above is using autobahn/twisted/apprunner: pip install

autobahn[twisted]

    (yeah, I know you want asyncio, but this is to track the issues

to their root cause first - I positively know above works;)


could you maybe try that in your setup and dump logs here?

    also: are you using TLS (secure websocket), and if so

terminating TLS at GCP or at crossbar?

could you test the latter? wss and tls termination at crossbar?

    reason: otherwise the ping/pong messages might not even be

forwarded by GCP!! a websocket intermediary is allowed to do
that (only for websocket ping/pong messages, not regular
websocket messages). it can completely drop ping/pongs - or only
forward some - or create its own

this could be one issue.

    when you terminate TLS at crossbar, and consequently use GCP

only as a layer 4 LB, GCP simply cannot look inside and should
be sad (stop its weird interference)

    if you terminate TLS at GCP, and use GCP at layer 7, the next

thing I would try is doing a manual ping/pong using regular
websocket messages. gcp is NOT allowed to drop those;) that
would suck nevertheless, but we could think about what we can to
mitigate.

    in any case: if you don't have a hard reason for L7 LB, go for

L4 - that is what we do and recommend usually (also for
“security reasons”: you are in control of your TLS keys/certs,
and your traffic is kept confidental right into your VMs).

    anyways: above tests will reveal more information .. pls let me

know the results, and I can comment again

    Am 11.01.19 um 10:35 schrieb Zaar > > > Hai:

Hi Tobias,

Thanks for all the details.

        I already had production-settings in place, with ping

pong enabled as you advise. The thing with GCP Load Balancer
is that they kill websockets connections regularly ** regardless
of activity** (yes, sounds weird, but it is true).
Every 30s by default - it’s configurable but for now I
prefer to keep it short to ease debugging.

        The problem that I experience is a bit different (IMHO) -

autobahn detects broken connection on the background, but
only for the first time. Something breaks after reconnect -
on 2nd/3rd/etc. iteration asyncio.sleep(60) just
miraculously finishes early after just 20-30 seconds (the
lines are highlighted in original post).

        I happens very consistently - first one works fine, but

other don’t.

Do you see a problem here?

Again, I’ll sum it up in human words:

What happens on first connection (good):

  1. on_join is paused on asyncio.sleep(60)
  2. network issue detected
  3. on_join gets rightfully cancelled
  4. network connection re-established
  5. on_join called to start over
    After the above reconnect, the following happens
    (strange)
  1. on_join is paused on asyncio.sleep(60)
  2. asyncio.sleep() finishes prematurely
  3.             session.publish is called (next line) and raises since
    

connection is dead

  1. connection is reestablished
  2. on_join is rerun
    While it works, logs noise aside, I think there is a
    bug here with on_join cancellation. Without looking at the
    code, I can speculate that you cancel it properly first
    time, but not the 2nd/3rd/etc. time - somehow causing it
    to continue. Does it make sense to you?
        Client side pings are surely interesting and looking at

your link, I see that asyncio/wamp has similar hard-coded
transport
settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/asyncio/wamp.py#L223 .
Is there a quick way to check whether those are enabled
indeed (on client)?

        On Friday, 11 January 2019 19:36:26 UTC+11, tobia...@gmail.com > > > >             wrote:
              yeah, network intermediaries (like load balancers,

NATs, reverse proxies, …) will often time out idle
websocket connections. idle = not sending/receiving
anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

              that config will make crossbar do regular

heartbeating (every 10s), and that should prevent the
timeout of the connection


              note: more details come into consideration when on

mobile networks. eg above settings will prevent
timeout there too, but the mobile endpoint might go
into sleep states still, which will introduce a delay
eg receiving pubsub events. if you want to keep the
mobile endpoint in “snappy state”, you might need to
increase the

auto_ping_size
              in my experiments, I need a couple of kb/s to keep

the endpoint in snappy state. which of course has the
tradeoff of energy consumption.

              anyways: this is now really advanced stuff .. if

you care about this (too), check out:

              * [https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html](https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html)
              also: crossbar has a quite advanced websocket

ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

              reason: it is non-trivial to do that on 100k's of

connections. and mozilla being a user of autobahn
websocket for notifications, we had to solve that;)
they run north 100m connections. anyways, this is
advanced stuff, hope it doesn’t confuse too much


              note: this is server (crossbar) initiated

ping/pong, should be enough for your case. however,
for quick detection of loss of connection (TCP over
WAN), more is needed: client initiated ping pong as
well. we have that too (in autobahn python), but the
status of the feature depends:

              autobahn python runs on any combination of: Twisted

vs asyncio and “old API” (ApplicationRunner) vs “new
API” (Component based)

              App Runner on Twisted has client initiated

ping/pong with currently hard-coded (but
practical/production) parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

              the other 3 variants: lagging behind (I think). I

mostly deal with the twisted/apprunner still these
days (as that is what crossbar uses internally …
historically that variant was the first)

              I figure you want asyncio + component: which is

definitely “the way to go” forward for a WAMP only Py3
app component (no need for Twisted when you dont need
all the bells and whistles that Twisted has vs
asyncio)

            Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar > > > > > Hai:
                Good day and happy new year to

everyone.

                  I have a very simple WAMP program with on_join

doing the following:

@component.on_join

                          async

def on_join(session, details):

    [logger.info](http://logger.info)                          ("Entering

on_join")

options = PublishOptions(exclude_me= False,
acknowledge=True)

    [logger.info](http://logger.info)("Subscribed")

await session.subscribe(printer, “news”)

while True:

logger.info(“Sleeping”)

await asyncio.sleep(60)

logger.info(“Publishing”)

await session.publish(“news”, “Fake news”,
options=options)

                    My crossbar runs behind load balancer (GCP)

that destroys WebSocket connections after 30
seconds, so I would expect Autobahn to detect
connection termination and reconnect - actually,
as per my LB configuration the program should
never reach the last line.

                    This is what happens, but only once. Is this

a bug here? I’m using autobahn 18.12.1 and
crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

                        2019-01-05T00:00:44.912:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:00:46.075:

main INFO ## Entering on_join

                        2019-01-05T00:00:46.076:

main INFO ## Subscribed

2019-01-05T00:00:46 .321:
main INFO ## Sleeping

2019-01-05T00:01:15 .640:
autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                        2019-01-05T00:01:15.642:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                        2019-01-05T00:01:17.806:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                        2019-01-05T00:01:19.174:

main INFO ## Subscribed

                        2019-01-05T00:01:19.393:

main INFO ## Sleeping

                    Note the expected 30 second between

“Sleeping” and session leaving. And then
connection is re-established and my on_join() is
re-entered as the last log line shows - all good
for the first time, however…

However then the program misbehaves - ** the
asyncio.sleep(60) line is somehow interrupted
after ~20-30 seconds** and program goes on
to the next line to publish message where it
hits autobahn.wamp.exception.TransportLost:

                      2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                      2019-01-05T00:01:19.174:

main INFO ## Subscribed

2019-01-05T00:01:19 .393:
main INFO ## Sleeping

2019-01-05T00:01:46 .329:
main INFO ## Publishing
#
Why does it break from sleep? And why after
just 27 seconds?

                      2019-01-05T00:01:46.331:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.332:

autobahn.wamp.protocol. BaseSession ERROR

While firing onJoin: Traceback (most recent

call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.333:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘ready’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

[Previous line repeated 2 more times]

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:48.665:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:01:48.666:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:01:51.736:

autobahn.asyncio.component. Component
INFO ## connecting once using transport type
“websocket” over endpoint “tcp”

                      2019-01-05T00:01:52.877:

main INFO ## Entering on_join
5

                      2019-01-05T00:01:52.878:

main INFO ## Subscribed

2019-01-05T00:01:53 .094:
main INFO ## Sleeping

2019-01-05T00:02:19 .402:
main INFO ## Publishing # Another break from
sleep. Same tracebacks.

                  And this how it continues to run. Sometimes the

errors order is a bit different:

                      2019-01-05T00:07:15.221:

main INFO ## Entering on_join

                      2019-01-05T00:07:15.222:

main INFO ## Subscribed

                      2019-01-05T00:07:15.528:

main INFO ## Sleeping

                      2019-01-05T00:07:44.712:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:07:44.714:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:08:15.561:

main INFO ## Publishing

                      2019-01-05T00:08:15.563:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                    ... (Same

tracebacks as above)

    You received this message because you are subscribed to the

Google Groups “Autobahn” group.

    To unsubscribe from this group and stop receiving emails from

it, send an email to autobahnws+...@googlegroups.com.

    To post to this group, send email to auto...@googlegroups.com.

    To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com?utm_medium=email&utm_source=footer).

    For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).
0 Likes

#8

Here is the updated gist link: https://gist.github.com/haizaar/d9033b531759ee73807cc8b3f035b742

···

On Tuesday, 15 January 2019 00:22:30 UTC+11, Zaar Hai wrote:

Hi Tobias,

Sorry for the delay - our last discussion ran deep into the night here in Down Under.

I did the tests as you suggested (L7 LB). Outcomes:

Here is relevant excerpt from twisted version:

2019-01-14T23:52:35+1100 openHandshakeTimeoutCall.cancel

2019-01-14T23:52:36+1100 Entering on_join

2019-01-14T23:52:36+1100 Subscribed

2019-01-14T23:52:36+1100 Sleeping <- Didn’t continue beyond sleeping - GOOD!

2019-01-14T23:52:45+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:52:55+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:05+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:05+1100 Connection to/from tcp4:ip.is.hidden:443 was lost in a non-clean fashion: Connection to the other side was lost in a non-clean fashion: Connection lost.

2019-01-14T23:53:05+1100 _connectionLost: [Failure instance: Traceback (failure with no frames): <class ‘twisted.internet.error.ConnectionLost’>: Connection to the other side was lost in a non-clean fashion: Connection lost.

]

2019-01-14T23:53:05+1100 WAMP-over-WebSocket transport lost: wasClean=False, code=1006, reason=“connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake)”

2019-01-14T23:53:05+1100 session leaving ‘wamp.close.transport_lost’

2019-01-14T23:53:05+1100 component failed: ApplicationError: ApplicationError(error=<wamp.close.transport_lost>, args=[], kwargs={}, enc_algo=None)

2019-01-14T23:53:05+1100 Traceback (most recent call last):

Failure: autobahn.wamp.exception.ApplicationError: ApplicationError(error=<wamp.close.transport_lost>, args=[], kwargs={}, enc_algo=None)

2019-01-14T23:53:05+1100 wamp.close.transport_lost:

2019-01-14T23:53:05+1100 session on_disconnect: was_clean=False

2019-01-14T23:53:05+1100 Stopping factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f77224e09b0>

2019-01-14T23:53:05+1100 Entering re-connect loop

2019-01-14T23:53:05+1100 trying transport 0 using connect delay 2.3187428648299617

2019-01-14T23:53:07+1100 connecting once using transport type “websocket” over endpoint “tcp”

2019-01-14T23:53:07+1100 Starting factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f77224f20b8>

2019-01-14T23:53:08+1100 openHandshakeTimeoutCall.cancel

2019-01-14T23:53:08+1100 Entering on_join

2019-01-14T23:53:08+1100 Subscribed

2019-01-14T23:53:09+1100 Sleeping

2019-01-14T23:53:18+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:28+1100 WebSocketProtocol.onPing(payload=<4 bytes>)

2019-01-14T23:53:36+1100 Publishing <- Continued beyond sleeping

2019-01-14T23:53:36+1100 While notifying ‘join’: Traceback (most recent call last):

File “/home/…/lib/python3.7/site-packages/twisted/internet/defer.py”, line 654, in _runCallbacks

current.result = callback(current.result, *args, **kw)

On Friday, 11 January 2019 21:08:26 UTC+11, Tobias Oberstein wrote:

  one more hint: to enable an insane amount of logging on the

crossbar side:

./crossbarfx edge start --loglevel trace

  here you see the incoming client initiated pings, replies, and

also the crossbar initiated pings, and receive replies:

  2019-01-11T11:06:15+0100 [Router      19176

crossbar.router.service. RouterServiceAgent]
RouterServiceAgent.publish(" wamp.session.on_join") ->
“wamp.session.on_join” on “realm1”

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:18+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:21+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:24+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:25+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPing(payload=<4 bytes>)

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:27+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: sending ping auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Expecting
ping in 2.0 seconds for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol] Auto
ping/pong: received pending pong for auto-ping/pong

  2019-01-11T11:06:30+0100 [Router      19176

crossbar.router.protocol. WampWebSocketServerProtocol]
WebSocketProtocol.onPong(payload=<4 bytes>)

  Am 11.01.19 um 11:03 schrieb Tobias > > Oberstein:

ok, just tested:

crossbar with ping/pong settings: https://gist.github.com/oberstet/da507a03c26743f7738ebeb59ece5f7d

autobahn (twisted/apprunner): https://gist.github.com/oberstet/396325ebfbd4e08d1542dd278c5fb679

    I have stopped and restarted crossbar twice while leaving the

client run, and it automatically reconnects correctly (and also
starts the ping/png again)

in the client log, you’ll notice the

2019-01-11T10:48:23+0100 WebSocketProtocol.onPing( payload=<4

)

lines. this is the client responding to crossbar ping requests

the lines

    2019-01-11T10:48:24+0100 Auto ping/pong: sending ping

auto-ping/pong

    2019-01-11T10:48:24+0100 Expecting ping in 5.0 seconds for

auto-ping/pong

    2019-01-11T10:48:24+0100 Auto ping/pong: received pending pong

for auto-ping/pong

    2019-01-11T10:48:24+0100 WebSocketProtocol.onPong(        payload=<4

)

    are the client initiating the ping, and receiving the crossbar

response.


    above is using autobahn/twisted/apprunner: pip install

autobahn[twisted]

    (yeah, I know you want asyncio, but this is to track the issues

to their root cause first - I positively know above works;)


could you maybe try that in your setup and dump logs here?

    also: are you using TLS (secure websocket), and if so

terminating TLS at GCP or at crossbar?

could you test the latter? wss and tls termination at crossbar?

    reason: otherwise the ping/pong messages might not even be

forwarded by GCP!! a websocket intermediary is allowed to do
that (only for websocket ping/pong messages, not regular
websocket messages). it can completely drop ping/pongs - or only
forward some - or create its own

this could be one issue.

    when you terminate TLS at crossbar, and consequently use GCP

only as a layer 4 LB, GCP simply cannot look inside and should
be sad (stop its weird interference)

    if you terminate TLS at GCP, and use GCP at layer 7, the next

thing I would try is doing a manual ping/pong using regular
websocket messages. gcp is NOT allowed to drop those;) that
would suck nevertheless, but we could think about what we can to
mitigate.

    in any case: if you don't have a hard reason for L7 LB, go for

L4 - that is what we do and recommend usually (also for
“security reasons”: you are in control of your TLS keys/certs,
and your traffic is kept confidental right into your VMs).

    anyways: above tests will reveal more information .. pls let me

know the results, and I can comment again

    Am 11.01.19 um 10:35 schrieb Zaar > > > Hai:

Hi Tobias,

Thanks for all the details.

        I already had production-settings in place, with ping

pong enabled as you advise. The thing with GCP Load Balancer
is that they kill websockets connections regularly ** regardless
of activity** (yes, sounds weird, but it is true).
Every 30s by default - it’s configurable but for now I
prefer to keep it short to ease debugging.

        The problem that I experience is a bit different (IMHO) -

autobahn detects broken connection on the background, but
only for the first time. Something breaks after reconnect -
on 2nd/3rd/etc. iteration asyncio.sleep(60) just
miraculously finishes early after just 20-30 seconds (the
lines are highlighted in original post).

        I happens very consistently - first one works fine, but

other don’t.

Do you see a problem here?

Again, I’ll sum it up in human words:

What happens on first connection (good):

  1. on_join is paused on asyncio.sleep(60)
  2. network issue detected
  3. on_join gets rightfully cancelled
  4. network connection re-established
  5. on_join called to start over
    After the above reconnect, the following happens
    (strange)
  1. on_join is paused on asyncio.sleep(60)
  2. asyncio.sleep() finishes prematurely
  3.             session.publish is called (next line) and raises since
    

connection is dead

  1. connection is reestablished
  2. on_join is rerun
    While it works, logs noise aside, I think there is a
    bug here with on_join cancellation. Without looking at the
    code, I can speculate that you cancel it properly first
    time, but not the 2nd/3rd/etc. time - somehow causing it
    to continue. Does it make sense to you?
        Client side pings are surely interesting and looking at

your link, I see that asyncio/wamp has similar hard-coded
transport
settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/asyncio/wamp.py#L223 .
Is there a quick way to check whether those are enabled
indeed (on client)?

        On Friday, 11 January 2019 19:36:26 UTC+11, tobia...@gmail.com > > > >             wrote:
              yeah, network intermediaries (like load balancers,

NATs, reverse proxies, …) will often time out idle
websocket connections. idle = not sending/receiving
anything (not a single octet) for some amount of time.

try adding these settings to your crossbar node: https://crossbar.io/docs/WebSocket-Options/#production-settings

(the websocket “auto_ping_*” parameters)

              that config will make crossbar do regular

heartbeating (every 10s), and that should prevent the
timeout of the connection


              note: more details come into consideration when on

mobile networks. eg above settings will prevent
timeout there too, but the mobile endpoint might go
into sleep states still, which will introduce a delay
eg receiving pubsub events. if you want to keep the
mobile endpoint in “snappy state”, you might need to
increase the

auto_ping_size
              in my experiments, I need a couple of kb/s to keep

the endpoint in snappy state. which of course has the
tradeoff of energy consumption.

              anyways: this is now really advanced stuff .. if

you care about this (too), check out:

              * [https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html](https://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/1083.html)
              also: crossbar has a quite advanced websocket

ping/pong mechanism

https://github.com/crossbario/autobahn-python/issues/642

              reason: it is non-trivial to do that on 100k's of

connections. and mozilla being a user of autobahn
websocket for notifications, we had to solve that;)
they run north 100m connections. anyways, this is
advanced stuff, hope it doesn’t confuse too much


              note: this is server (crossbar) initiated

ping/pong, should be enough for your case. however,
for quick detection of loss of connection (TCP over
WAN), more is needed: client initiated ping pong as
well. we have that too (in autobahn python), but the
status of the feature depends:

              autobahn python runs on any combination of: Twisted

vs asyncio and “old API” (ApplicationRunner) vs “new
API” (Component based)

              App Runner on Twisted has client initiated

ping/pong with currently hard-coded (but
practical/production) parameter settings: https://github.com/crossbario/autobahn-python/blob/master/autobahn/twisted/wamp.py#L282

              the other 3 variants: lagging behind (I think). I

mostly deal with the twisted/apprunner still these
days (as that is what crossbar uses internally …
historically that variant was the first)

              I figure you want asyncio + component: which is

definitely “the way to go” forward for a WAMP only Py3
app component (no need for Twisted when you dont need
all the bells and whistles that Twisted has vs
asyncio)

            Am Freitag, 4. Januar 2019 14:19:18 UTC+1 schrieb Zaar > > > > > Hai:
                Good day and happy new year to

everyone.

                  I have a very simple WAMP program with on_join

doing the following:

@component.on_join

                          async

def on_join(session, details):

    [logger.info](http://logger.info)                          ("Entering

on_join")

options = PublishOptions(exclude_me= False,
acknowledge=True)

    [logger.info](http://logger.info)("Subscribed")

await session.subscribe(printer, “news”)

while True:

logger.info(“Sleeping”)

await asyncio.sleep(60)

logger.info(“Publishing”)

await session.publish(“news”, “Fake news”,
options=options)

                    My crossbar runs behind load balancer (GCP)

that destroys WebSocket connections after 30
seconds, so I would expect Autobahn to detect
connection termination and reconnect - actually,
as per my LB configuration the program should
never reach the last line.

                    This is what happens, but only once. Is this

a bug here? I’m using autobahn 18.12.1 and
crossbar docker pypy3-18.9.2.

Here is the detailed flow:

On the first minute I get:

                        2019-01-05T00:00:44.912:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:00:46.075:

main INFO ## Entering on_join

                        2019-01-05T00:00:46.076:

main INFO ## Subscribed

2019-01-05T00:00:46 .321:
main INFO ## Sleeping

2019-01-05T00:01:15 .640:
autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                        2019-01-05T00:01:15.642:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                        2019-01-05T00:01:17.806:

autobahn.asyncio.component. Component
INFO ## connecting once using transport
type “websocket” over endpoint “tcp”

                        2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                        2019-01-05T00:01:19.174:

main INFO ## Subscribed

                        2019-01-05T00:01:19.393:

main INFO ## Sleeping

                    Note the expected 30 second between

“Sleeping” and session leaving. And then
connection is re-established and my on_join() is
re-entered as the last log line shows - all good
for the first time, however…

However then the program misbehaves - ** the
asyncio.sleep(60) line is somehow interrupted
after ~20-30 seconds** and program goes on
to the next line to publish message where it
hits autobahn.wamp.exception.TransportLost:

                      2019-01-05T00:01:19.173:

main INFO ## Entering on_join

                      2019-01-05T00:01:19.174:

main INFO ## Subscribed

2019-01-05T00:01:19 .393:
main INFO ## Sleeping

2019-01-05T00:01:46 .329:
main INFO ## Publishing
#
Why does it break from sleep? And why after
just 27 seconds?

                      2019-01-05T00:01:46.331:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.332:

autobahn.wamp.protocol. BaseSession ERROR

While firing onJoin: Traceback (most recent

call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:46.333:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘ready’: Traceback (most

recent call last):

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

                        File

“/home/…/lib/python3.7/site- packages/txaio/aio.py”,
line 514, in done

                          res =

f.result()

[Previous line repeated 2 more times]

                        File

“/tmp/repro.py”, line 33, in on_join

                          await

session.publish(“news”, “Fake news”,
options=options)

                        File

“/home/…/lib/python3.7/site-packages/autobahn/wamp/ protocol.py”,
line 1278, in publish

                          raise

exception.TransportLost()

autobahn.wamp.exception.TransportLost

                      2019-01-05T00:01:48.665:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:01:48.666:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:01:51.736:

autobahn.asyncio.component. Component
INFO ## connecting once using transport type
“websocket” over endpoint “tcp”

                      2019-01-05T00:01:52.877:

main INFO ## Entering on_join
5

                      2019-01-05T00:01:52.878:

main INFO ## Subscribed

2019-01-05T00:01:53 .094:
main INFO ## Sleeping

2019-01-05T00:02:19 .402:
main INFO ## Publishing # Another break from
sleep. Same tracebacks.

                  And this how it continues to run. Sometimes the

errors order is a bit different:

                      2019-01-05T00:07:15.221:

main INFO ## Entering on_join

                      2019-01-05T00:07:15.222:

main INFO ## Subscribed

                      2019-01-05T00:07:15.528:

main INFO ## Sleeping

                      2019-01-05T00:07:44.712:

autobahn.asyncio.component. Component
INFO ## session leaving
‘wamp.close.transport_lost’

                      2019-01-05T00:07:44.714:

autobahn.asyncio.component. Component
ERROR ## Connection failed: TransportLost:
failed to complete connection

                      2019-01-05T00:08:15.561:

main INFO ## Publishing

                      2019-01-05T00:08:15.563:

autobahn.wamp.protocol. BaseSession ERROR

While notifying ‘join’: Traceback (most

recent call last):

                    ... (Same

tracebacks as above)

    You received this message because you are subscribed to the

Google Groups “Autobahn” group.

    To unsubscribe from this group and stop receiving emails from

it, send an email to autobahnws+...@googlegroups.com.

    To post to this group, send email to auto...@googlegroups.com.

    To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/88db33a0-d465-45c4-bf6a-0138b1c09d30%40googlegroups.com?utm_medium=email&utm_source=footer).

    For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).
0 Likes