Trouble with RPCs in Autobahn Python

#1

Hi,

I’m working for Siemens on an evaluation of WAMP, the Autobahn libraries and the Crossbar.io Router. Doing so, I ran into some trouble with a test setup using AutobahnPython for the implementation of my clients and the Crossbar.io Router.

The test setup consists of one client who is calling a remote procedure at another client (once every 120 milliseconds at most). The RPC takes some time (about 4 milliseconds) to process. The Crossbar.io Router connects both clients, using WebSocket transport.

The first thing I noticed was that, after some time, the callee reported “session leaving ‘wamp.close.transport_lost’”, the router reported that the callee didn’t respond to a ping in time and the caller complained, that the callee left. After disabling the Ping in the Router Configuration the problem disappeared, so it seems that the callee can’t respond to a ping while it’s processing the Remote Procedure Call, at least not the way I implemented my callee. Is there a way I can implement my callee using Autobahn Python and still have WebSocket Pings?

I also observed another strange behavior, while testing my setup with disabled WebSocket Pings. After some time the callee took a very long time (up to 30 seconds) to process RPCs. I tried to reproduce the behavior and get as much information as possible and I came up with the following trace:

  • the caller calls the remote procedure (once every 120ms at most)

  • the Router forwards the call without any serious delay, I got this from the logfile of the Router

  • the callee takes a very long time until the actual user written RPC gets executed, I measured the execution time of the user written code and it didn’t change (still 4ms)

  • when the callee yields the result and reports that it finished executing the user written RPC, the Router forwards it without any serious delay

Could it be that this problem occurs when a remote procedure call is performed, while the result of a previous RPC wasn’t yet provided? Because my trace also showed that after the delay, multiple results of outstanding RPCs were yielded in a very short time period, almost like the client “queued them to execute them all at once and then send the results back”. I implemented a callee with the same behavior using AutobahnCpp and RawSocket communication and it works. I also ran my setup on different hardware devices to rule out other factors.

Is there a logical explanation for this behavior or could there be something wrong with my implementation after all? If you want to reproduce the error or have a look at my implementation, just let me know and I will put together a minimal example and publish it here.

Any help is very much appreciated, thanks!

0 Likes

#2

Hi Felix, welcome to the world of Autobahn

Hi,

I’m working for Siemens on an evaluation of WAMP, the Autobahn libraries and the Crossbar.io Router. Doing so, I ran into some trouble with a test setup using AutobahnPython for the implementation of my clients and the Crossbar.io Router.

The test setup consists of one client who is calling a remote procedure at another client (once every 120 milliseconds at most). The RPC takes some time (about 4 milliseconds) to process. The Crossbar.io Router connects both clients, using WebSocket transport.

Do you think its something you can show us ? Without looking at the code it might be guess work at most for us to suggest changes

The first thing I noticed was that, after some time, the callee reported “session leaving ‘wamp.close.transport_lost’”, the router reported that the callee didn’t respond to a ping in time and the caller complained, that the callee left. After disabling the Ping in the Router Configuration the problem disappeared, so it seems that the callee can’t respond to a ping while it’s processing the Remote Procedure Call, at least not the way I implemented my callee. Is there a way I can implement my callee using Autobahn Python and still have WebSocket Pings?

It feels like you are “choking” the websocket transport, hence your WAMP client doesn’t even receive the ping message. I am assuming your callee returns lots of bytes in a single response, if possible I would suggest sending those bytes in smaller chunks.

I also observed another strange behavior, while testing my setup with disabled WebSocket Pings. After some time the callee took a very long time (up to 30 seconds) to process RPCs. I tried to reproduce the behavior and get as much information as possible and I came up with the following trace:

  • the caller calls the remote procedure (once every 120ms at most)
  • the Router forwards the call without any serious delay, I got this from the logfile of the Router
  • the callee takes a very long time until the actual user written RPC gets executed, I measured the execution time of the user written code and it didn’t change (still 4ms)
  • when the callee yields the result and reports that it finished executing the user written RPC, the Router forwards it without any serious delay

Could it be that this problem occurs when a remote procedure call is performed, while the result of a previous RPC wasn’t yet provided? Because my trace also showed that after the delay, multiple results of outstanding RPCs were yielded in a very short time period, almost like the client “queued them to execute them all at once and then send the results back”. I implemented a callee with the same behavior using AutobahnCpp and RawSocket communication and it works. I also ran my setup on different hardware devices to rule out other factors.

Is there a logical explanation for this behavior or could there be something wrong with my implementation after all? If you want to reproduce the error or have a look at my implementation, just let me know and I will put together a minimal example and publish it here.

I think a code example, that helps reproduce the issue would be very helpful, also have you taken a look at our existing examples https://github.com/crossbario/autobahn-python/tree/master/examples ?

Any help is very much appreciated, thanks!

Sure, we’d love you help find a solution to your problems.

Thanks!

···

On Monday, February 4, 2019 at 7:14:38 PM UTC+5, Felix Oberhansl wrote:

0 Likes

#3

Hi Felix,

Sorry if I’m asking the obvious, but do you write your RPC handler in a async manner? Just making your your handler does not block the event loop.

···

Zaar

0 Likes

#4

Thank you for your response!

It feels like you are “choking” the websocket transport, hence your WAMP client doesn’t even receive the ping message. I am assuming your callee returns lots of bytes in a single response, if possible I would suggest sending those bytes in smaller chunks.

Actually my callee doesn’t return anything. I have put together a little example consisting of a callee and a caller. The callee is almost the same as the callee from the provided examples. The caller calls the remote procedure every 5 seconds. With my setup this example fails almost immediately, the remote procedure call takes about 3 seconds and I set my auto_ping_interval at 1000 and my auto_ping_timeout at 2000. To me it seems that this behavior occurs, when the execution time of the remote procedure call exceeds the ping timeout.

This is my setup:

  • one Raspberry Pi running the caller

  • one Raspberry Pi running the callee

  • one Raspberry Pi running the Crossbar.io Router

If there is anything wrong with my code or you need additional information, please let me know. The caller implementation differs from the examples, because of the actual application I am creating, but the callee is pretty much the same.

Thanks again for your help!

callee.py (1.09 KB)

caller.py (5.74 KB)

0 Likes

#5

Thank you for your response!

It feels like you are “choking” the websocket transport, hence your WAMP client doesn’t even receive the ping message. I am assuming your callee returns lots of bytes in a single response, if possible I would suggest sending those bytes in smaller chunks.

Actually my callee doesn’t return anything. I have put together a little example consisting of a callee and a caller. The callee is almost the same as the callee from the provided examples. The caller calls the remote procedure every 5 seconds. With my setup this example fails almost immediately, the remote procedure call takes about 3 seconds and I set my auto_ping_interval at 1000 and my auto_ping_timeout at 2000. To me it seems that this behavior occurs, when the execution time of the remote procedure call exceeds the ping timeout.

This is my setup:

  • one Raspberry Pi running the caller
  • one Raspberry Pi running the callee

If there is anything wrong with my code or you need additional information, please let me know. The caller implementation differs from the examples, because of the actual application I am creating, but the callee is pretty much the same.

As Zaar said above, you seem to be blocking the twisted reactor (RPi can be slow sometimes), hence crossbar doesn’t get a response in time.

What you could change in your callee is to wrap your long_op function into another function that calls reactor.callInThread(), below diff would also work:

— callee.py 2019-02-06 00:14:56.171618335 +0500

+++ c.py 2019-02-06 00:15:25.784277328 +0500

@@ -28,7 +28,7 @@

         print("calculation DONE")         
      
     try:
  •        yield self.register(long_op, u"com.example.callee.longop")
    
  •        yield self.register(lambda: reactor.callInThread(long_op), u"com.example.callee.longop")
    
       except Exception as e:
    
           print("failed to register procedure: {}".format(e))
    
       else:
    
···

On Tuesday, February 5, 2019 at 8:42:51 PM UTC+5, Felix Oberhansl wrote:

Thanks again for your help!

0 Likes

#6

I’ll add that if you have no particular reason to use Twisted I would suggest using Asyncio API instead - it provides much clearer distinction between async and non-async code.
In fact, you can do just:

res = await self.session.call(u"com.example.callee.longop")

And get result/exception right away (w/o callbacks that is).

···

On Wednesday, 6 February 2019 06:17:41 UTC+11, Omer Akram wrote:

On Tuesday, February 5, 2019 at 8:42:51 PM UTC+5, Felix Oberhansl wrote:

Thank you for your response!

It feels like you are “choking” the websocket transport, hence your WAMP client doesn’t even receive the ping message. I am assuming your callee returns lots of bytes in a single response, if possible I would suggest sending those bytes in smaller chunks.

Actually my callee doesn’t return anything. I have put together a little example consisting of a callee and a caller. The callee is almost the same as the callee from the provided examples. The caller calls the remote procedure every 5 seconds. With my setup this example fails almost immediately, the remote procedure call takes about 3 seconds and I set my auto_ping_interval at 1000 and my auto_ping_timeout at 2000. To me it seems that this behavior occurs, when the execution time of the remote procedure call exceeds the ping timeout.

This is my setup:

  • one Raspberry Pi running the caller
  • one Raspberry Pi running the callee

If there is anything wrong with my code or you need additional information, please let me know. The caller implementation differs from the examples, because of the actual application I am creating, but the callee is pretty much the same.

As Zaar said above, you seem to be blocking the twisted reactor (RPi can be slow sometimes), hence crossbar doesn’t get a response in time.

What you could change in your callee is to wrap your long_op function into another function that calls reactor.callInThread(), below diff would also work:

— callee.py 2019-02-06 00:14:56.171618335 +0500

+++ c.py 2019-02-06 00:15:25.784277328 +0500

@@ -28,7 +28,7 @@

         print("calculation DONE")         
      
     try:
  •        yield self.register(long_op, u"com.example.callee.longop")
    
  •        yield self.register(lambda: reactor.callInThread(long_op), u"com.example.callee.longop")
    
       except Exception as e:
    
           print("failed to register procedure: {}".format(e))
    
       else:
    

Thanks again for your help!

0 Likes

#7

Hi Felix,

  yeah, Autobahn supports answering/triggering websocket ping/pongs

while running user code - or in general, supports multiple
concurrent operations

  > Is there a way I can implement my callee using Autobahn

Python and still have WebSocket Pings?

  by avoiding blocking the reactor (when using Twisted) or event

loop (when using asyncio)

things to avoid:

  • running CPU bound code

  • doing blocking I/O (or blocking syscalls in general)

rgd the former:

  if you have CPU bound code (eg a registered WAMP procedure that

will hog up the CPU / single thread), run this code on a
background thread. on Twisted: deferToThread is the tool.

rgd the latter:

  of the 2 I/O classes network and storage, Twisted/asyncio solve

the network part: they provide an asynchronous networking
framework. just avoid the blocking APIs in the Python stdlib. eg
if you want to do a HTTP request from a WAMP procedure body, the
way to do that is not using the “standard” requests library
(), but an asynchronous one (!):

  now disk IO. different from say NodeJS, asyncio/Twisted does

not cover disk IO - and hence this is sth to watch out. eg if you
want to read a 1GB file into a Python string, standard
os.open().read() will block in the OS system call! bad. again,
either defer to a background thread - or mmap the file.

···

https://twistedmatrix.com/documents/current/core/howto/threading.html#running-code-in-threadshttp://docs.python-requests.org/https://treq.readthedocs.io/
Am 04.02.19 um 15:14 schrieb ‘Felix
Oberhansl’ via Autobahn:

Hi,

      I'm working for Siemens on an evaluation of WAMP, the

Autobahn libraries and the Crossbar.io Router. Doing so, I ran
into some trouble with a test setup using AutobahnPython for
the implementation of my clients and the Crossbar.io Router.

      The test setup consists of one client who is calling a

remote procedure at another client (once every 120
milliseconds at most). The RPC takes some time (about 4
milliseconds) to process. The Crossbar.io Router connects both
clients, using WebSocket transport.

      The first thing I noticed was that, after some time, the

callee reported “session leaving ‘wamp.close.transport_lost’”,
the router reported that the callee didn’t respond to a ping
in time and the caller complained, that the callee left. After
disabling the Ping in the Router Configuration the problem
disappeared, so it seems that the callee can’t respond to a
ping while it’s processing the Remote Procedure Call, at least
not the way I implemented my callee. Is there a way I can
implement my callee using Autobahn Python and still have
WebSocket Pings?

      I also observed another strange behavior, while testing my

setup with disabled WebSocket Pings. After some time the
callee took a very long time (up to 30 seconds) to process
RPCs. I tried to reproduce the behavior and get as much
information as possible and I came up with the following
trace:

      - the caller calls the remote procedure (once every 120ms

at most)

      - the Router forwards the call without any serious delay,

I got this from the logfile of the Router

      - the callee takes a very long time until the actual user

written RPC gets executed, I measured the execution time of
the user written code and it didn’t change (still 4ms)

      - when the callee yields the result and reports that it

finished executing the user written RPC, the Router forwards
it without any serious delay

      Could it be that this problem occurs when a remote

procedure call is performed, while the result of a previous
RPC wasn’t yet provided? Because my trace also showed that
after the delay, multiple results of outstanding RPCs were
yielded in a very short time period, almost like the client
“queued them to execute them all at once and then send the
results back”. I implemented a callee with the same behavior
using AutobahnCpp and RawSocket communication and it works. I
also ran my setup on different hardware devices to rule out
other factors.

      Is there a logical explanation for this behavior or could

there be something wrong with my implementation after all? If
you want to reproduce the error or have a look at my
implementation, just let me know and I will put together a
minimal example and publish it here.

Any help is very much appreciated, thanks!

  You received this message because you are subscribed to the Google

Groups “Autobahn” group.

  To unsubscribe from this group and stop receiving emails from it,

send an email to autobahnws+...@googlegroups.com.

  To post to this group, send email to autob...@googlegroups.com.

  To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/132d5d55-0c29-4c86-b861-0c2bbfb0d29b%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/132d5d55-0c29-4c86-b861-0c2bbfb0d29b%40googlegroups.com?utm_medium=email&utm_source=footer).

  For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).
0 Likes

#8

As a remedy for the disk IO troubles that Tobias mentioned, you can use aiofiles.

···

On Wednesday, 6 February 2019 19:55:56 UTC+11, Tobias Oberstein wrote:

Hi Felix,

  yeah, Autobahn supports answering/triggering websocket ping/pongs

while running user code - or in general, supports multiple
concurrent operations

  > Is there a way I can implement my callee using Autobahn

Python and still have WebSocket Pings?

  by avoiding blocking the reactor (when using Twisted) or event

loop (when using asyncio)

things to avoid:

  • running CPU bound code
  • doing blocking I/O (or blocking syscalls in general)

rgd the former:

  if you have CPU bound code (eg a registered WAMP procedure that

will hog up the CPU / single thread), run this code on a
background thread. on Twisted: deferToThread is the tool.
https://twistedmatrix.com/documents/current/core/howto/threading.html#running-code-in-threads

rgd the latter:

  of the 2 I/O classes network and storage, Twisted/asyncio solve

the network part: they provide an asynchronous networking
framework. just avoid the blocking APIs in the Python stdlib. eg
if you want to do a HTTP request from a WAMP procedure body, the
way to do that is not using the “standard” requests library
(http://docs.python-requests.org/ ), but an asynchronous one (!):
https://treq.readthedocs.io/

  now disk IO. different from say NodeJS, asyncio/Twisted does

not cover disk IO - and hence this is sth to watch out. eg if you
want to read a 1GB file into a Python string, standard
os.open().read() will block in the OS system call! bad. again,
either defer to a background thread - or mmap the file.


  the need to avoid blocking operations in asynchronous

programming is by definition: asynchronous programming is a form
of cooperative multiprocessing where a currently running piece of
code must itself and voluntarily yield control to the outer
scheduler (the reactor or event loop).

  hogging up the CPU is not giving up control at all, for longer

time, while running - bad

hope above helps and doesn’t confuse! let me know;)

cheers,

/Tobias

  Am 04.02.19 um 15:14 schrieb 'Felix > Oberhansl' via Autobahn:

Hi,

      I'm working for Siemens on an evaluation of WAMP, the

Autobahn libraries and the Crossbar.io Router. Doing so, I ran
into some trouble with a test setup using AutobahnPython for
the implementation of my clients and the Crossbar.io Router.

      The test setup consists of one client who is calling a

remote procedure at another client (once every 120
milliseconds at most). The RPC takes some time (about 4
milliseconds) to process. The Crossbar.io Router connects both
clients, using WebSocket transport.

      The first thing I noticed was that, after some time, the

callee reported “session leaving ‘wamp.close.transport_lost’”,
the router reported that the callee didn’t respond to a ping
in time and the caller complained, that the callee left. After
disabling the Ping in the Router Configuration the problem
disappeared, so it seems that the callee can’t respond to a
ping while it’s processing the Remote Procedure Call, at least
not the way I implemented my callee. Is there a way I can
implement my callee using Autobahn Python and still have
WebSocket Pings?

      I also observed another strange behavior, while testing my

setup with disabled WebSocket Pings. After some time the
callee took a very long time (up to 30 seconds) to process
RPCs. I tried to reproduce the behavior and get as much
information as possible and I came up with the following
trace:

      - the caller calls the remote procedure (once every 120ms

at most)

      - the Router forwards the call without any serious delay,

I got this from the logfile of the Router

      - the callee takes a very long time until the actual user

written RPC gets executed, I measured the execution time of
the user written code and it didn’t change (still 4ms)

      - when the callee yields the result and reports that it

finished executing the user written RPC, the Router forwards
it without any serious delay

      Could it be that this problem occurs when a remote

procedure call is performed, while the result of a previous
RPC wasn’t yet provided? Because my trace also showed that
after the delay, multiple results of outstanding RPCs were
yielded in a very short time period, almost like the client
“queued them to execute them all at once and then send the
results back”. I implemented a callee with the same behavior
using AutobahnCpp and RawSocket communication and it works. I
also ran my setup on different hardware devices to rule out
other factors.

      Is there a logical explanation for this behavior or could

there be something wrong with my implementation after all? If
you want to reproduce the error or have a look at my
implementation, just let me know and I will put together a
minimal example and publish it here.

Any help is very much appreciated, thanks!

  You received this message because you are subscribed to the Google

Groups “Autobahn” group.

  To unsubscribe from this group and stop receiving emails from it,

send an email to autobahnws+...@googlegroups.com.

  To post to this group, send email to auto...@googlegroups.com.

  To view this discussion on the web visit [https://groups.google.com/d/msgid/autobahnws/132d5d55-0c29-4c86-b861-0c2bbfb0d29b%40googlegroups.com](https://groups.google.com/d/msgid/autobahnws/132d5d55-0c29-4c86-b861-0c2bbfb0d29b%40googlegroups.com?utm_medium=email&utm_source=footer).

  For more options, visit [https://groups.google.com/d/optout](https://groups.google.com/d/optout).
0 Likes

#9

Hi all,

thank you very much for your help and taking the time explaining it to me, it helps me a lot!

0 Likes