Segmented-Audio over Websockets to HTML5

#1

We’ve been experimenting using Autobahn/Websockets to send near real-time
multimedia data from small clients for rendering in HTML5. This note shares
a little bit of our results. Part of the reason for this is to see if anyone
else is doing anything similar.

HTML5 does not yet define a means for real-time continuous audio
transmission. When WebRTC is ratified, it will form the basis for that kind
of capability, but it is likely there will remain browser incompatibilities.
The JavaScript Media Source Extensions are another promising technology that,
when delivered, will enable browsers to assemble presentations of media (like
audio and video) without discontinuities or artifacts. The latest spec
(https://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html)
shows collaboration between Google, Microsoft and Netflix. However, this
feature is only available on an experimental basis in a few browsers.

Even without these emerging specifications, the current HTML5 browser has
interesting capabilities for multimedia presentation. It has:

  • WebSockets: a transport medium
  • JavaScript: an efficient interpreter
  • event scheduling: both in JavaScript and in the AudioContext
  • audio rendering: Audio elements and the AudioContext

If a continuous audio broadcast is broken into a series of chunks, a browser
can reassemble the sequence of audio chunks for playback with timing that is
close to the original. The resulting audio sometimes has artifacts in the
form of occasional clicks between the chunks. However, it turns out that the
quality is remarkably good. The quality is browser-dependent, and it is
interesting to compare the results across browsers.

Testing the capability of HTML5 to reassemble a continuous audio broadcast
required the development of three separate things:

  1. a broadcasting application (a program to sample a microphone and turn it
     into data for sending via HTTP)
  2. a server (for receiving the audio data, potentially transcoding audio for
     browsers that need it, and for sending it through a WebSocket to an HTML5
     web page)
  3. a JavaScript receiving application (HTML5 + JavaScript)

The result of these experiments is the Wazwot iPhone App and its associated
web service.

How does Autobahn Websockets fit in?

Autobahn Websockets provides a Resource interface so that it is easy to mix
WebSocket objects into a tree of HTTP Resources. This capability simplified
the construction of the server, since it needs to handle many broadcasts
simultaneously. In our implementation, each broadcast “channel” is its own
resource with its own HTML and WebSocket components.
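
A minimal sketch of that layout, assuming current AutobahnPython import paths
(autobahn.twisted.resource.WebSocketResource); the protocol class, channel.html
and the "channel1" path are made-up illustrations, not the Wazwot code:

```python
# Sketch: one broadcast channel mounted as its own branch of a Twisted Web
# resource tree, with the channel page and its WebSocket endpoint side by side.
from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import Site
from twisted.web.static import File

from autobahn.twisted.websocket import (WebSocketServerFactory,
                                        WebSocketServerProtocol)
from autobahn.twisted.resource import WebSocketResource


class ChannelProtocol(WebSocketServerProtocol):
    """One browser listening to one broadcast channel."""

    def onOpen(self):
        # Register with the channel so incoming audio segments reach us.
        self.factory.listeners.add(self)

    def onClose(self, wasClean, code, reason):
        self.factory.listeners.discard(self)


def make_channel():
    """Build the resource sub-tree for a single broadcast channel."""
    factory = WebSocketServerFactory()
    factory.protocol = ChannelProtocol
    factory.listeners = set()

    channel = Resource()
    channel.putChild(b"index.html", File("channel.html"))  # the HTML5 player page
    channel.putChild(b"ws", WebSocketResource(factory))    # the audio WebSocket
    return channel


root = Resource()
root.putChild(b"channel1", make_channel())  # one resource per broadcast channel

reactor.listenTCP(8080, Site(root))
reactor.run()
```

Adding another channel is just another make_channel() mounted under a
different path in the same tree.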

We originally chose AB because of my familiarity with Twisted and Python,
but also because of some of the Twisted features it embraces. In particular
we needed flow-control (in the form of an IPushProducer). Real-time audio
transmission over a web socket needs a means to monitor TCP back-pressure
and adjust the stream. The Autobahn/Twisted framework made it pretty easy to
implement this requirement.
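
A minimal sketch of that pattern (not the Wazwot code), using the standard
Twisted producer machinery: the transport pauses and resumes the registered
push producer as the TCP send buffer fills and drains, and the producer only
writes segments while unpaused. SegmentProducer and AudioStreamProtocol are
illustrative names:

```python
# Sketch: flow-controlled delivery of audio segments over one WebSocket.
# Twisted calls pauseProducing()/resumeProducing() on the registered producer
# as the TCP send buffer fills and drains (back-pressure).
from collections import deque

from zope.interface import implementer
from twisted.internet.interfaces import IPushProducer

from autobahn.twisted.websocket import WebSocketServerProtocol


@implementer(IPushProducer)
class SegmentProducer:
    """Feeds queued audio segments to a single WebSocket connection."""

    def __init__(self, protocol):
        self.protocol = protocol
        self.queue = deque()
        self.paused = False

    def add_segment(self, data):
        """Called whenever a new audio segment is available for this listener."""
        self.queue.append(data)
        self._flush()

    def _flush(self):
        while self.queue and not self.paused:
            self.protocol.sendMessage(self.queue.popleft(), isBinary=True)

    # --- IPushProducer ------------------------------------------------------
    def pauseProducing(self):
        # TCP back-pressure: stop writing; a real implementation might also
        # drop or re-encode segments to keep latency bounded.
        self.paused = True

    def resumeProducing(self):
        self.paused = False
        self._flush()

    def stopProducing(self):
        self.queue.clear()


class AudioStreamProtocol(WebSocketServerProtocol):

    def onOpen(self):
        self.producer = SegmentProducer(self)
        # streaming=True registers it as a push producer.
        self.transport.registerProducer(self.producer, True)

    def onClose(self, wasClean, code, reason):
        self.transport.unregisterProducer()
```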

Try it out:

Wazwot is basically a research project. It sends continuous audio and also
continuous image frames. If anyone on this list is interested in trying it
out, send me a note and I’ll shoot you an iOS app promo code for evaluation.

https://itunes.apple.com/us/app/wazwot/id684986597?mt=8


#2

Hi Tom,

> We've been experimenting using Autobahn/Websockets to send near
> real-time multimedia data from small clients for rendering in HTML5.
> This note shares a little bit of our results. Part of the reason for
> this is to see if anyone else is doing anything similar.

Thanks for sharing! This is quite interesting ..

> However, it turns out that the quality is remarkably good. The quality is
> browser-dependent, and it is interesting to compare the results across
> browsers.

Curious: how do browsers stack up regarding quality? Did you test IE10 (desktop and/or WP)?

> Testing the capability of HTML5 to reassemble a continuous audio
> broadcast required the development of three separate things:

What is your fragment size (the size into which you break down the continuous audio)?

Do you dynamically adjust that?

> 1) a broadcasting application (a program to sample a microphone and
> turn it into data for sending via HTTP)

So the upstream isn't WebSocket? Any reasons?

> We originally chose AB because of my familiarity with Twisted and
> Python, but also because of some of the Twisted features it embraces.
> In particular we needed flow-control (in the form of an
> IPushProducer). Real-time audio transmission over a web socket needs
> a means to monitor TCP back-pressure and adjust the stream. The
> Autobahn/Twisted framework made it pretty easy to implement this
> requirement.

This - in particular - is satisfying to hear ;) Since a) the relevance of flow
control and TCP back-pressure adaptation by an app is a topic that many seem
to be unaware of, and b) while the WS protocol was still cooking at the IETF,
some wanted to introduce strange features as half-baked workarounds because
they didn't understand / couldn't support arbitrary streaming and
backpressure-to-app scenarios - and I've been fighting hard to avoid these ;)

Twisted of course got it right years ago (providing Producer/Consumer machinery) and AutobahnPython has embraced that from the very beginning.

For people who want to read more, here are two starters:

http://autobahn.ws/python/tutorials/producerconsumer

https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/streaming

This even includes an example of sending a continuous, backpressure-controlled stream of data as a single WS message (effectively turning WS into a fancy prelude to "raw TCP").
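
For anyone skimming, here is a minimal sketch of what that pattern can look
like, assuming Autobahn's frame-based streaming API (beginMessage /
beginMessageFrame / sendMessageFrameData / endMessage); the file-backed chunk
source is a stand-in, and the actual examples in the repository differ:

```python
# Sketch: send a continuous stream as a single WebSocket message, one frame
# per chunk, so nothing has to be buffered whole before sending.
from twisted.internet import reactor
from autobahn.twisted.websocket import WebSocketClientProtocol

CHUNK = 16 * 1024  # bytes per frame (illustrative)


class StreamingSenderProtocol(WebSocketClientProtocol):

    def onOpen(self):
        self.source = open("audio.raw", "rb")  # stand-in for a live capture source
        self.beginMessage(isBinary=True)       # one message, kept open for the stream
        self._send_next()

    def _send_next(self):
        data = self.source.read(CHUNK)
        if data:
            self.beginMessageFrame(len(data))
            self.sendMessageFrameData(data)
            # A real sender would pace this via a producer (back-pressure),
            # not a bare callLater loop.
            reactor.callLater(0, self._send_next)
        else:
            self.endMessage()                  # finally close the single WS message
```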

Cheers
/Tobias



#3

> Hi Tom,
>
>> We've been experimenting using Autobahn/Websockets to send near
>> real-time multimedia data from small clients for rendering in HTML5.
>> This note shares a little bit of our results. Part of the reason for
>> this is to see if anyone else is doing anything similar.
>
> Thanks for sharing! This is quite interesting ..
>
>> However, it turns out that the quality is remarkably good. The quality is
>> browser-dependent, and it is interesting to compare the results across
>> browsers.
>
> Curious: how do browsers stack up regarding quality? Did you test IE10
> (desktop and/or WP)?

I've tested IE10 desktop, and it's very very good. I don't have access to
a WP, but would like to try it.

>> Testing the capability of HTML5 to reassemble a continuous audio
>> broadcast required the development of three separate things:
>
> What is your fragment size (the size into which you break down the
> continuous audio)?
>
> Do you dynamically adjust that?

It currently chooses a 1-second fragment size. The reassembly buffer
requires a few segments before it starts. The resulting audio latency is 4
to 5 seconds. These parameters result in a generally good audio experience.
It would be interesting to develop algorithms to optimize latency or quality
as a function of network conditions.
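
To make the arithmetic concrete, a tiny illustration; only the 1-second
fragment size comes from the text above, the buffer depth and capture/upload
margin are assumptions:

```python
# Back-of-the-envelope for the observed 4-5 s end-to-end latency.
SEGMENT_SECONDS = 1.0        # fragment size (from the text)
STARTUP_SEGMENTS = 3         # assumed: segments buffered before playback starts
CAPTURE_UPLOAD_MARGIN = 1.5  # assumed: capture + upload + server handling (s)

latency = STARTUP_SEGMENTS * SEGMENT_SECONDS + CAPTURE_UPLOAD_MARGIN
print("approximate startup latency: %.1f s" % latency)  # ~4.5 s
```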

>> 1) a broadcasting application (a program to sample a microphone and
>> turn it into data for sending via HTTP)
>
> So the upstream isn't WebSocket? Any reasons?

A few. The application is very asymmetric: the broadcaster is not a
browser, and the receiver is. Websockets is a good way to "push"
information to a browser (the packets of a stream), so that's a given.

On the broadcaster side I chose to leverage HTTP/1.1 and the connection
pools that most platforms offer. If I had chosen a websocket upload, the
application would have to manage disconnections and retries. With the HTTP
request approach, the connection pools mask many of the network issues.
Data packets do arrive out of order at the server as a result. Request
pipelining is also possible.
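
A minimal sketch of the receiving end of that design (parameter names and the
forward callable are hypothetical, not the actual Wazwot server): segments
arrive as independent POSTs, possibly out of order, and are re-sequenced
before being pushed on to the channel's WebSocket listeners:

```python
# Sketch: accept audio segments via HTTP POST and hand them onward in order,
# even when pooled/pipelined uploads arrive out of sequence.
from twisted.web.resource import Resource


class SegmentUpload(Resource):
    isLeaf = True

    def __init__(self, forward):
        Resource.__init__(self)
        self.forward = forward   # callable: push one segment to the WebSocket side
        self.pending = {}        # seq -> payload, parks out-of-order arrivals
        self.next_seq = 0

    def render_POST(self, request):
        seq = int(request.args[b"seq"][0])   # hypothetical sequence-number parameter
        self.pending[seq] = request.content.read()

        # Flush everything now contiguous with what was already forwarded.
        while self.next_seq in self.pending:
            self.forward(self.pending.pop(self.next_seq))
            self.next_seq += 1

        return b"ok"
```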

>> We originally chose AB because of my familiarity with Twisted and
>> Python, but also because of some of the Twisted features it embraces.
>> In particular we needed flow-control (in the form of an
>> IPushProducer). Real-time audio transmission over a web socket needs
>> a means to monitor TCP back-pressure and adjust the stream. The
>> Autobahn/Twisted framework made it pretty easy to implement this
>> requirement.
>
> This - in particular - is satisfying to hear ;) Since a) the relevance of
> flow control and TCP back-pressure adaptation by an app is a topic that
> many seem to be unaware of, and b) while the WS protocol was still cooking
> at the IETF, some wanted to introduce strange features as half-baked
> workarounds because they didn't understand / couldn't support arbitrary
> streaming and backpressure-to-app scenarios - and I've been fighting hard
> to avoid these ;)

This lesson was apparent in the application. Without backpressure
monitoring, we could not control latency properly.

> Twisted of course got it right years ago (providing Producer/Consumer
> machinery) and AutobahnPython has embraced that from the very beginning.
>
> For people who want to read more, here are two starters:
>
> http://autobahn.ws/python/tutorials/producerconsumer
>
> https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/streaming
>
> This even includes an example of sending a continuous,
> backpressure-controlled stream of data as a single WS message (effectively
> turning WS into a fancy prelude to "raw TCP").
>
> Cheers
> /Tobias

And cheers to you!
-Tom

