aes67

Framework for AES67 targeting embedded devices.

WORK IN PROGRESS

Designed in particular to be employed on embedded devices, it thus does not rely on dynamic memory allocation (although this is optionally possible where meaningful), offers tight control over memory usage, and has no dependencies on external libraries; in particular, as few hardware/library abstractions as possible are used - the framework will have to be adapted according to needs.

Components are intended to be as minimal as possible while allowing for essential AES67 operations, and to be as interoperable as possible - in detail this is not yet clear and requires further investigation into different manufacturer-dependent implementations.

https://github.com/tschiemer/aes67

Rough feature/support roadmap

  • Clock / Synchronisation

    • [ ] PTPv2 / IEEE1588-2008 (as per AES67-2018)
    • [ ] PTPv1 / IEEE1588-2002 ?
    • [ ] PTPv2.1 / IEEE1588-2019 ?
    • [ ] IEEE802.1AS-2011 ?
  • Discovery & Management

    • [x] SAP (required for broader interoperability)
      • [x] ~~zlib (de-)compression support?~~ -> interface for external implementation
      • [x] ~~authentication support?~~ -> interface for external implementation
    • [x] SDP
    • [ ] SIP ? (for unicast management according to the standard, but most systems use multicast only ...)
    • [ ] RTSP ? (meaningful for systems with Ravenna-based components if no RAV2SAP)
      • [ ] make utility implementation embedded friendly
    • [ ] AES70/OCA work in progress
  • Stream

    • [ ] RTP
    • [ ] RTCP
  • Command line / developer utilities

    • SAP
      • [x] sap-pack: create SAP message(s)
      • [x] sap-unpack: parse SAP message(s)
      • [ ] sapd: SAP daemon (with Ravenna support)
        • [x] SAP server
        • [x] RAV lookup + pass to SAP server
        • [x] publish SDP files from config dir
        • [x] RAV publish of locally managed sessions
    • SDP
    • RTSP/HTTP
    • RAVENNA
      • [ ] ~~RAV2SAP~~ -> sapd
      • [x] rav-lookup: browse for sessions/devices
      • [x] rav-publish: publish sessions and optionally serve SDP files
    • PTP
    • RTP/RTCP
      • [ ] rtp-send: send RTP (from STDIN)
      • [ ] rtp-recv: receive RTP (to STDOUT)
    • Support
      • [x] mDNS (abstraction for mdns service)
        • [x] dns-sd
        • [x] avahi (to be tested further)
      • [x] RTSP describe client + server

In a Nutshell

Aspects of AES67 and the implementation considerations within this framework.

Disclaimer: as someone still learning about AES67, my understanding might not be error-free - please drop me a line if you see something wrong.

Clock / Synchronisation

AES67 devices are meant to synchronize their local clocks through PTPv2 (IEEE 1588-2008), which foresees a best (grand-)master clock telling all slaved devices the current time.
This local clock will slightly drift with respect to the grandmaster clock, and thus the local clock has to adapt its effective (network) clock rate to match the grandmaster clock as closely as possible.

This local (network) clock is then meant to drive the stream's media clock and implicitly any other audio processing components, in particular also ADCs and DACs, thus achieving a tight synchronisation very much like a classical wordclock (WC).

If multiple clock synchronization sources are given, say a network clock and a wordclock, the wordclock will likely be more precise, as there should not be any variability due to network conditions - the device would be a rather good candidate to act as grandmaster clock, and generally the WC should be preferred if the clock source is identical. (The principle of a strictly hierarchical clock distribution - with but one overall clock master and transitive master-slave relationships only - should obviously be respected.)

Is a clock or synchronization required for any type of device? Pragmatically speaking, no. A passive device - such as a recorder-only device - doesn't necessarily have to be synchronised to a clock. Assuming all senders are properly synchronised then a recorder may just listen to all stream packets and store them after (optimally) aligning them in time.

Optimally, (realtime) playout should occur after time alignment (if multiple sources are given). Pragmatically speaking, time alignment isn't necessary and omitting it allows for simpler implementations, but in this case audio sent at the same time would be played back at (slightly) different times, which might be unwanted behaviour and - strictly speaking - somewhat defeats the purpose of tight synchronisation.

What's this about time alignment? Well, streams can be configured with different ptimes (the realtime duration of stream data in a packet), which implies different sizes of receive buffers, which in turn implies different playout times. Allowing different combinations of incoming stream configurations (w.r.t. ptime) therefore makes implementations more complicated, because the lower latency streams (smaller ptime) will have to adapt to the highest latency stream (maxptime, so to speak). And technically speaking, receive buffer changes (due to combining different ptimes) can't happen without dropping or inserting samples - which leads to the decision of either not aligning received streams w.r.t. time or fixing a common max delay setting a priori.

Non-dedicated devices - such as computers with virtual sound cards - would seem to be an interesting case to be considered w.r.t. the clock/synchronisation.

Discovery & Management

Discovery and management mechanisms are not strictly needed - but doing without them requires other (ie manual) configuration. For ease of integration such methods are generally recommended and thus considered within this framework.

Joining a multicast session essentially requires nothing but joining the respective multicast group. Setting up a unicast session requires cooperation of the partners, ie some form of control protocol. Seemingly unicast sessions are barely in use (see Wikipedia).

AES67 generally leaves the choice of discovery and management mechanism open, but it names several possibilities to be aware of:

  • Bonjour / mDNS (DNS-SD) is proposed in conjunction with SIP, ie the device's SIP URI / service is announced (unicast sessions)
  • SAP is proposed for announcement of multicast sessions
  • Axia Discovery Protocol
  • Wheatstone WheatnetIP Discovery Protocol
  • AMWA NMOS Discovery and Registration Specification (IS-04)

Not mentioned, but seemingly also used in distributed products (according to wikipedia):

  • Real-Time Streaming Protocol

As discussed elsewhere, AES70 - a rather young standard for discovery and control of networked audio devices - is suggested as a promising solution.

Conclusion

For broad integration SAP seems like a general requirement for any device.

Further it is (generally) proposed to use AES70 for discovery and management, in particular because the standard is a collaborative effort and provides several meaningful features out of the box (although it is somewhat complex) beyond discovery and stream management.

SIP may be considered (in the future) for management of unicast streams, but it is barely adopted and, as a (somewhat elaborate) additional service, it only provides connection management.

RTSP may be considered (in the future) for management of unicast streams as well as for service discovery of Ravenna streams.

Audio

Encoding

AES67 audio is to be streamed in L16 or L24 encoding; that is, each sample is a signed two- or three-byte integer, respectively, in two's complement representation and network byte order (big-endian); samples are interleaved. The common I2S & TDM inter-chip audio protocols use (roughly speaking) identical formats, which are the primary encoding focused on herein. AES3, AES10 (MADI) and AES50 are frame-based and use, or can use, a least-significant-bit to most-significant-bit encoding.

Routing

Given a fixed local audio source, multicast streaming is rather straightforward.

For potential optimization, support for multiple instances might be considered.

In the most simple case incoming streams might be handled similarly, ie just one multicast stream is listened to and passed on to the local output.

But if audio is to come from different sources the situation gets more complicated: either the device has the capability of listening to multiple streams and extracting the necessary channels or the single channels are joined on another device into a single (multicast) stream. Obviously this would introduce further latency and make configuration more complicated.

Interestingly, even basic AES70 connection management by default allows for internal routing of local channels to transmitted stream channels (of a multichannel stream), and analogously allows for custom assignment of incoming stream channels to local output channels. Thus a receiving device should support (at most) as many streams as it has internal (receiving) channels - although practically speaking there will typically be fewer senders than relevant received channels unless each sender transmits only one relevant channel, so this can be constrained (thereby constraining possible system configurations).

In the sense of AES70 transmission and reception buffers are designed to provide a single interface for local in- and output of channels to b
