Streamline
Streamline is a reference system design for a premium-quality, white-label, end-to-end live streaming system, from HDMI / HD-SDI capture all the way to a player on a CDN that works on web, iOS, and Android devices.
Using commodity computer hardware, free software, and AWS, it's an affordable way to learn how to build a very high-quality live streaming system.
This project is primarily designed as a learning tool for people who want to understand how live video works end to end. It is meant to be as simple to understand as possible while still providing great visual quality and streaming performance. It's not meant for production use: it hasn't been heavily tested over long periods of time, and there are many hard-coded parameters. It also may simply stop working at any time, since I'm building against the master branch of FFmpeg. Maybe in the next iteration I'll freeze to a specific FFmpeg release, which will help, but that's not out yet.
Processing on this system is a hybrid of GPU and CPU.
What is this not?
Streamline is not meant for mobile streaming; it's meant for streaming from a more professional production with dedicated cameras, microphones, and video switching equipment. Think "Apple product launch" live streaming, not Periscope or Twitch. In fact, it is not currently designed to feed services like YouTube Live, Facebook Live, etc.; it is meant to be its own end-to-end platform. The video will work in web browsers on a computer, iOS, or Android.
It's also not a production-ready tool. I wouldn't, like, use this in production. It might explode. I make no promises that it won't explode on you. Go check out AWS Elemental, Momento, Wowza, Bitmovin, Haivision, etc. if you want production-ready products with support.
What is included?
- Hardware and OS specs for an encoder.
- A script to build the software on the encoder.
- Scripts that use FFmpeg to capture, process, and encode the video, and then create a web page and a player that can allow you to watch this live video.
- Directions for configuring an Amazon Web Services origin web server and a CDN.
Projects leveraged
- FFmpeg
- Caddy Server
- HLS.js
Services leveraged
- Amazon EC2
- Amazon CloudFront
Project approach
Since this is meant primarily as a teaching tool, we're going to be "showing our work" a lot, and favoring simple-to-understand implementations over absolute ease of use through automation.
There are many architectures that can be used to build an end-to-end live streaming system. We have chosen one that provides better quality, is easy to understand, and has low server-side costs. The tradeoff is that it requires a faster connection from the encoder to the internet and more powerful hardware on site.
Additionally, this is not a production tool like OBS or Wirecast, meaning no video switching or compositing is provided. It is designed for higher-end productions where you would use dedicated hardware for that purpose, like an ATEM switcher from Blackmagic.
We have not tried to support all possible hardware, or to coexist with a multi-function system. We are targeting specific hardware with a dedicated software stack, giving us the most control for reliability as the project matures.
Parts of the system

Source - A camera with HDMI or HD-SDI output. (In a larger production it could be a video switcher or a matrix router.) We will be taking in raw video from a camera or some video production system. It will be a raw, uncompressed, full-quality video source. HDMI and HD-SDI are two different standards for connecting raw video between devices. Our goal is to take this source, compress it, and deliver it across the internet, while making it look as close as possible to this original "golden master" source.
Encoder - A desktop PC with an NVIDIA Quadro GPU and a Blackmagic capture card. The encoder runs FFmpeg, which takes the raw video and audio from the capture card and compresses it. It then multiplexes it into mpeg-ts and sends it via HTTP PUT to a web server, along with a manifest. It will continue to send new manifests as they are needed, and it will send HTTP DELETE requests for the old segments. I'll provide a parts list for this, but you can vary the hardware to some extent, such as building it as a rack-mount server.
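As a rough sketch of what this looks like in practice (not this project's exact scripts, which are the real reference), here is the general shape of such an FFmpeg invocation, assuming an FFmpeg build with DeckLink support; the device name, bitrates, and origin URL are placeholders:

# Capture raw video+audio from the Blackmagic card, encode H.264 on the
# NVIDIA GPU and AAC audio, then package 2-second mpeg-ts segments with
# a 5-segment rolling window and PUT each segment and updated manifest
# to the origin.
ffmpeg -f decklink -i "DeckLink Mini Recorder" \
    -c:v h264_nvenc -b:v 5M \
    -c:a aac -b:a 128k \
    -f hls -hls_time 2 -hls_list_size 5 -hls_flags delete_segments \
    -method PUT http://example-origin.example.com/live/playlist.m3u8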
Origin server - This is the web server that hosts the segments of video as they come in, so that clients can pull them down. We will be using the software "Caddy Server" for this use case. It will receive HTTP PUTs of segments from the encoder.
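The protocol between the encoder and the origin is plain HTTP, so you can exercise it by hand with curl (the hostname here is a placeholder):

# Upload a segment the way the encoder does...
curl -X PUT --data-binary @1435_10.ts http://example-origin.example.com/live/1435_10.ts
# ...and delete it once it falls out of the rolling window.
curl -X DELETE http://example-origin.example.com/live/1435_10.ts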
CDN - Between the clients and the origin server is a network of caching servers all over the world, called edge servers, which scales out how many people can watch at once and improves their viewing experience. We will use Amazon's CloudFront CDN for this.
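Once CloudFront is in front of the origin, you can check that the edge is actually serving from cache by inspecting response headers (the distribution domain below is a placeholder):

curl -I http://d111111abcdef8.cloudfront.net/5906.m3u8
# On a cache hit, CloudFront includes a header like:
#   X-Cache: Hit from cloudfront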
Web Page - The web page that viewers load, which contains the video player.
Player - This is the software running on the web page which pulls down the manifest repeatedly, decides which bitrate to play, then requests the segments and shoves them into the Media Source API of the web page for decoding and display. We will be using HLS.js for this.
How does streaming work?
Let's start off by talking about how live streaming works. As a disclaimer, I'm going to be explaining HLS. HLS is a very common live streaming protocol which allows for high quality and adaptive bitrate, at scale, on publicly available content delivery networks. Is HLS the best protocol? No. It has issues, like very high latency (often around 15 seconds), and it is controlled by Apple. However, it is common, simple to understand, there are many tools for it, and with the right approach it can work pretty much everywhere. There are many ways to implement HLS, and I'm going to describe a very basic one. I can't cover every rule of HLS here, so if you want to know more, check out the HLS spec on Apple's developer website. I'll cover things like live and ABR, but I will skip things like DVR, going live-to-VOD, etc. If you want to know more about the lower-level details of how video works, check out this link.
While a real production would typically be more complex, the simplest source to imagine is a camera with HDMI or SDI output. You would plug this camera into your "encoder," which takes the raw uncompressed video and audio, compresses each source, packages it into segments, and sends it to a server hosted on AWS EC2. Each segment is typically one to ten seconds long, depending on your goals and use cases. Contained within each segment is a stream of H.264-compressed video data and AAC-compressed audio data. These two streams are multiplexed together with a "container," which in this use case is mpeg-ts (transport stream).
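You can verify this structure on any segment with ffprobe, which ships with FFmpeg (the filename matches the manifest examples below):

ffprobe 1435_10.ts
# The stream listing should show both elementary streams inside the
# mpeg-ts container, something like:
#   Input #0, mpegts, from '1435_10.ts':
#     Stream #0:0: Video: h264 ...
#     Stream #0:1: Audio: aac ...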
When a segment is created, it is added to a manifest. A manifest in HLS is a .m3u8 file. It's a text file which contains the URLs for all of the segments. It would be named something like "playlist.m3u8," and if you were to open it with a text editor, it's fully human-readable.
If you have only one segment (because you just started) your manifest would look like...
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:2.000000,
1435_10.ts
If you have three segments it would look like...
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:2.000000,
1435_10.ts
#EXTINF:2.000000,
1435_11.ts
#EXTINF:2.000000,
1435_12.ts
Eventually you will reach the maximum number of segments in your manifest (which is configurable). At that point you start removing old segments as you add new ones. Thus, you have a rolling window. It's like a caterpillar track: as you lay down new segments, you pick up old ones by deleting them. Note that the first segment (media sequence 0, the file 1435_10.ts) is now gone, and EXT-X-MEDIA-SEQUENCE has advanced to 1.
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:1
#EXTINF:2.000000,
1435_11.ts
#EXTINF:2.000000,
1435_12.ts
#EXTINF:2.000000,
1435_13.ts
#EXTINF:2.000000,
1435_14.ts
#EXTINF:2.000000,
1435_15.ts
On the client, there will be what's called a "player." This is software that reads the manifest and keeps pulling it down regularly, looking for new segments. As it sees new segments, it pulls them down too and feeds them into the decoder one after another, like a playlist of songs. The player will seamlessly go from one to the next as they come down. It will also make decisions such as which bitrate to select so that it does not rebuffer.
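To get a feel for this loop, you can play the role of a very naive player from a shell, repeatedly fetching the manifest and grabbing whichever segments it lists (the URL is a placeholder; a real player like HLS.js also handles buffering, timing, and bitrate switching):

while true; do
  # Pull the latest manifest and fetch every segment it currently lists.
  # (A real player would skip segments it already has.)
  for seg in $(curl -s http://example-origin.example.com/live/playlist.m3u8 | grep -v '^#'); do
    curl -s -O "http://example-origin.example.com/live/$seg"
  done
  sleep 2   # roughly one segment duration between manifest refreshes
done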
One important part of streaming is bitrate adaptation. Some people will be on a cell phone; some will be on a fast wifi connection at home on a laptop. It's important to provide the best quality experience given the variations in the speeds of different connections. To that end, we create multiple streams at different bitrates simultaneously, plus a "variant playlist," which is a manifest that lists the location of the manifests for each bitrate stream. If you provide this variant playlist to a player, it can then decide in real time which is the maximum quality for its connection and adapt to that.
If you were to pull down a variant playlist like...
curl http://ec2-54-183-60-162.us-west-1.compute.amazonaws.com/5906.m3u8
You would get something like this (the group IDs, bandwidths, and URIs here are illustrative)...
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="group_audio",NAME="audio",DEFAULT=YES,URI="audio/playlist.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,AUDIO="group_audio"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720,AUDIO="group_audio"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360,AUDIO="group_audio"
360p/playlist.m3u8
Each EXT-X-STREAM-INF line advertises a bitrate and resolution pair, and the URI on the following line points at that variant's own media playlist.