Mp4ff

Library and tools for working with MP4 files containing video, audio, subtitles, or metadata. The focus is on fragmented files. Includes mp4ff-info, mp4ff-encrypt, mp4ff-decrypt and other tools.

Module mp4ff implements MP4 media file parsing and writing for AVC and HEVC video, AAC and AC-3 audio, stpp and wvtt subtitles, and timed metadata tracks. It is focused on fragmented files as used for streaming in MPEG-DASH, MSS and HLS fMP4, but can also decode and encode all boxes needed for progressive MP4 files.

Command Line Tools

Some useful command line tools are available in the cmd directory.

  1. mp4ff-info prints a tree of the box hierarchy of an mp4 file with information about the boxes.
  2. mp4ff-pslister extracts and displays SPS and PPS for AVC or HEVC in an mp4 or a bytestream (Annex B) file. Partial information is printed for HEVC.
  3. mp4ff-nallister lists NALUs and picture types for video in a progressive or fragmented file.
  4. mp4ff-subslister lists details of wvtt or stpp (WebVTT or TTML in ISOBMFF) subtitle samples.
  5. mp4ff-crop crops a progressive mp4 file to a specified duration.
  6. mp4ff-encrypt encrypts a fragmented file using the cenc or cbcs Common Encryption scheme.
  7. mp4ff-decrypt decrypts a fragmented file encrypted using the cenc or cbcs Common Encryption scheme.

You can install these tools by going to their respective directories and running go install . or install them directly from the repo with

go install github.com/Eyevinn/mp4ff/cmd/mp4ff-info@latest
go install github.com/Eyevinn/mp4ff/cmd/mp4ff-encrypt@latest
...

for each individual tool.

Codec support

This repo is focused on the file format, but goes beyond the base file format and supports codec-specific boxes. The codecs and their boxes are

| Type | Codec | Sample Entry | Config Box | Other Boxes |
| ---- | ---- | ---- | ---- | ---- |
| Video | AVC/H.264 | avc1, avc3 | avcC | btrt, pasp, colr |
| Video | HEVC/H.265 | hvc1, hev1 | hvcC | btrt, pasp, colr |
| Video | AV1 | av01 | av1C | btrt, pasp, colr |
| Video | AVS3 | avs3 | av3c | btrt, pasp, colr |
| Video | VP8/VP9 | vp08, vp09 | vpcC | btrt, pasp, colr |
| Video | VVC/H.266 | vvc1, vvi1 | vvcC | btrt, pasp, colr |
| Video | Encrypted | encv | sinf | btrt |
| Audio | AAC | mp4a | esds | btrt |
| Audio | AC-3 | ac-3 | dac3 | btrt |
| Audio | E-AC-3 | ec-3 | dec3 | btrt |
| Audio | AC-4 | ac-4 | dac4 | btrt |
| Audio | Opus | Opus | dOps | btrt |
| Audio | MPEG-H 3D Audio | mha1, mha2, mhm1, mhm2 | mhaC | btrt |
| Audio | Encrypted | enca | sinf | btrt |
| Subtitles | WebVTT | wvtt | vttC, vlab | vttc, vtte, vtta, vsid, ctim, iden, sttg, payl, btrt |
| Subtitles | TTML | stpp | - | btrt |
| Subtitles | Generic | evte | - | btrt |

Open Source Cloud

You can also run the tools as a job in Eyevinn Open Source Cloud. Here is an example using the mp4ff-crop command and the Open Source Cloud CLI.

% export OSC_ACCESS_TOKEN=<your-personal-access-token>
% npx -y @osaas/cli@latest create eyevinn-mp4ff test \
  -o awsAccessKeyId=<s3-access-key-id> \
  -o awsSecretAccessKey=<s3-secret-key> \
  -o s3EndpointUrl=https://eyevinnlab-birme.minio-minio.auto.prod.osaas.io \
  -o cmdLineArgs="mp4ff-crop s3://input/VINN.mp4 s3://output/VINN-crop2.mp4"

The file VINN.mp4 in the bucket "input" on the MinIO server at https://eyevinnlab-birme.minio-minio.auto.prod.osaas.io is processed, and the output is uploaded to the bucket "output" on the same MinIO server.

Example code

Example code for some common use cases is available in the examples directory. The examples and their functions are:

  1. initcreator creates typical init segments (ftyp + moov) for different video and audio codecs
  2. resegmenter reads a segmented file (CMAF track) and resegments it with other segment durations using FullSample
  3. segmenter takes a progressive mp4 file and creates init and media segments from it. This tool has been extended to support generation of segments with multiple tracks as well as reading and writing mdat in lazy mode
  4. multitrack parses a fragmented file with multiple tracks
  5. combine-segs combines single-track init and media segments into multi-track segments
  6. add-sidx adds a top-level sidx box describing the segments of a fragmented file.

Packages

The top-level packages in the mp4ff module are

  1. mp4 provides support for parsing (called Decode) and writing (Encode) a plethora of mp4 boxes. It also contains helper functions for extracting, encrypting, and decrypting samples, and a lot more.
  2. avc deals with AVC (aka H.264) video in the mp4ff/avc package including parsing of SPS and PPS, and finding start-codes in Annex B byte streams.
  3. hevc provides structures and functions for dealing with HEVC video and its packaging.
  4. vvc provides structures and functions for dealing with VVC video and its packaging.
  5. sei provides support for handling Supplementary Enhancement Information (SEI) such as timestamps for AVC and HEVC video.
  6. av1 provides basic support for AV1 video packaging.
  7. aac provides support for AAC audio. This includes handling ADTS headers, which are common for AAC inside MPEG-2 TS streams.
  8. bits provides bit-wise and byte-wise readers and writers used by the other packages.
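
As a rough illustration of what "finding start-codes in Annex B byte streams" involves, here is a minimal standard-library sketch. It does not use the avc package; the function names naluOffsets and avcNaluType are hypothetical helpers made up for this example. It relies only on two format facts: a start code ends with the bytes 00 00 01, and the AVC NAL unit type is the low five bits of the first NALU byte.

```go
package main

import "fmt"

// naluOffsets returns the offset of the first byte after each 00 00 01
// start code. A four-byte start code (00 00 00 01) is matched by its
// last three bytes. Hypothetical helper, not part of mp4ff.
func naluOffsets(data []byte) []int {
	var offsets []int
	for i := 0; i+3 <= len(data); {
		if data[i] == 0 && data[i+1] == 0 && data[i+2] == 1 {
			offsets = append(offsets, i+3)
			i += 3
		} else {
			i++
		}
	}
	return offsets
}

// avcNaluType extracts the 5-bit nal_unit_type from an AVC NALU header byte.
func avcNaluType(header byte) int {
	return int(header & 0x1f)
}

func main() {
	// Two NALUs: one after a 4-byte start code, one after a 3-byte start code.
	stream := []byte{0, 0, 0, 1, 0x67, 0xaa, 0, 0, 1, 0x68, 0xbb}
	for _, off := range naluOffsets(stream) {
		fmt.Printf("NALU at offset %d, type %d\n", off, avcNaluType(stream[off]))
	}
}
```

In AVC, type 7 is an SPS and type 8 is a PPS, which is how a parameter-set lister can pick them out of a byte stream.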

Structure and usage

mp4.File and its composition

The top level structure for both non-fragmented and fragmented mp4 files is mp4.File.

In a progressive (non-fragmented) mp4.File, the top-level attributes Ftyp, Moov, and Mdat point to the corresponding boxes.

A fragmented mp4.File can be more or less complete, like a single init segment, one or more media segments, or a combination of both, like a CMAF track which renders into a playable one-track asset. It can also have multiple tracks. For fragmented files, the following high-level attributes are used:

  • Init contains an ftyp and a moov box and provides the general metadata for a fragmented file. It corresponds to a CMAF header. It can also contain one or more sidx boxes.
  • Segments is a slice of MediaSegment, each of which starts with an optional styp box, possibly one or more sidx boxes, and then one or more Fragments.
  • Fragment is an mp4 fragment with exactly one moof box followed by an mdat box, where the latter contains the media data. It can have one or more trun boxes containing the metadata for the samples. The fragment can start with one or more emsg boxes.

It should be noted that it is sometimes hard to decide what should belong to a Segment or Fragment.
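
The grouping described above can be sketched as a small classifier over a sequence of top-level box types. This is only an illustration under simplifying assumptions, not mp4ff's actual logic: the types layout, segment, and fragment and the function groupBoxes are hypothetical, a sidx seen before the first segment is assigned to the init part, and a moof without a preceding styp opens an implicit segment.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Init / Segments / Fragment grouping.
type fragment struct{ Boxes []string }

type segment struct {
	Header    []string // styp and sidx boxes
	Fragments []fragment
}

type layout struct {
	Init     []string
	Segments []segment
}

// groupBoxes assigns top-level box types to init, segments, and fragments.
func groupBoxes(types []string) layout {
	var out layout
	var pendingEmsg []string // emsg boxes waiting for the next moof
	for _, t := range types {
		switch t {
		case "styp":
			out.Segments = append(out.Segments, segment{Header: []string{t}})
		case "sidx":
			if n := len(out.Segments); n > 0 {
				out.Segments[n-1].Header = append(out.Segments[n-1].Header, t)
			} else {
				out.Init = append(out.Init, t) // assumption: early sidx belongs to init
			}
		case "emsg":
			pendingEmsg = append(pendingEmsg, t)
		case "moof":
			if len(out.Segments) == 0 {
				out.Segments = append(out.Segments, segment{}) // implicit segment
			}
			seg := &out.Segments[len(out.Segments)-1]
			seg.Fragments = append(seg.Fragments, fragment{Boxes: append(pendingEmsg, t)})
			pendingEmsg = nil
		case "mdat":
			if n := len(out.Segments); n > 0 {
				seg := &out.Segments[n-1]
				if m := len(seg.Fragments); m > 0 {
					seg.Fragments[m-1].Boxes = append(seg.Fragments[m-1].Boxes, t)
				}
			}
		default: // ftyp, moov, and other boxes before the first segment
			if len(out.Segments) == 0 {
				out.Init = append(out.Init, t)
			}
		}
	}
	return out
}

func main() {
	l := groupBoxes([]string{
		"ftyp", "moov", "sidx",
		"styp", "sidx", "emsg", "moof", "mdat", "moof", "mdat",
		"styp", "moof", "mdat",
	})
	fmt.Println("init:", l.Init)
	for i, s := range l.Segments {
		fmt.Printf("segment %d: header=%v fragments=%d\n", i, s.Header, len(s.Fragments))
	}
}
```

The choice made for sidx here is exactly the kind of ambiguity mentioned above: a different, equally defensible rule would attach it to the first segment instead.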

All child boxes of container boxes such as MoovBox are listed in the Children attribute, but the most prominent child boxes have direct links with names which makes it possible to write a path such as

fragment.Moof.Traf.Trun

to access the (only) trun box in a fragment with only one traf box, or

fragment.Moof.Trafs[1].Truns[1]

to get the second trun of the second traf box (provided that they exist). Care must be taken to assert that none of the intermediate pointers are nil to avoid panic.
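
One way to guard such a path is a small accessor that checks every pointer and index before dereferencing. The sketch below uses stand-in struct types that only mirror the shape described above, so that it runs without the mp4ff module; they are not the real mp4 types, the slice name Truns is an assumption following the Traf/Trafs pattern, and trunAt is a hypothetical helper.

```go
package main

import "fmt"

// Stand-in types mirroring the shape of the path in the text.
// They are NOT the real mp4ff types.
type TrunBox struct{ SampleCount uint32 }

type TrafBox struct {
	Trun  *TrunBox   // first trun, if any
	Truns []*TrunBox // all trun boxes (slice name assumed)
}

type MoofBox struct {
	Traf  *TrafBox   // first traf, if any
	Trafs []*TrafBox // all traf boxes
}

type Fragment struct{ Moof *MoofBox }

// trunAt returns the trun at the given traf and trun indices, or false if
// any pointer on the way is nil or any index is out of range.
func trunAt(f *Fragment, trafIdx, trunIdx int) (*TrunBox, bool) {
	if f == nil || f.Moof == nil {
		return nil, false
	}
	if trafIdx < 0 || trafIdx >= len(f.Moof.Trafs) {
		return nil, false
	}
	traf := f.Moof.Trafs[trafIdx]
	if traf == nil || trunIdx < 0 || trunIdx >= len(traf.Truns) {
		return nil, false
	}
	trun := traf.Truns[trunIdx]
	return trun, trun != nil
}

func main() {
	trun := &TrunBox{SampleCount: 48}
	traf := &TrafBox{Trun: trun, Truns: []*TrunBox{trun}}
	frag := &Fragment{Moof: &MoofBox{Traf: traf, Trafs: []*TrafBox{traf}}}

	if t, ok := trunAt(frag, 0, 0); ok {
		fmt.Println("first trun has", t.SampleCount, "samples")
	}
	if _, ok := trunAt(frag, 1, 1); !ok {
		fmt.Println("no second trun in a second traf")
	}
}
```

Returning a boolean instead of panicking lets the caller decide whether a missing box is an error or simply an empty fragment.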

Creating new fragmented files

A typical use case is to generate a fragmented file consisting of an init segment followed by a series of media segments.

The first step is to create the init segment. This is done in three steps as can be seen in examples/initcreator:

init := mp4.CreateEmptyInit()
init.AddEmptyTrack(timescale, mediatype, language)
init.Moov.Trak.SetHEVCDescriptor("hvc1", vpsNALUs, spsNALUs, ppsNALUs)

Here the third step fills in codec-specific parameters into the sample descriptor of the single track. Multiple tracks are also available via the slice attribute Traks instead of Trak.

The second step is to start producing media segments. They should use the timescale that was set when creating the init segment. Generally, that timescale should be chosen so that the sample durations have exact values without rounding errors, e.g. 48000 for 48kHz audio.
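
The effect of the timescale choice can be checked with plain integer arithmetic. The sketch below is standard-library Go with a hypothetical helper, ticksPerSample, that converts a sample duration of num/den seconds into timescale ticks and reports whether the result is exact. The example values (1024 audio samples per AAC frame, video at 30000/1001 frames per second) are illustrative inputs, not something mp4ff prescribes.

```go
package main

import "fmt"

// ticksPerSample converts a sample lasting num/den seconds into ticks of the
// given timescale. exact is false when the division truncates (or den is 0).
// Hypothetical helper, not part of mp4ff.
func ticksPerSample(timescale, num, den uint64) (ticks uint64, exact bool) {
	if den == 0 {
		return 0, false
	}
	total := timescale * num
	return total / den, total%den == 0
}

func main() {
	cases := []struct {
		name      string
		timescale uint64
		num, den  uint64
	}{
		{"AAC frame, 48 kHz, timescale 48000", 48000, 1024, 48000},
		{"AAC frame, 48 kHz, timescale 1000", 1000, 1024, 48000},
		{"29.97 fps frame, timescale 30000", 30000, 1001, 30000},
		{"29.97 fps frame, timescale 1000", 1000, 1001, 30000},
	}
	for _, c := range cases {
		ticks, exact := ticksPerSample(c.timescale, c.num, c.den)
		fmt.Printf("%-38s ticks=%d exact=%v\n", c.name, ticks, exact)
	}
}
```

With a timescale of 48000, an AAC frame lasts exactly 1024 ticks, whereas a timescale of 1000 would need 21.33 ticks and truncates to 21, so the error accumulates over a long track.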

A media segment contains one or more fragments, where each fragment has a moof and a mdat box. If all samples are available before the segment is created, one can use a single fragment in each segment. Example code for this can be found in examples/segmenter. For low-latency MPEG-DASH generation, short-du
