# nesmdb
The NES Music Database: use machine learning to compose music for the Nintendo Entertainment System!
## The NES Music Database
The Nintendo Entertainment System Music Database (NES-MDB) is a dataset intended for building automatic music composition systems for the NES audio synthesizer (paper). The NES synthesizer has highly constrained compositional parameters which are well-suited to a wide variety of current machine learning techniques. The synthesizer is typically programmed in assembly, but we parse the assembly into straightforward formats that are more suitable for machine learning.
This repository contains dataset information as well as nesmdb, a Python package that can be used to render your generated music through the NES synthesizer. You only need to install the nesmdb package if you want to listen to your generated results. Otherwise, you may simply download an appropriate format of the dataset.
## Dataset information
The NES-MDB dataset consists of 5278 songs from the soundtracks of 397 NES games. The dataset represents 296 unique composers, and the songs contain more than two million notes combined. We build NES-MDB starting from the assembly code of NES games, which contain the exact timings and parameter values necessary for accurate chiptune renditions. We split the dataset into training, validation, and testing splits, ensuring that no composer appears in multiple splits.
The NES synthesizer has five instrument voices: two pulse-wave generators (P1, P2), a triangle-wave generator (TR), a percussive noise generator (NO), and an audio sample playback channel (excluded for simplicity). Each voice is programmed by modifying four 8-bit registers which update the audio synthesis state. With NES-MDB, our goal is to allow researchers to study NES music while shielding them from the inner workings of an archaic audio synthesis chip. Hence, we offer the dataset in several convenient formats, and provide details for those who wish to dig deeper.
If your background is in algorithmic composition, we recommend using either the MIDI or score formats. If you are more familiar with language modeling, we recommend the NES Language Modeling (NLM) format.
### Download links

Hover over the download links for SHA256 checksums.

- Download NES-MDB in MIDI Format (12 MB)
- Download NES-MDB in Expressive Score Format (11 MB)
- Download NES-MDB in Separated Score Format (4 MB)
- Download NES-MDB in Blended Score Format (41 MB)
- Download NES-MDB in Language Modeling Format (155 MB)
- Download NES-MDB in Raw VGM Format (31 MB)
- Download NES-MDB Composer Metadata (66 KB)
## MIDI format
The MIDI file format stores discrete musical events that describe a composition. MIDI files in NES-MDB consist of note/velocity/timbre events with 44.1 kHz timing resolution, allowing for sample-accurate reconstruction by an NES synthesizer.
Each MIDI file consists of four instrument voices: P1, P2, TR, and NO. Each voice contains a timestamped list of MIDI note events. All voices except for TR contain additional timestamped lists of MIDI control change events representing velocity (CC11) and timbre (CC12) information.
Click here for an IPython notebook exploring the MIDI version of NES-MDB
Example source code for loading an NES-MDB MIDI file with pretty_midi:

```python
import pretty_midi

# If loading the MIDI file fails, try uncommenting this before loading:
# pretty_midi.pretty_midi.MAX_TICK = 1e10

midi_data = pretty_midi.PrettyMIDI('train/297_SkyKid_00_01StartMusicBGMIntroBGM.mid')

for instrument in midi_data.instruments:
  print('-' * 80)
  print(instrument.name.upper())
  print('# note events: {}'.format(len(instrument.notes)))
  print('# control change events: {}'.format(len(instrument.control_changes)))
```
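The control change streams above can be split back into separate velocity (CC11) and timbre (CC12) curves for a voice. A minimal sketch, using a stand-in namedtuple rather than pretty_midi objects and hypothetical event values:

```python
from collections import namedtuple

# Stand-in for pretty_midi.ControlChange fields; the events below are hypothetical.
ControlChange = namedtuple('ControlChange', ['number', 'value', 'time'])

def split_expression(control_changes):
  """Separate velocity (CC11) and timbre (CC12) events for one voice."""
  velocity = [cc for cc in control_changes if cc.number == 11]
  timbre = [cc for cc in control_changes if cc.number == 12]
  return velocity, timbre

events = [
    ControlChange(11, 12, 0.0),   # velocity change at t=0
    ControlChange(12, 1, 0.0),    # timbre change at t=0
    ControlChange(11, 8, 0.25),   # velocity change at t=0.25
]
velocity, timbre = split_expression(events)
print(len(velocity), len(timbre))  # 2 1
```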
## Score formats
The three score formats are piano roll representations. Unlike the sparse (event-based) MIDI format, the score formats are dense, sampled at a fixed rate of 24 Hz for compactness, and thus lossy.
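As a rough sanity check when working with these formats, the number of timesteps N in a score is the song length in seconds times the 24 Hz rate. A quick sketch with hypothetical values (in practice, `rate` and `nsamps` come from the pickles shown below):

```python
# Hypothetical values, as would be loaded from a score pickle.
rate = 24.0      # score sampling rate in Hz
nsamps = 441000  # length of the original VGM in 44.1 kHz audio samples

seconds = nsamps / 44100.0
approx_timesteps = round(seconds * rate)  # expected N of the score array
print(seconds, approx_timesteps)  # 10.0 240
```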
### Expressive score
<p align="center"><img src="static/score_expressive.png" width="400"/></p> <p align="center"><img src="static/score_dimensionality.png" width="400"/></p> <p align="center"><b>Depiction and dimensionality of the expressive score format (all values are discrete).</b></p>The expressive score format contains all of the information that the NES synthesizer needs to render the music at each timestep. Each song is represented as a numpy.uint8 array of size Nx4x3, where N is the number of timesteps at 24 Hz. There are 4 synthesis voices, and each has a state of 3 bytes per timestep consisting of note, velocity, and timbre information. The above table displays the possible values for each of the instrument voices. The triangle voice only uses note information and will always take value 0 for velocity and timbre.
Example source code for loading an NES-MDB expressive score:
```python
import pickle

with open('train/297_SkyKid_00_01StartMusicBGMIntroBGM.exprsco.pkl', 'rb') as f:
  rate, nsamps, exprsco = pickle.load(f)

print('Temporal discretization rate: {}'.format(rate))  # Will be 24.0
print('Length of original VGM: {}'.format(nsamps / 44100.))
print('Piano roll shape: {}'.format(exprsco.shape))
```
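To illustrate the Nx4x3 layout, here is a small sketch with hypothetical note/velocity/timbre values (a real `exprsco` array comes from the pickle above):

```python
import numpy as np

# Hypothetical expressive score: N=4 timesteps, 4 voices (P1, P2, TR, NO), 3 bytes each.
exprsco = np.zeros((4, 4, 3), dtype=np.uint8)
exprsco[0, 0] = [60, 12, 1]  # P1 at t=0: MIDI note 60, velocity 12, timbre 1
exprsco[0, 2] = [48, 0, 0]   # TR at t=0: note only; velocity/timbre are always 0

# Reading the state of voice 0 (P1) at timestep 0:
note, velocity, timbre = exprsco[0, 0]
print(note, velocity, timbre)  # 60 12 1
```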
### Separated score
<p align="center"><img src="static/score_separated.png" width="400"/></p> <p align="center"><b>Depiction of the separated score format</b></p>The separated score format is the same as the expressive score format except that it only contains note information, so each song is a numpy.uint8 array of size Nx4. This format is convenient if you only wish to model the notes/timing of the music and not expressive performance characteristics.
```python
import pickle

with open('train/297_SkyKid_00_01StartMusicBGMIntroBGM.seprsco.pkl', 'rb') as f:
  rate, nsamps, seprsco = pickle.load(f)

print('Temporal discretization rate: {}'.format(rate))  # Will be 24.0
print('Length of original VGM: {}'.format(nsamps / 44100.))
print('Piano roll shape: {}'.format(seprsco.shape))
```
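Since the separated score is just the note column of the expressive score, one can be derived from the other by dropping the velocity and timbre bytes. A hedged sketch with hypothetical values:

```python
import numpy as np

# Hypothetical expressive score with N=3 timesteps.
exprsco = np.zeros((3, 4, 3), dtype=np.uint8)
exprsco[:, 0, 0] = [60, 60, 62]  # P1 notes
exprsco[:, 0, 1] = [12, 12, 8]   # P1 velocities (discarded below)

# Keep only the note byte of each voice: Nx4x3 -> Nx4.
seprsco = exprsco[:, :, 0]
print(seprsco.shape)  # (3, 4)
```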
### Blended score
<p align="center"><img src="static/score_blended.png" width="400"/></p> <p align="center"><b>Depiction of the blended score format</b></p>The blended score format is a degenerate representation where the three melodic voices of the NES synthesizer are flattened into "chords". Each song is represented as a list-of-lists consisting of all the sounding notes at each timestep, e.g. [[60, 64, 67], [60], [], [62, 69]]. We offer this format for compatibility with four canonical datasets often studied in polyphonic music composition (Boulanger-Lewandowski et al. 2012). As this format loses instrument voice information, we do not recommend studying it for the purposes of novel NES music generation.
```python
import pickle

with open('train/297_SkyKid_00_01StartMusicBGMIntroBGM.blndsco.pkl', 'rb') as f:
  rate, nsamps, blndsco = pickle.load(f)

print('Temporal discretization rate: {}'.format(rate))  # Will be 24.0
print('Length of original VGM: {}'.format(nsamps / 44100.))
print('Piano roll length: {}'.format(len(blndsco)))
```
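The flattening described above can be sketched in a few lines, assuming a note value of 0 denotes a silent voice and taking only the first three (melodic) columns of a separated score:

```python
# Hypothetical separated score rows (P1, P2, TR, NO per timestep); 0 means silent.
seprsco = [
    [60, 64, 48, 3],
    [60, 0, 0, 0],
    [0, 0, 0, 5],  # only the noise voice sounds, so the chord is empty
]

# Blended score: collect the sounding notes of the three melodic voices.
blndsco = [[n for n in row[:3] if n > 0] for row in seprsco]
print(blndsco)  # [[60, 64, 48], [60], []]
```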
## Language modeling format
The NES language modeling (NLM) format is a timestamped list of instructions controlling the synthesizer state machine. Such a list format has less musical structure than MIDI or scores as instructions controlling the four instrument voices are entangled. However, it might be possible to train powerful sequential models to learn the semantics of this format. Here is an annotated example:
```
clock,1789773 # NES system clock rate
fc_mo,0       # Set frame counter to 4-step mode
ch_no,1       # Turn on noise channel
ch_tr,1       # Turn on triangle channel
ch_p2,1       # Turn on Pulse 2 channel
ch_p1,1       # Turn on Pulse 1 channel
w,13          # Wait 13 audio samples (13/44100 seconds)
tr_lr,66      # Set the TR linear counter
```
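A tiny parser for lines in this format might look like the following sketch (the `parse_nlm` helper is hypothetical, not part of the nesmdb package):

```python
def parse_nlm(text):
  """Parse NES language modeling lines of the form 'name,value # comment'."""
  instructions = []
  for line in text.splitlines():
    line = line.split('#', 1)[0].strip()  # drop the annotation, if any
    if not line:
      continue
    name, value = line.split(',')
    instructions.append((name, int(value)))
  return instructions

example = """\
clock,1789773 # NES system clock rate
ch_p1,1       # Turn on Pulse 1 channel
w,13          # Wait 13 audio samples
"""
print(parse_nlm(example))  # [('clock', 1789773), ('ch_p1', 1), ('w', 13)]
```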
