Implementing an Artificial Intelligence That Creates Music

Hello World!! A few days ago I watched a TED talk by Pierre Barreau titled "How AI could compose a personalized soundtrack to your life", in which he talks about AIVA, an artificial intelligence that has been trained in the art of music composition. After watching that video I was curious how an AI can compose new music, so I decided to give it a shot and try to build my own AI that can do almost the same. In the TED talk, Pierre Barreau says that AIVA takes some notes as input and then generates the rest of the composition.

Equalizer - Music generator

Dataset and Requirements

So my first task was to find a dataset we could use for this purpose. After some Google searches I found a dataset of MIDI files. The dataset had 92 files. If you have the computation power you can use the Lakh MIDI Dataset, which has 176,581 MIDI files. Now that we had the dataset, the next task was to convert it to numbers so that we could train a model on it. A couple more Google searches and I found a Python package named music21 which can help with this task. The full list of requirements is:

  • NumPy, glob
  • TensorFlow and Keras
  • music21


I decided to use 100 notes as input and generate 1 note as output. Initially, we give the model 100 notes (0-99) as input and it generates the next note; then we take the latest 100 notes (1-100) as input to generate the next one, then notes 2-101 to generate the one after that, and so on. I divided the whole mini-project into 3 subparts:

  1. Convert MIDI files to Numpy arrays
  2. Train Model (and save checkpoints)
  3. Generate awesome music
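
To make the sliding window concrete, here is a minimal sketch with a toy list of 6 numbers and a window of 3 instead of 100. The helper name sliding_windows and the toy data are made up for illustration and are not part of the project code:

```python
def sliding_windows(tokens, seq_len):
    """Return a list of (input_window, next_token) pairs."""
    pairs = []
    for i in range(len(tokens) - seq_len):
        # the window covers positions i .. i+seq_len-1, the target is the next position
        pairs.append((tokens[i:i + seq_len], tokens[i + seq_len]))
    return pairs


# Toy run: 6 "notes" and a window of 3 (hypothetical values)
toy_notes = [10, 11, 12, 13, 14, 15]
pairs = sliding_windows(toy_notes, seq_len=3)
for window, target in pairs:
    print(window, "->", target)
```

Each step drops the oldest note and shifts the window by one, which is the same idea the real code applies with 100 notes.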

Convert MIDI files to Numpy arrays

I had a relatively small dataset of 92 files, so I could afford to convert it to CSV files and then load the data from those files, as they are easy to handle. Alternatively, you can try to feed the input from the MIDI files directly (let me know if any of you tries this...). So let us start by loading the packages and functions.

import glob
from music21 import converter, instrument, note, chord
import numpy as np
from keras.utils import np_utils

Now let's convert our MIDI files to a list of note and chord names.

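notes = []

for file in glob.glob("midi_songs/*.mid"):
    midi = converter.parse(file)
    notes_to_parse = midi.flat.notes
    for element in notes_to_parse:
        if isinstance(element, note.Note):
            # a single note is stored by its pitch name, e.g. "C4"
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            # a chord is stored as its pitch classes joined by dots, e.g. "0.4.7"
            notes.append('.'.join(str(n) for n in element.normalOrder))
    # mark the end of each file so that sequences never span two songs
    notes.append("<eof>")

# sorted vocabulary of all distinct tokens and a token -> integer lookup
pitchnames = sorted(set(notes))
n_vocab = len(pitchnames)
note_to_int = {name: number for number, name in enumerate(pitchnames)}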
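
To see what the lookup from note names to integers looks like, here is a small sketch on a made-up token list. The tokens are hypothetical stand-ins for what the parsing loop produces; note_to_int and int_to_note are the names used later in this post:

```python
# Hypothetical tokens: pitch names for notes, dotted numbers for chords
toy_tokens = ["C4", "E4", "0.4.7", "C4", "<eof>", "G4", "0.4.7", "<eof>"]

# Sorted, de-duplicated vocabulary and the two lookup tables
vocab = sorted(set(toy_tokens))
note_to_int = {token: i for i, token in enumerate(vocab)}
int_to_note = {i: token for token, i in note_to_int.items()}

# Encode to integers and decode back again
encoded = [note_to_int[t] for t in toy_tokens]
decoded = [int_to_note[i] for i in encoded]

print(vocab)
print(encoded)
```

Because the vocabulary is sorted, the same dataset always gets the same numbering, so the mapping can be rebuilt later when we turn predictions back into notes.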

We converted our MIDI files to a list of note and chord names and added <eof> at the end of each file. This <eof> marker will be helpful when we convert our data to sequences of inputs and outputs, because it shows where one song ends and the next begins. Moving on, let us create the numpy arrays, which we can save as CSV files for later; you can also use these numpy arrays directly for the model.

seq_len = 100
network_input = []
network_output = []

flag = True
nexteof = -1
while flag:
    # each pass handles one file: the notes between two <eof> markers
    startnext = nexteof+1
    nexteof = notes.index("<eof>",startnext)
    if nexteof == len(notes)-1:
        flag = False
    for i in range(startnext, nexteof-seq_len):
        sequence_in = notes[i:i+seq_len]
        sequence_out = notes[i+seq_len]
        network_input.append([note_to_int[x] for x in sequence_in])
        network_output.append(note_to_int[sequence_out])

n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns,seq_len))
# scale the integer codes to the range [0, 1)
network_input = network_input/float(n_vocab)
# one-hot encode the target notes, one column per vocabulary entry
network_output = np_utils.to_categorical(network_output, num_classes=n_vocab)
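
The effect of the <eof> markers is easiest to see on a toy list. This is a minimal sketch in plain Python, assuming a window of 2 instead of 100; the helper names windows_per_file and one_hot are made up for illustration:

```python
def windows_per_file(tokens, seq_len, eof="<eof>"):
    """Build (window, target) pairs that never cross an end-of-file marker."""
    pairs, start = [], 0
    for pos, token in enumerate(tokens):
        if token == eof:
            song = tokens[start:pos]          # notes of one file, marker excluded
            for i in range(len(song) - seq_len):
                pairs.append((song[i:i + seq_len], song[i + seq_len]))
            start = pos + 1                   # next file starts after the marker
    return pairs


def one_hot(index, size):
    """Plain-Python version of a categorical (one-hot) row."""
    row = [0.0] * size
    row[index] = 1.0
    return row


# Two hypothetical "songs": A B C D and E F G
toy = ["A", "B", "C", "D", "<eof>", "E", "F", "G", "<eof>"]
pairs = windows_per_file(toy, seq_len=2)
for window, target in pairs:
    print(window, "->", target)
print(one_hot(2, 4))
```

Note that there is no pair like ['C', 'D'] -> 'E': the last notes of one song are never used to predict the first note of the next.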

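So we first found the locations of the <eof> tags in the list, converted the notes of each file into sequences of 100 notes that we appended to network_input, and appended the note following each sequence to network_output. We also scaled the inputs by the vocabulary size, and lastly we converted network_output to categorical (one-hot) values.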
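
Once the arrays are built, they can be written to disk. Here is a minimal sketch, assuming np.savetxt with a comma delimiter so that the files match the np.loadtxt calls used in the training step; the tiny arrays and the temporary folder are stand-ins for the real data and location:

```python
import os
import tempfile

import numpy as np

# Stand-in arrays with the same layout as in this post: one row per training pattern
network_input = np.array([[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]])
network_output = np.array([[0.0, 1.0], [1.0, 0.0]])

out_dir = tempfile.mkdtemp()  # hypothetical location; any writable folder works
input_path = os.path.join(out_dir, "network_input.csv")
output_path = os.path.join(out_dir, "network_output.csv")

# Write both arrays as comma-separated text
np.savetxt(input_path, network_input, delimiter=',')
np.savetxt(output_path, network_output, delimiter=',')

# Reading the file back gives the same values
restored = np.loadtxt(input_path, delimiter=',')
print(np.allclose(restored, network_input))
```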


Saving these arrays as text (CSV) files is an optional step. Let's move to step 2.

Train Model (and save checkpoints)

Let's start by importing the packages needed to train our model.

import glob
from music21 import converter, instrument, note, chord
import numpy as np
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, LSTM
from keras.callbacks import ModelCheckpoint, Callback

The next step is to load the CSV files we made; alternatively, you can just use the numpy arrays from the previous step.

network_input = np.loadtxt("network_input.csv",delimiter=',')
network_output = np.loadtxt("network_output.csv",delimiter=',')

n_patterns = len(network_input)
seq_len = network_input.shape[1]
n_vocab = 359  # vocabulary size for this dataset; yours may differ
network_input = np.reshape(network_input, (n_patterns, seq_len,1))

Now build the model to train our neural network.

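model = Sequential()
# the layers after the first two LSTMs are assumed: a final LSTM plus a softmax over the vocabulary
model.add(LSTM(512, return_sequences=True,
        input_shape=(network_input.shape[1], network_input.shape[2])))
model.add(LSTM(512, return_sequences=True))
model.add(LSTM(512))
model.add(Dense(n_vocab, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy','mse', 'mae', 'mape', 'cosine'])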

Now that we have built our model we can start training it on our data. We also create a checkpoint callback to save the model during training.

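filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"

# monitor the training loss; other ModelCheckpoint options are left at their defaults here
checkpoint = ModelCheckpoint(
    filepath, monitor='loss')
callbacks_list = [checkpoint]

model.fit(network_input, network_output, epochs=20, batch_size=1024, callbacks=callbacks_list)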
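
The placeholders in the checkpoint file name are ordinary Python format fields that Keras fills in with the epoch number and the logged loss. Here is a quick sketch of how one such name comes out; the epoch and loss values are made up:

```python
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"

# Hypothetical values: epoch 3 with a training loss of 2.34567
example_name = filepath.format(epoch=3, loss=2.34567)
print(example_name)
```

Zero-padding the epoch keeps the saved files in order when sorted by name, and putting the loss in the name makes it easy to pick the best weights later.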

So we did the training and made our model output the next note. Let's move on to the last and final step, which is to use the model to create awesome music using our AI.

Generate awesome music

Let's start: first we are going to create our prediction array, and then we will convert it to a music (MIDI) file.

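# lookup from integer code back to note/chord name (inverse of note_to_int from step 1)
int_to_note = {number: name for name, number in note_to_int.items()}

start = np.random.randint(0, len(network_input)-1)

# flatten the seed window to a plain list of seq_len scaled values
pattern = np.array(network_input[start]).flatten().tolist()
prediction_output = []

for note_index in range(500):
    prediction_input = np.reshape(np.array(pattern), (1, len(pattern), 1))

    prediction = model.predict(prediction_input, verbose=0)
    index = np.argmax(prediction)
    result = int_to_note[index]
    prediction_output.append(result)

    # slide the window: add the new note (scaled like the training input), drop the oldest
    pattern.append(index / float(n_vocab))
    pattern = pattern[1:]
    print("notes are ",result, index)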
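
If you want to follow the loop without a trained network, here is a minimal sketch in plain Python. The function fake_predict is a made-up stand-in for model.predict that always favours the token after the last one it saw, and the four-entry vocabulary is hypothetical:

```python
def fake_predict(window, n_vocab):
    """Stand-in for model.predict: returns one score per vocabulary entry."""
    last_index = round(window[-1] * n_vocab)      # undo the scaling of the last note
    scores = [0.0] * n_vocab
    scores[(last_index + 1) % n_vocab] = 1.0      # hypothetical rule: pick the next token
    return scores


int_to_note = {0: "C4", 1: "E4", 2: "G4", 3: "0.4.7"}
n_vocab = len(int_to_note)

pattern = [0 / n_vocab, 1 / n_vocab, 2 / n_vocab]  # scaled seed window of 3 notes
prediction_output = []

for _ in range(5):
    scores = fake_predict(pattern, n_vocab)
    index = max(range(n_vocab), key=lambda i: scores[i])  # plain-Python argmax
    prediction_output.append(int_to_note[index])
    pattern.append(index / n_vocab)   # feed the prediction back in
    pattern = pattern[1:]             # and drop the oldest note

print(prediction_output)
```

The window always stays the same length, and every generated note becomes part of the input for the next prediction.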

We selected a random input sequence and then generated an output of 500 notes. Now let's convert this to a music file (MIDI).

from music21 import stream

offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            notes.append(new_note)
        new_chord = chord.Chord(notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)
    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

This will create a file named test_output.mid
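
The rule that decides whether a generated token is a chord or a note can be tried out without music21. This is a small sketch, assuming the same test as in the loop above; the helper name decode_token and the sample tokens are made up:

```python
def decode_token(token):
    """Classify a generated token the same way the conversion loop does.

    Dotted or all-digit tokens are chords (lists of pitch classes),
    everything else is a single note given by its pitch name.
    """
    if ('.' in token) or token.isdigit():
        return ("chord", [int(part) for part in token.split('.')])
    return ("note", token)


generated = ["C4", "0.4.7", "7", "F#3"]   # hypothetical model output
offset = 0.0
schedule = []
for token in generated:
    kind, value = decode_token(token)
    schedule.append((offset, kind, value))
    offset += 0.5   # same fixed step as in the post, so notes do not stack

for entry in schedule:
    print(entry)
```

Since the offset grows by a fixed 0.5 per token, every note and chord in the output has the same spacing; the model only chooses which pitches to play, not their rhythm.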


In this post, we saw how we can build our own little awesome music composer that creates new music every time. This can also be done using GANs (I will probably show you how in the future). If you liked this post, please leave a comment and let me know if you have any interesting ideas or projects related to this.