Convert your own annotations and audio data

If you start from scratch, with non-annotated audio recordings, use the GUI. See the GUI tutorial for a description of all steps: loading data, annotating song, making a dataset, training a network, and generating annotations after training.

However, annotations often already exist, from old manual annotations or produced by other tools. DeepSS can use such existing annotations after they are converted into the dss format.

If the audio data is in a format supported by dss (see here), open it in the GUI and export it to a folder. For processing large sets of recordings, use the notebook.

Format of exported annotations and audio

Produced by the GUI via File/Save annotations and File/Export for DeepSS.

Audio is exported as npz (zip-compressed numpy) files and annotations as csv (comma-separated values) files:

  • npz consists of two variables:

    • data: [samples, channels] array with the audio data

    • samplerate: [1,] array with the sample rate in Hz

  • csv contains three columns:

    • name - the name of the song or syllable type

    • start_seconds - the start time of the syllable.

    • stop_seconds - the stop of the syllable. Start and stop are identical for song types of type event, like the pulses of fly song.

    • Each row in the file corresponds to a single annotation, with name, start_seconds and stop_seconds. Special rows are reserved for song types without any annotations: For syllables or other segment types, such a row consists of the name, np.nan as start_seconds, and an arbitrary stop_seconds. For event-like types (song pulses), both start_seconds and stop_seconds are np.nan.
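For instance, a DataFrame with one regular annotation plus the two kinds of special rows described above could be assembled like this (the song-type names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['bop', 1.50, 1.90],        # regular segment annotation
    ['sine', np.nan, 0.0],      # segment type without annotations: start is nan
    ['pulse', np.nan, np.nan],  # event type without annotations: both are nan
], columns=['name', 'start_seconds', 'stop_seconds'])
print(df)
```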

The csv format is universal and can be created and edited using Excel or even a plain text editor. It is also very easy to create csv files programmatically from your own annotation format in python using pandas DataFrames. Below, we show an example of creating a DataFrame in the correct format from annotation data and saving it as a csv file for use with DeepSS.

Examples of transforming custom annotation formats

Say we have read annotation data into python as three lists containing the names, start and stop of a song type:

names = ['bip', 'bop', 'bip']
start_seconds = [1.34, 5.67, 9.13]
stop_seconds = [1.34, 5.85, 9.13]

This defines two song types, “bip” and “bop”. “bip” is an event-like song type (like a pulse in fly song), since start and stop are identical. “bop” is a segment-like song type (like a syllable in birdsong), because start and stop differ.

This information needs to be arranged into a table, i.e., a pandas DataFrame, of this format:

name  start_seconds  stop_seconds
bip   1.34           1.34
bop   5.67           5.85
bip   9.13           9.13
A pandas DataFrame with this format can be created in two ways. Use xb.annot.Events, which takes the three lists and produces a correctly formatted DataFrame:

from xarray_behave.annot import Events  # requires installing with GUI support

evt = Events.from_lists(names, start_seconds, stop_seconds)
df = evt.to_df()

Or you can assemble the DataFrame yourself:

import numpy as np
import pandas as pd

# create an empty DataFrame with the required columns
columns = ['name', 'start_seconds', 'stop_seconds']
df = pd.DataFrame(columns=columns)

# append a segment - start and stop differ
onset = 1.33  # seconds
offset = 1.42  # seconds
segment_name = 'sine_song'

new_row = pd.DataFrame([[segment_name, onset, offset]], columns=columns)
df = pd.concat([df, new_row], ignore_index=True)

# append an event - start and stop are identical
event_time = 2.15  # seconds
event_name = 'pulse'

new_row = pd.DataFrame([[event_name, event_time, event_time]], columns=columns)
df = pd.concat([df, new_row], ignore_index=True)

Then save as a csv file:
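A minimal sketch using pandas' to_csv; the DataFrame contents and filename here are illustrative:

```python
import pandas as pd

# a minimal DataFrame in the annotation format described above
df = pd.DataFrame([['pulse', 2.15, 2.15]],
                  columns=['name', 'start_seconds', 'stop_seconds'])

# index=False omits the row index so the file contains only the three columns
df.to_csv('annotations.csv', index=False)
```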


Convert audio data

The GUI can read many formats (see list of supported audio formats) and data can always be exported in the correct format via the GUI. However, if you want to assemble many of your own recordings into a dataset for training, a programmatic approach is more efficient.

To assemble a dataset, audio data has to be provided in two formats:

  • wav: Universal format for audio data. Can be created with many software packages:

    • From python, e.g. via scipy.io.wavfile.write(...)

    • From matlab via audiowrite(...)

    • From the command line via ffmpeg: ffmpeg ...

  • npz: Python-specific but a bit more flexible/robust. Should contain two variables - samplerate and data - and can be created like so: np.savez(filename, data=audio, samplerate=samplerate)
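A minimal sketch of producing both files from the same array, assuming scipy is installed (the filenames and the noise signal are illustrative):

```python
import numpy as np
from scipy.io import wavfile

samplerate = 10_000  # Hz
# 1 second of quiet noise, single channel; float32 wavs expect values in [-1, 1]
audio = (0.1 * np.random.randn(samplerate, 1)).astype(np.float32)

# wav via scipy
wavfile.write('recording.wav', samplerate, audio)

# npz with the two required variables, data and samplerate
np.savez('recording.npz', data=audio, samplerate=samplerate)
```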


Clipping can occur when saving certain data types as wav files. See the docs of scipy.io.wavfile.write for a list of the ranges of values allowed when saving audio of different types to wav.
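For example, int16 wav files store values in [-32768, 32767], so float audio in [-1, 1] should be scaled to that range before saving; this sketch uses an illustrative filename and test tone:

```python
import numpy as np
from scipy.io import wavfile

samplerate = 8_000  # Hz
t = np.arange(samplerate) / samplerate
float_audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # 1 s, 440 Hz tone

# scale float [-1, 1] audio to the full int16 range before saving
int_audio = (float_audio * np.iinfo(np.int16).max).astype(np.int16)
wavfile.write('tone.wav', samplerate, int_audio)
```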