Convert your own annotations and audio data¶
If you start from scratch, with audio recordings that have not been annotated, use the GUI. See the GUI tutorial for a description of all steps: loading data, annotating song, making a dataset, training a network, and generating annotations after training.
However, annotations often already exist, either from earlier manual annotation or produced by other tools. DeepSS can be used with existing annotations by converting them into the dss format.
Format of exported annotations and audio¶
Produced by the GUI via File/Save annotations and File/Export for DeepSS. Audio and annotations are exported into csv (comma-separated values) and npz (zip compressed numpy files):
npz consists of two variables:
data: [samples, channels] array with the audio data
samplerate: [1,] array with the sample rate in Hz
csv contains three columns:
name - the name of the song or syllable type
start_seconds - the start time of the syllable
stop_seconds - the stop time of the syllable. Start and stop are identical for song types of type event, like the pulses of fly song.
Each row in the file corresponds to a single annotation with name, start_seconds and stop_seconds. Special rows are reserved for song types without any annotations: for syllables or other segment types, they consist of the name, np.nan as start_seconds, and an arbitrary stop_seconds. For event-like types (song pulses), both start_seconds and stop_seconds are np.nan.
The csv format is universal and can be created and edited using Excel or even a plain text editor. It is also very easy to create csv files programmatically from your own annotation format in Python using pandas DataFrames. Below, we show an example of creating a DataFrame in the correct format from annotation data and saving it as a csv file for use with DeepSS.
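The file layout described in this section can be checked with a few lines of Python. The following is a minimal sketch, not part of DeepSS: the file names are hypothetical, and the script first writes a tiny stand-in for an exported pair so that it runs on its own. It assumes the npz variables are called data and samplerate and that the csv has exactly the three columns listed above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

out_dir = tempfile.mkdtemp()
npz_path = os.path.join(out_dir, "recording.npz")  # hypothetical file name
csv_path = os.path.join(out_dir, "recording_annotations.csv")  # hypothetical file name

# Write a tiny stand-in for an exported pair so the sketch runs on its own.
np.savez(npz_path, data=np.zeros((1000, 2)), samplerate=10_000)
pd.DataFrame({
    "name": ["pulse", "sine_song"],
    "start_seconds": [0.02, 0.05],
    "stop_seconds": [0.02, 0.08],
}).to_csv(csv_path, index=False)

# Load the audio: one 2D array plus the sample rate.
with np.load(npz_path) as npz_file:
    data = npz_file["data"]
    # np.ravel handles both a scalar and a [1,] array
    samplerate = float(np.ravel(npz_file["samplerate"])[0])

# Load the annotations: one row per annotation.
annotations = pd.read_csv(csv_path)

# Basic consistency checks between audio and annotations.
duration_seconds = data.shape[0] / samplerate
columns_ok = list(annotations.columns) == ["name", "start_seconds", "stop_seconds"]
within_recording = bool((annotations["stop_seconds"].dropna() <= duration_seconds).all())
print(data.shape, samplerate, columns_ok, within_recording)
```

Checks like these catch the most common conversion mistakes early, such as a transposed audio array or annotation times given in samples instead of seconds.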
Examples of transforming custom annotation formats¶
Say we have read annotation data into python as three lists containing the names, start and stop of a song type:
    names = ['bip', 'bop', 'bip']
    start_seconds = [1.34, 5.67, 9.13]
    stop_seconds = [1.34, 5.85, 9.13]
This defines two song types, “bip” and “bop”. “bip” is an event-like song type (like a pulse in fly song), since start and stop are identical. “bop” is a segment-like song type (like a syllable in birdsong), because start and stop differ.
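The rule used here (identical start and stop means event, differing start and stop means segment) can be written down in a few lines. This is only an illustrative sketch; the variable song_types is a hypothetical name and not part of DeepSS or xarray_behave.

```python
names = ['bip', 'bop', 'bip']
start_seconds = [1.34, 5.67, 9.13]
stop_seconds = [1.34, 5.85, 9.13]

# Map each song type to the set of kinds found among its annotations.
song_types = {}
for name, start, stop in zip(names, start_seconds, stop_seconds):
    kind = 'event' if start == stop else 'segment'
    song_types.setdefault(name, set()).add(kind)

# A song type that maps to both kinds would point to inconsistent annotations.
inconsistent = [name for name, kinds in song_types.items() if len(kinds) > 1]
print(song_types, inconsistent)
```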
This information needs to be arranged into a table, i.e., a pandas DataFrame, of this format:

    name    start_seconds    stop_seconds
    bip     1.34             1.34
    bop     5.67             5.85
    bip     9.13             9.13
A pandas DataFrame in this format can be created in two ways.
You can use the Events class in xb.annot (xarray_behave.annot), which takes the three lists and produces a correctly formatted DataFrame:
    from xarray_behave.annot import Events  # requires installation with the GUI

    evt = Events.from_lists(names, start_seconds, stop_seconds)
    df = evt.to_df()
Or you can assemble the DataFrame yourself:
    import pandas as pd

    # collect one [name, start_seconds, stop_seconds] entry per annotation
    # (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list first)
    rows = []

    # add a segment
    onset = 1.33  # seconds
    offset = 1.42  # seconds
    segment_name = 'sine_song'
    rows.append([segment_name, onset, offset])

    # add an event (start and stop are identical)
    event_time = 2.15  # seconds
    event_name = 'pulse'
    rows.append([event_name, event_time, event_time])

    # create the DataFrame with the required columns
    df = pd.DataFrame(rows, columns=['name', 'start_seconds', 'stop_seconds'])
Then save as a csv file.
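Saving could look like the following sketch, which uses the standard pandas DataFrame.to_csv method. The file name and the song type syllable_b are hypothetical, and passing index=False is an assumption made here so that the file contains only the three columns described earlier. The last row illustrates the placeholder convention for a segment type without annotations (NaN start, arbitrary stop).

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["sine_song", 1.33, 1.42],  # segment: start and stop differ
        ["pulse", 2.15, 2.15],  # event: start and stop are identical
        ["syllable_b", np.nan, 0.0],  # placeholder: segment type without annotations
    ],
    columns=["name", "start_seconds", "stop_seconds"],
)

csv_path = os.path.join(tempfile.mkdtemp(), "annotations.csv")  # hypothetical file name
df.to_csv(csv_path, index=False)  # assumption: omit the index column

# Read the file back to confirm that the table survives the round trip.
loaded = pd.read_csv(csv_path)
print(loaded)
```

NaN values are written as empty fields and are read back as NaN by pandas.read_csv, so the placeholder row is preserved.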
Convert audio data¶
The GUI can read many formats (see list of supported audio formats), and data can always be exported in the correct format via the GUI. However, if you want to assemble many of your own recordings into a dataset for training, a programmatic approach is more efficient.
To assemble a dataset, audio data has to be provided in one of two formats:
wav: Universal format for audio data. Can be created with many software packages, for instance from the command line via ffmpeg.
npz: Python-specific but a bit more flexible/robust. Should contain two variables, data and samplerate, and can be created like so:
np.savez(filename, data=audio, samplerate=samplerate)
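A slightly fuller sketch of the same call is shown below. It assumes that a mono recording arrives as a 1D array and therefore needs a channel axis to match the [samples, channels] layout; the file name and the random test signal are placeholders.

```python
import os
import tempfile

import numpy as np

samplerate = 10_000  # Hz
mono_audio = np.random.default_rng(0).normal(size=samplerate)  # 1 s of noise, shape [samples]

# Add a channel axis to 1D recordings so that the array is [samples, channels].
audio = mono_audio[:, np.newaxis] if mono_audio.ndim == 1 else mono_audio

filename = os.path.join(tempfile.mkdtemp(), "recording.npz")  # hypothetical file name
np.savez(filename, data=audio, samplerate=samplerate)

# Load the file again to confirm what was stored.
with np.load(filename) as npz_file:
    saved_shape = npz_file["data"].shape
    saved_rate = int(npz_file["samplerate"])
print(saved_shape, saved_rate)
```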
Clipping can occur when saving certain data types as wav files. See the documentation of scipy.io.wavfile.write for the range of values available when saving audio of different data types to wav.
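One way to avoid clipping is to rescale floating-point audio before converting it to 16-bit integers. The sketch below illustrates the idea under stated assumptions: to_int16 is a hypothetical helper, not part of DeepSS, and the file is written with Python's built-in wave module instead of scipy.io.wavfile.write, purely to keep the example free of extra dependencies. The same int16 array could be passed to scipy.io.wavfile.write.

```python
import os
import tempfile
import wave

import numpy as np


def to_int16(audio):
    """Convert floating-point audio to int16, rescaling if the peak exceeds 1.

    Hypothetical helper: values are divided by the peak absolute value when
    that peak is larger than 1, so nothing is clipped during conversion.
    """
    audio = np.asarray(audio, dtype=np.float64)
    peak = np.max(np.abs(audio)) if audio.size else 0.0
    if peak > 1.0:
        audio = audio / peak  # rescale instead of letting values clip
    return np.round(audio * 32767).astype("<i2")  # little-endian int16


samplerate = 10_000  # Hz
t = np.arange(samplerate) / samplerate
audio = 2.5 * np.sin(2 * np.pi * 440 * t)  # peak amplitude exceeds 1.0
audio = audio[:, np.newaxis]  # [samples, channels]

pcm = to_int16(audio)

wav_path = os.path.join(tempfile.mkdtemp(), "recording.wav")  # hypothetical file name
with wave.open(wav_path, "wb") as wav_file:
    wav_file.setnchannels(pcm.shape[1])
    wav_file.setsampwidth(2)  # 2 bytes per sample = int16
    wav_file.setframerate(samplerate)
    wav_file.writeframes(pcm.tobytes())  # rows are frames, so channels are interleaved

# Read the header back to confirm what was written.
with wave.open(wav_path, "rb") as wav_file:
    n_frames = wav_file.getnframes()
    frame_rate = wav_file.getframerate()
print(n_frames, frame_rate, int(np.abs(pcm).max()))
```

Rescaling changes the absolute amplitude of the recording. If absolute amplitudes matter, saving as npz avoids the issue altogether.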