Convert your own annotations and audio data
If you start from scratch, with non-annotated audio recordings, use the GUI. See the GUI tutorial for a description of all steps: loading data, annotating song, making a dataset, training a network, and generating annotations after training.
Often, however, annotations already exist, either from earlier manual annotation or produced by other tools. DeepSS can be used with existing annotations by converting them into the dss format.
If the audio data is in a format supported by dss (see here), open it in the GUI and export to a folder. For processing large sets of recordings, use the notebook.
Format of exported annotations and audio
Produced by the GUI via File/Save annotations and File/Export for DeepSS.
Audio and annotations are exported into csv (comma-separated values) and npz (zip compressed numpy files):
- npz consists of two variables:
  - data: [samples, channels] array with the audio data
  - samplerate: [1,] array with the sample rate in Hz
- csv contains three columns:
  - name - the name of the song or syllable type
  - start_seconds - the start time of the syllable.
  - stop_seconds - the stop time of the syllable. Start and stop are identical for song types of type event, like the pulses of fly song.
  Each row in the file corresponds to a single annotation with name, start_seconds and stop_seconds. Special rows are reserved for song types without any annotations: For syllables or other segment types, they consist of the name, a start_seconds of np.nan, and an arbitrary stop_seconds. For event-like types (song pulses), both start_seconds and stop_seconds are np.nan.
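The placeholder rows for song types without annotations can be sketched as follows. This is a minimal illustration of the rule described above, not code taken from DeepSS; the song type names and the choice of 0.0 as the arbitrary stop_seconds are assumptions.

```python
import numpy as np
import pandas as pd

# One regular annotation: a segment with a start and a stop time.
rows = [
    {"name": "bop", "start_seconds": 5.67, "stop_seconds": 5.85},
]

# Placeholder for a segment type with no annotations in this recording:
# start_seconds is NaN, stop_seconds is arbitrary (0.0 is an assumption).
rows.append({"name": "sine_song", "start_seconds": np.nan, "stop_seconds": 0.0})

# Placeholder for an event type with no annotations:
# both start_seconds and stop_seconds are NaN.
rows.append({"name": "pulse", "start_seconds": np.nan, "stop_seconds": np.nan})

df = pd.DataFrame(rows, columns=["name", "start_seconds", "stop_seconds"])
print(df)
```

The placeholders keep the song type names in the file even though no instance of them was annotated.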
The csv format is universal and can be created and edited using Excel or even a plain text editor. It is also easy to create csv files programmatically from your own annotation format in Python using pandas DataFrames. Below, we show an example of creating a DataFrame in the correct format from annotation data and saving it as a csv file for use with DeepSS.
Examples of transforming custom annotation formats
Say we have read annotation data into python as three lists containing the names, start and stop of a song type:
names = ['bip', 'bop', 'bip']
start_seconds = [1.34, 5.67, 9.13]
stop_seconds = [1.34, 5.85, 9.13]
This defines two song types, “bip” and “bop”. “bip” is an event-like song type (like a pulse in fly song), since start and stop are identical. “bop” is a segment-like song type (like a syllable in birdsong), because start and stop differ.
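The event/segment distinction can also be checked programmatically. The sketch below uses plain Python; the helper name song_type_categories is hypothetical and not part of DeepSS or xarray_behave. It treats a song type as a segment as soon as one of its annotations has a nonzero duration.

```python
names = ['bip', 'bop', 'bip']
start_seconds = [1.34, 5.67, 9.13]
stop_seconds = [1.34, 5.85, 9.13]


def song_type_categories(names, starts, stops):
    """Map each song type name to 'event' or 'segment' (hypothetical helper)."""
    categories = {}
    for name, start, stop in zip(names, starts, stops):
        if stop != start:
            # At least one annotation has a duration: segment-like type.
            categories[name] = 'segment'
        else:
            # Identical start and stop: event-like, unless already seen as a segment.
            categories.setdefault(name, 'event')
    return categories


print(song_type_categories(names, start_seconds, stop_seconds))
# {'bip': 'event', 'bop': 'segment'}
```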
This information needs to be arranged into a table, i.e., a pandas DataFrame, of this format:
name | start_seconds | stop_seconds |
---|---|---|
bip | 1.34 | 1.34 |
bop | 5.67 | 5.85 |
bip | 9.13 | 9.13 |
A pandas DataFrame with this format can be created in two ways:
Use xb.annot, which takes the three lists and produces a correctly formatted DataFrame:
from xarray_behave.annot import Events # require install with gui
evt = Events.from_lists(names, start_seconds, stop_seconds)
df = evt.to_df()
Or you can assemble the DataFrame yourself:
import pandas as pd
# collect one row per annotation, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
# add a segment
onset = 1.33 # seconds
offset = 1.42 # seconds
segment_name = 'sine_song'
rows.append({'name': segment_name, 'start_seconds': onset, 'stop_seconds': offset})
# add an event: start and stop are identical
event_time = 2.15 # seconds
event_name = 'pulse'
rows.append({'name': event_name, 'start_seconds': event_time, 'stop_seconds': event_time})
# create the DataFrame with the required columns; times stay numeric
df = pd.DataFrame(rows, columns=['name', 'start_seconds', 'stop_seconds'])
Then save as a csv file:
df.to_csv('filename.csv')
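Before handing a csv file to DeepSS it can help to read it back and check its layout. The sketch below is one possible sanity check, not a DeepSS function: check_annotations is a hypothetical helper, and the file is written to an in-memory buffer so the example is self-contained. Passing index=False keeps pandas from writing its row index as an extra column; whether DeepSS requires this is not stated here, so treat it as an assumption.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ['name', 'start_seconds', 'stop_seconds']


def check_annotations(df):
    """Return a list of problems in an annotation table (empty if none found)."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    # Rows with a NaN start are placeholders for unannotated song types; skip them.
    annotated = df.dropna(subset=['start_seconds'])
    bad = annotated[annotated['stop_seconds'] < annotated['start_seconds']]
    if len(bad):
        problems.append(f"{len(bad)} row(s) stop before they start")
    return problems


df = pd.DataFrame({'name': ['bip', 'bop', 'bip'],
                   'start_seconds': [1.34, 5.67, 9.13],
                   'stop_seconds': [1.34, 5.85, 9.13]})

# Round trip through csv using an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(check_annotations(reloaded))
```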
Convert audio data
The GUI can read many formats (see list of supported audio formats), and data can always be exported in the correct format via the GUI. However, if you want to assemble many of your own recordings into a dataset for training, a programmatic approach is more efficient.
To assemble a dataset, audio data has to be provided in one of two formats:
- wav: Universal format for audio data. Can be created from many software packages:
  - From Python: scipy.io.wavfile.write(...)
  - From MATLAB: audiowrite(...) (wavwrite(...) in older releases)
  - From the command line via ffmpeg: ffmpeg ...
- npz: Python-specific but a bit more flexible/robust. Should contain two variables, samplerate and data, and can be created like so: np.savez(filename, data=audio, samplerate=samplerate)
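As a minimal sketch of the npz route, the example below saves a synthetic mono signal with the np.savez call shown above and reads it back. The signal, sample rate, and file name are made up for illustration. The audio is given a trailing channel axis so that data has the [samples, channels] layout described for exported files.

```python
import os
import tempfile

import numpy as np

samplerate = 10_000  # Hz, arbitrary value for this example
duration = 0.5  # seconds
t = np.arange(int(samplerate * duration)) / samplerate
audio = 0.1 * np.sin(2 * np.pi * 440 * t)  # synthetic mono signal

# A mono recording is 1D; add a channel axis to get [samples, channels].
if audio.ndim == 1:
    audio = audio[:, np.newaxis]

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'recording.npz')
    np.savez(filename, data=audio, samplerate=samplerate)

    # Read the file back and check that both variables survived.
    with np.load(filename) as npz:
        data = npz['data']
        loaded_rate = int(npz['samplerate'])

print(data.shape, loaded_rate)
```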
Warning
Clipping can occur when saving certain data types as wav files. See the docs of scipy.io.wavfile.write for the range of values available when saving audio of different types to wav.
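One way to avoid clipping is to rescale the audio to the target integer range before writing. The sketch below is an illustration under stated assumptions rather than a recommended DeepSS workflow: to_int16 is a hypothetical helper, and peak normalisation changes the absolute amplitude of the recording, which may or may not be acceptable for your data. The wav data is written to an in-memory buffer to keep the example self-contained.

```python
import io

import numpy as np
from scipy.io import wavfile

samplerate = 10_000  # Hz, arbitrary value for this example
t = np.arange(samplerate) / samplerate
audio = 1.5 * np.sin(2 * np.pi * 5 * t)  # float audio exceeding [-1, 1]


def to_int16(x):
    """Rescale float audio so its peak fits the int16 range (hypothetical helper)."""
    peak = np.max(np.abs(x))
    scale = 1.0 if peak == 0 else np.iinfo(np.int16).max / peak
    return np.round(x * scale).astype(np.int16)


pcm = to_int16(audio)

# scipy.io.wavfile.write accepts an open file handle as well as a file name.
buffer = io.BytesIO()
wavfile.write(buffer, samplerate, pcm)
buffer.seek(0)
rate, loaded = wavfile.read(buffer)

print(rate, loaded.dtype, int(np.abs(loaded).max()))
```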