# Data formats
## Format of exported annotations and audio
Produced by the GUI via _File/Save annotations_ and _File/Export for DeepSS_.
Annotations are exported as `csv` (comma-separated values) files and audio data as `npz` (zip-compressed numpy files) or `wav` (wave audio) files:

- `.npz` files contain two variables:
  - `data`: `[samples, channels]` array with the audio data
  - `samplerate`: `[1,]` array with the sample rate in Hz
- `_annotations.csv` files contain three columns:
  - `name`: the name of the song or syllable type
  - `start_seconds`: the start time of the syllable
  - `stop_seconds`: the stop time of the syllable. Start and stop are identical for song types of type event, like the pulses of fly song.

  Each row in the file corresponds to a single annotation with `name`, `start_seconds`, and `stop_seconds`. Special rows are reserved for song types without any annotations: for syllables or other segment types, the row contains the name, `start_seconds` is `np.nan`, and `stop_seconds` is arbitrary. For event-like types (song pulses), both `start_seconds` and `stop_seconds` are `np.nan`.
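The exported files can be read back with standard tools. A minimal sketch using numpy and pandas (the file names are placeholders):

```python
import numpy as np
import pandas as pd

# Load the exported audio (file name is an example).
with np.load('recording.npz') as npz:
    audio = npz['data']             # [samples, channels] array
    samplerate = npz['samplerate']  # [1,] array with the sample rate in Hz

# Load the exported annotations.
annotations = pd.read_csv('recording_annotations.csv')
# Columns: name, start_seconds, stop_seconds.
# Events (e.g. fly song pulses) have identical start and stop times:
events = annotations[annotations['start_seconds'] == annotations['stop_seconds']]
```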
## Data structure used for training
For training, DeepSS expects a simple dictionary-like data structure along these lines:
```
data
├── ['train']
│   ├── ['x'] (the audio data - samples x channels)
│   ├── ['y'] (annotations - samples x song types, first one is noise, needs to sum to 1.0 for each sample)
│   └── ['y_suffix1'] (optional, multiple allowed)
├── ['val']
│   ├── ['x']
│   ├── ['y']
│   └── ['y_suffix1']
├── ['test']
│   ├── ['x']
│   ├── ['y']
│   └── ['y_suffix1']
└── attrs
    ├── ['samplerate'] (of x and y in Hz)
    ├── ['class_names']
    ├── ['class_types'] (event or segment)
    ├── ['class_names_suffix1']
    └── ['class_types_suffix1'] (event or segment)
```
The top-level keys `train`, `val`, and `test` correspond to the data splits for training, validation, and testing. Training and validation data are required; validation data is used during training to monitor progress and to adjust the learning rate or stop training early. Test data is optional and is used after training to assess the performance of the model.
Each top-level key contains a `dict` with the following keys:

**Inputs**

- `x`: `[samples, channels]`. The first dimension is always samples, the last is typically audio channels (>=1). For spectrogram representations of single-channel recordings, the last dimension can also hold frequency channels: `[samples, freqs]`. For multi-channel data, the frequency channels from the spectrogram of each audio channel can be stacked into `[time, channels*freqs]`.

**Targets**

- `y`: `[samples, nb_classes]`, one target for each sample in `x`. Values can be binary (0/1) or probabilities, but should sum to 1.0 across classes for each sample.
- Multiple targets can be specified by adding target variables with a suffix, `y_somesuffix`, where `somesuffix` can be an arbitrary string. This allows you to train a network with the same `x` but different `y`'s (e.g. pulse-only or sine-only). The target is selected during training with the `y_suffix=somesuffix` argument, which defaults to `''` (the standard target `y` without suffix). Training with y-suffixes also requires suffix-specific attributes specifying the names and types of the target classes: `class_names_somesuffix` and `class_types_somesuffix`.
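To make the expected shapes concrete, here is a minimal sketch of valid `x`, `y`, and suffixed target arrays; the sample rate, durations, and class layout are made up for illustration:

```python
import numpy as np

samplerate = 10_000            # Hz (example value)
nb_samples = 60 * samplerate   # one minute of single-channel audio
nb_classes = 3                 # noise, pulse, sine (example classes)

# Inputs: first dimension is samples, last is audio channels.
x = np.zeros((nb_samples, 1), dtype=np.float32)

# Targets: one row per sample, one column per class; each row sums to 1.0.
y = np.zeros((nb_samples, nb_classes), dtype=np.float32)
y[:, 0] = 1.0                  # start with everything labelled as noise (first class)
on, off = int(1.0 * samplerate), int(2.5 * samplerate)
y[on:off, 0] = 0.0             # ... then mark a sine segment from 1.0 s to 2.5 s
y[on:off, 2] = 1.0

# An alternative target for the same x, e.g. a pulse-only target (noise + pulse).
y_pulse = np.zeros((nb_samples, 2), dtype=np.float32)
y_pulse[:, 0] = 1.0
```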
**Metadata** in the dict `data.attrs`:

- `samplerate_x_Hz`, `samplerate_y_Hz`: sample rates of `x` and `y` in Hz; the two should be equal.
- `samplerate_song_Hz` (optional): sample rate of `song`, the original recording, in case `x` is the result of a wavelet or spectrogram transform.
- Class name and type information for each target:
  - `class_names`: `[nb_classes: str]`, one name for each entry along the second dimension of `y`.
  - `class_types`: `[nb_classes: str]` (optional for training), either `event` (e.g. pulse) or `segment` (sine or a syllable).
  - For each target specified via `y_somesuffix`, add `class_names_somesuffix` and `class_types_somesuffix` attributes.
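One way to assemble a dataset with this layout is as an hdf5 file via `h5py` (hdf5 is one of the supported storage backends, see below); the shapes, class names, and values are illustrative only:

```python
import h5py
import numpy as np

samplerate = 10_000  # Hz (example value)

with h5py.File('dataset.h5', 'w') as ds:
    for split, nb_samples in [('train', 100_000), ('val', 20_000), ('test', 20_000)]:
        grp = ds.create_group(split)
        grp.create_dataset('x', data=np.zeros((nb_samples, 1), dtype=np.float32))
        y = np.zeros((nb_samples, 3), dtype=np.float32)
        y[:, 0] = 1.0  # label everything as noise (first class) in this sketch
        grp.create_dataset('y', data=y)
    ds.attrs['samplerate_x_Hz'] = samplerate
    ds.attrs['samplerate_y_Hz'] = samplerate
    ds.attrs['class_names'] = ['noise', 'pulse', 'sine']
    ds.attrs['class_types'] = ['segment', 'event', 'segment']
```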
Data is accessed via `data['train']['x']` and the metadata via `data.attrs`. `attrs` is a standard attribute for storing metadata in `hdf5` and `zarr` files; to get the same interface with an in-memory dictionary you need a thin `dict` subclass, since built-in dicts do not support attribute assignment (see the sketch below).
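A minimal sketch of such a dictionary with an `attrs` attribute (the subclass name is arbitrary):

```python
import numpy as np

class AttrDict(dict):
    """A dict that also carries an `attrs` dict for metadata, mimicking hdf5/zarr."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.attrs = {}

data = AttrDict()
data['train'] = {'x': np.zeros((1_000, 1)), 'y': np.zeros((1_000, 2))}
data.attrs['samplerate_x_Hz'] = 10_000
data.attrs['samplerate_y_Hz'] = 10_000
```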
This structure can be implemented via Python's built-in dictionary, hdf5, xarray, zarr, or anything else that implements a key-value interface (called a `Mapping` in Python).
We provide an alternative storage backend, `npy_dir`, that mirrors the data structure in a directory hierarchy of numpy `npy` files (inspired by Cyrille Rossant's series of blog posts (1, 2), jbof, and exdir). Keys map to directories; values and `attrs` map to `npy` files (see `dss.npy_dir`). For instance, `data['train']['x']` is stored in `dirname.npy/train/x.npy`; `attrs` is stored in the top-level directory. Storing data as `npy` files provides a fast memory-mapping mechanism for out-of-memory access if your dataset does not fit in memory. While zarr, h5py, and xarray also provide mechanisms for out-of-memory access, in our experience they tend to be slower or require fine-tuning to reach the performance of memory-mapped `npy` files.
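The memory mapping behind this is plain numpy; a sketch with illustrative paths (the directories are assumed to exist):

```python
import numpy as np

# Save an array as an npy file - this is what the npy_dir backend does for each key.
x = np.zeros((10_000_000, 1), dtype=np.float32)
np.save('dirname.npy/train/x.npy', x)

# Memory-map it for out-of-memory access: only the slices you index are read from disk.
x_mm = np.load('dirname.npy/train/x.npy', mmap_mode='r')
batch = x_mm[1_000_000:1_010_000]
```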
During training, data is loaded (see `dss.io`) with the backend given by the extension of the file or directory. Currently supported file types are:

- zarr stores (files or directories ending in `.zarr`),
- hdf5 files (files ending in `.h5`),
- directories ending in `.npy` containing numpy files.
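The dispatch boils down to checking the extension and opening the matching backend. A rough illustration of the idea using the underlying libraries directly, not the actual `dss.io` implementation (the `dss.npy_dir.load` call below is an assumed function name, check the `dss.npy_dir` module):

```python
import h5py
import zarr
import dss.npy_dir  # the `load` function used below is an assumption

def open_dataset(path: str):
    """Open a training dataset with the backend implied by its extension (sketch only)."""
    if path.endswith('.zarr'):
        return zarr.open(path, mode='r')
    if path.endswith('.h5'):
        return h5py.File(path, mode='r')
    if path.endswith('.npy'):
        return dss.npy_dir.load(path)  # assumed API
    raise ValueError(f'Unknown extension: {path}')
```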
`zarr` is great for assembling and storing datasets, in particular datasets with unknown final size, since it allows appending to arrays; it is used as intermediate storage during dataset assembly (see the sketch below). For training, `npy` is the fastest, in particular for random access into datasets that are too large to fit in memory, thanks to highly efficient memory mapping.
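A minimal sketch of append-as-you-go assembly with zarr (paths, shapes, and file names are placeholders):

```python
import numpy as np
import zarr

# Start with an empty [0, 1] array and grow it one recording at a time.
x = zarr.open('assembly.zarr', mode='w', shape=(0, 1), chunks=(100_000, 1), dtype='float32')
for filename in ['rec1.npz', 'rec2.npz']:   # hypothetical exported recordings
    with np.load(filename) as npz:
        x.append(npz['data'])               # append along the first (samples) axis
print(x.shape)  # final size is only known after assembly
```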