Quick start tutorial
This quick start tutorial walks through all steps required to make DeepSS work with your data, using a recording of fly song as an example. Comprehensive documentation of all menus and options can be found in the GUI documentation.
In the tutorial, we will train DeepSS using an iterative and adaptive protocol that allows you to quickly create a large dataset of annotations: annotate a few song events, fast-train a network on those annotations, and then use that network to predict new annotations on a larger part of the recording. Initially, these predictions require manual correction, but correcting is typically much faster than annotating everything from scratch. This correct-train-predict cycle is then repeated with ever larger datasets until network performance is satisfactory.
Download example data
To follow the tutorial, download and open this audio file. The recording is of a Drosophila melanogaster male courting a female, recorded by David Stern (Janelia, part of this dataset). We will walk through loading, annotating, training and predicting using this file as an example.
Start the GUI
Install DeepSS following these instructions. Then start the GUI by opening a terminal, activating the conda environment created during installation, and typing
conda activate dss
dss gui
The following window should open:
Load audio data
Choose Load audio from file and select the downloaded recording of fly song.
In the dialog that opens, leave everything as is except set Minimal/Maximal spectrogram frequency—the range of frequencies in the spectrogram display—to 50 and 1000 Hz. This will restrict the spectrogram view to only show the frequencies found in fly song.
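To see what restricting the frequency range does, the sketch below computes which spectrogram rows (FFT bins) fall between 50 and 1000 Hz. It is a minimal illustration, not DeepSS code. The function name `bins_in_range`, the 10 kHz sampling rate and the 512-point FFT are hypothetical values chosen for the example.

```python
def bins_in_range(sample_rate, nfft, fmin, fmax):
    """Indices of one-sided FFT bins whose center frequency lies in [fmin, fmax].

    Bin k of an nfft-point FFT is centered at k * sample_rate / nfft Hz,
    for k = 0 .. nfft // 2.
    """
    hz_per_bin = sample_rate / nfft
    return [k for k in range(nfft // 2 + 1) if fmin <= k * hz_per_bin <= fmax]


# Hypothetical settings: 10 kHz sampling rate and a 512-point FFT.
rows = bins_in_range(10_000, 512, 50, 1000)
print(f"{len(rows)} of {512 // 2 + 1} rows kept ({rows[0]} to {rows[-1]})")
```

Under these assumptions, only 49 of the 257 frequency rows are kept. The display therefore concentrates on the band where fly song lives.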
Waveform and spectrogram display
Loading the audio will open a window that displays the first second of audio as a waveform (top) and a spectrogram (bottom). You will see the two major modes of fly song: pulse and sine. The recording starts with sine song, a relatively soft oscillation with spectral power at ~150 Hz. Pulse song starts after ~0.75 seconds, evident as trains of brief wavelets with a regular interval.
To navigate the view, move forward/backward along the time axis via the D keys and zoom the time axis in/out with the S keys (see also the Playback menu). The temporal and frequency resolution of the spectrogram can be adjusted with the
You can play back the waveform on display through your headphones/speakers by pressing
Initialize or edit song types
Before you can annotate song, you need to register the sine and pulse song types for annotation. DeepSS distinguishes two principal categories of song types:
Events are defined by a single time of occurrence. The aforementioned pulse song is a song type of the event category.
Segments are song types that extend over time and are defined by a start and a stop time. The aforementioned sine song and the syllables of mouse and bird vocalizations fall into the segment category.
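The two categories can be thought of as two small record types. The sketch below is a hypothetical representation for illustration only. The class names `Event` and `Segment` and the example times are assumptions, not DeepSS's internal format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Event-like annotation: a single time of occurrence, e.g. one pulse."""
    name: str
    time: float  # seconds


@dataclass
class Segment:
    """Segment-like annotation: a start and a stop time, e.g. a sine song bout."""
    name: str
    start: float  # seconds
    stop: float  # seconds

    def __post_init__(self):
        # A segment must not end before it starts.
        if self.stop < self.start:
            raise ValueError(f"stop ({self.stop}) is before start ({self.start})")

    @property
    def duration(self) -> float:
        return self.stop - self.start


pulse = Event("pulse", time=0.81)
sine = Segment("sine", start=0.0, stop=0.74)
print(pulse, f"sine lasts {sine.duration:.2f} s")
```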
Add two new song types for annotation via Annotations/Add or edit song types: ‘pulse’ of category ‘event’ and ‘sine’ of category ‘segment’:
Create annotations manually
The two new song types “pulse” and “sine” can now be activated for annotation using the dropdown menu on the top left of the main window. The active song type can also be changed with the number keys indicated in the dropdown menu. In this case, 1 activates pulse and 2 activates sine.
Song is annotated by left-clicking the waveform or spectrogram view. If an event-like song type is active, a single left click marks the time of an event. A segment-like song type requires two clicks—one for each boundary of the segment.
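The click rules can be written as a small function. For an event-like type, every click is one annotation. For a segment-like type, clicks are consumed in pairs. This is a hypothetical sketch, and `clicks_to_annotations` is not a DeepSS function. Sorting each pair so that click order does not matter is also an assumption of the sketch.

```python
def clicks_to_annotations(clicks, category):
    """Convert a list of left-click times (seconds) into annotations.

    Returns (annotations, pending). For "event", each click is one event time.
    For "segment", clicks are paired into (start, stop) tuples; a trailing
    unpaired click is returned as `pending`, waiting for its second boundary.
    """
    if category == "event":
        return list(clicks), None
    if category == "segment":
        # Assumption: the two clicks may come in either order, so sort each pair.
        segments = [tuple(sorted(pair)) for pair in zip(clicks[0::2], clicks[1::2])]
        pending = clicks[-1] if len(clicks) % 2 else None
        return segments, pending
    raise ValueError(f"unknown category: {category!r}")


events, _ = clicks_to_annotations([0.80, 0.83, 0.86], "event")
segments, pending = clicks_to_annotations([0.74, 0.0, 1.2], "segment")
print(events, segments, pending)
```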
Annotate by thresholding the waveform
Annotation of events can be sped up with a “Thresholding mode”, which detects peaks in the sound energy that exceed a threshold. Activate thresholding mode via the Annotations menu. This will display a draggable horizontal line (the detection threshold) and a smooth pink waveform (the energy envelope of the waveform). Adjust the threshold so that only “correct” peaks in the envelope cross it, and then press I to annotate these peaks as events.
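The thresholding idea can be sketched in a few lines. First, smooth the squared waveform into an energy envelope. Then place one event at the highest point of every stretch of the envelope that lies above the threshold. This is a minimal pure-Python illustration under stated assumptions. The moving-average envelope, the function names and the synthetic signal are hypothetical and need not match how DeepSS computes its envelope.

```python
def energy_envelope(signal, window):
    """Centered moving average of the squared signal over `window` samples."""
    half = window // 2
    prefix = [0.0]  # prefix sums make each window average O(1)
    for x in signal:
        prefix.append(prefix[-1] + x * x)
    envelope = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope


def threshold_events(envelope, threshold, sample_rate):
    """Times (s) of the envelope maximum within each run above `threshold`."""
    events, run_start = [], None
    # A trailing -inf sentinel closes a run that lasts until the last sample.
    for i, value in enumerate(envelope + [float("-inf")]):
        if value > threshold and run_start is None:
            run_start = i
        elif value <= threshold and run_start is not None:
            run = envelope[run_start:i]
            peak = run_start + max(range(len(run)), key=run.__getitem__)
            events.append(peak / sample_rate)
            run_start = None
    return events


# Synthetic "song": 1 s at 1 kHz with three brief triangular wavelets.
sample_rate = 1000
signal = [0.0] * sample_rate
for center in (200, 500, 800):
    for k in range(-5, 6):
        signal[center + k] = 1 - abs(k) / 6

envelope = energy_envelope(signal, window=5)
print(threshold_events(envelope, threshold=0.1, sample_rate=sample_rate))
```

Raising the threshold drops weak peaks, which shows up as missed pulses. Lowering it admits noise as false positives. This trade-off is why the threshold is adjusted by eye against the envelope.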
In case you mis-clicked, you can edit and delete annotations. Edit event times and segment bounds by dragging the lines or the boundaries of segments. Drag the shaded area itself to move a segment without changing its duration. Movement can be disabled completely or restricted to the currently selected annotation type via the Audio menu.
Delete annotations of the active song type by right-clicking on the annotation. Annotations of all song types or only the active one in the view can be deleted with Y, respectively, or via the Annotations menu.
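Deleting "all annotations in the view" amounts to filtering out everything that overlaps the visible time range, optionally restricted to the active song type. The sketch below assumes a hypothetical list of `(name, start, stop)` tuples, with events stored as `start == stop`. Whether DeepSS also removes segments that only partly overlap the view is an assumption here.

```python
def delete_in_view(annotations, view_start, view_stop, active=None):
    """Remove annotations overlapping [view_start, view_stop].

    If `active` names a song type, only annotations of that type are removed;
    otherwise annotations of all song types in the view are removed.
    """
    def overlaps_view(start, stop):
        return start <= view_stop and stop >= view_start

    return [
        (name, start, stop)
        for name, start, stop in annotations
        if not (overlaps_view(start, stop) and (active is None or name == active))
    ]


annotations = [("pulse", 0.8, 0.8), ("pulse", 1.5, 1.5), ("sine", 0.0, 0.74)]
print(delete_in_view(annotations, 0.0, 1.0, active="pulse"))
print(delete_in_view(annotations, 0.0, 1.0))
```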
Export annotations and make a dataset
DeepSS achieves good performance with little manual annotation. Once you have completely annotated the song in the first 18 seconds of the tutorial recording—a couple of pulse trains and sine song segments—you can train a network to help with annotating the rest of the data.
Training requires the audio data and the annotations to be in a specific dataset format. First, export the audio data and the annotations via File/Export for DeepSS to a new folder (not the one containing the original audio). Let’s call the folder quickstart. In the following dialog, set start seconds and end seconds to the annotated time range, 0 and 18 seconds, respectively.
Then make a dataset via DeepSS/Make dataset for training. In the file dialog, select the quickstart folder you exported your annotations into. In the next dialog, we will adjust how the data is split into training, validation and test data. For the small dataset annotated in the first step of this tutorial, we will not test the model. To maximize the data available for optimizing the network (training and validation), set the test split to 0.0 (no test set) and the validation split to 0.4:
This will create a dataset folder called quickstart.npy that contains the audio data and the annotations, ready for training.
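As a rough illustration of what the split fractions mean for the 18 annotated seconds, the sketch below divides a recording into contiguous training, validation and test ranges. It is an assumption-laden sketch, not DeepSS's splitting code. The function name is hypothetical, and DeepSS may split the data differently. Contiguous blocks are used here because neighboring audio samples are strongly correlated. Interleaving them between training and validation would tend to make validation scores overly optimistic.

```python
def split_time_range(duration, val_split, test_split):
    """Split [0, duration] seconds into contiguous train/val/test ranges.

    Splits are fractions of the total; they are applied in the order
    train, validation, test.
    """
    if val_split < 0 or test_split < 0 or val_split + test_split >= 1:
        raise ValueError("splits must be non-negative and sum to less than 1")
    train_stop = duration * (1 - val_split - test_split)
    val_stop = duration * (1 - test_split)
    return {
        "train": (0.0, train_stop),
        "val": (train_stop, val_stop),
        "test": (val_stop, duration),
    }


for part, (start, stop) in split_time_range(18, val_split=0.4, test_split=0.0).items():
    print(f"{part:>5}: {start:5.1f} to {stop:5.1f} s")
```

With the tutorial's settings, about 10.8 s go to training and 7.2 s to validation, and the test range is empty.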
Configure a network and start training via DeepSS/Train. This will ask you to select the dataset folder, quickstart.npy. Then, a dialog allows you to configure the network. For fast training, change the following:
Number of filters and Filter duration (seconds) to 16. This will result in a smaller network with fewer parameters, which will be faster to train and require fewer annotations to achieve adequate performance.
Number of epochs to 10, to finish training earlier.
Start training in GUI: this starts training in a background process. Monitor training progress in the terminal. Training with this small dataset will finish in less than 10 minutes on a CPU and within 2 minutes on a GPU. For larger datasets, we highly recommend training on a machine with a discrete Nvidia GPU.
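Fewer and shorter filters make the network smaller because of how convolution layers count their weights. For a 1D convolution layer, the number of weights is input channels × filters × filter length, plus one bias per filter. Halving both the number of filters and the filter length therefore shrinks the inner layers roughly eightfold. The sketch below counts parameters for a plain stack of equally wide convolution layers, with filter length counted in samples. The layer count, the 32-filter baseline and the function names are hypothetical, and DeepSS's actual architecture differs.

```python
def conv1d_params(in_channels, n_filters, filter_length):
    """Trainable weights of one 1D convolution layer (kernel weights plus biases)."""
    return in_channels * n_filters * filter_length + n_filters


def conv_stack_params(n_layers, in_channels, n_filters, filter_length):
    """Parameters of `n_layers` stacked 1D convolutions of equal width."""
    first = conv1d_params(in_channels, n_filters, filter_length)
    rest = conv1d_params(n_filters, n_filters, filter_length)
    return first + (n_layers - 1) * rest


# Hypothetical comparison: a mono recording fed into three conv layers.
for size in (32, 16):
    print(size, conv_stack_params(n_layers=3, in_channels=1, n_filters=size, filter_length=size))
```

Under these assumptions, going from 32 to 16 filters and filter length reduces the parameter count from 66,656 to 8,496.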
Once training has finished, generate annotations using the trained network via DeepSS/Predict. This will ask you to select a model file containing the trained network. Training creates files in the quickstart.res folder whose names start with the time stamp of training. Select the file ending in
In the next dialog, predict song for 60 seconds starting after your manual annotations:
Start seconds to 18 and
End seconds to 78.
Make sure that Proof reading mode is enabled. That way, annotations created by the network will be assigned names ending in _proposals, in our case pulse_proposals. The proposals will be transformed into proper pulse annotations during proof reading.
Disable Fill gaps shorter than (seconds) and Delete segments shorter than (seconds) by unchecking both check boxes.
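For reference, the two post-processing options switched off above can be sketched as follows. One merges segments separated by short gaps, and the other drops very short segments. This is a hypothetical illustration: the function names, the example values and the order of the two steps are assumptions, and it is not the code DeepSS runs.

```python
def fill_gaps(segments, min_gap):
    """Merge (start, stop) segments separated by gaps shorter than `min_gap` s."""
    merged = []
    for start, stop in sorted(segments):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def delete_short(segments, min_duration):
    """Drop segments shorter than `min_duration` seconds."""
    return [(start, stop) for start, stop in segments if stop - start >= min_duration]


# A sine bout chopped in two by a 20 ms gap, plus a 10 ms spurious segment.
segments = [(1.00, 1.30), (1.32, 1.60), (2.00, 2.01)]
filled = fill_gaps(segments, min_gap=0.05)
print(filled)
print(delete_short(filled, min_duration=0.02))
```

With both options disabled, the proposals show the network's raw output. Chopped-up sine segments, like the first two above, therefore remain visible during proofreading.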
In contrast to training, prediction is very fast and does not require a GPU. It should finish within 30 seconds. The proposed annotations should already be good, with most pulses correctly detected. Sine song is harder to predict and will often be missed or chopped up into multiple segments with gaps in between.
To turn the proposals into proper annotations, fix and approve them. Correct any prediction errors: add missing annotations, remove false positive annotations, and adjust the timing of annotations. Once you have corrected all errors in the view, approve annotations with H for approving only the active or all song types, respectively. This will rename the proposals in the view to the original names (for instance, pulse_proposals becomes pulse).
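Approving proposals is essentially a renaming step. The `_proposals` suffix is stripped from annotations in the current view, either for the active song type only or for all song types. The sketch below mirrors that description on a hypothetical list of `(name, start, stop)` tuples. The function name and data layout are assumptions, not DeepSS's API.

```python
SUFFIX = "_proposals"


def approve(annotations, view_start, view_stop, active=None):
    """Rename proposals within [view_start, view_stop] to their song-type names.

    If `active` is given (e.g. "pulse"), only proposals of that type are approved.
    """
    approved = []
    for name, start, stop in annotations:
        in_view = start <= view_stop and stop >= view_start
        if in_view and name.endswith(SUFFIX):
            base = name[: -len(SUFFIX)]
            if active is None or base == active:
                name = base
        approved.append((name, start, stop))
    return approved


annotations = [
    ("pulse_proposals", 20.1, 20.1),
    ("sine_proposals", 21.0, 21.5),
    ("pulse_proposals", 30.2, 30.2),  # outside the view used below
]
print(approve(annotations, 18.0, 28.0, active="pulse"))
print(approve(annotations, 18.0, 28.0))
```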
Go back to “Export”
Once all proposals have been approved, export all annotations (now between 0 and 78 seconds), make a new dataset, train, predict, and repeat. If prediction performance is adequate, fully train the network, this time using a completely new recording as the test set (TODO: add option to specify a file as the test set to the “Make dataset” dialog) and with a larger number of epochs.