das.predict

Code for training and evaluating networks.

das.predict.cli_predict(recording_filename: str, model_save_name: str, *, save_filename: Optional[str] = None, verbose: int = 1, batch_size: Optional[int] = None, event_thres: float = 0.5, event_dist: float = 0.01, event_dist_min: float = 0, event_dist_max: Optional[float] = None, segment_thres: float = 0.5, segment_minlen: Optional[float] = None, segment_fillgap: Optional[float] = None)

Predict song labels in a wav file.

Saves an HDF5 file with the keys events, segments, and class_probabilities.

Parameters
  • recording_filename (str) – path to the WAV file with the audio data.

  • model_save_name (str) – path with the trunk name of the model.

  • save_filename (Optional[str]) – path to save annotations to. Defaults to None, in which case the extension is stripped from recording_filename and ‘_das.h5’ is appended.

  • verbose (int) – display progress bar during prediction. Defaults to 1.

  • batch_size (Optional[int]) – number of chunks processed at once. Defaults to None (the default used during training). Larger batches lead to faster inference but are limited by memory size, in particular on GPUs, which typically have 8GB.

  • event_thres (float) – Confidence threshold for detecting peaks. Range 0..1. Defaults to 0.5.

  • event_dist (float) – Minimal distance between adjacent events during thresholding. Prevents detecting duplicate events when the confidence trace is a little noisy. Defaults to 0.01.

  • event_dist_min (float) – Minimal inter-event interval for the event filter run during post processing. Defaults to 0.

  • event_dist_max (Optional[float]) – Maximal inter-event interval for the event filter run during post processing. Defaults to None (no upper limit).

  • segment_thres (float) – Confidence threshold for detecting segments. Range 0..1. Defaults to 0.5.

  • segment_minlen (Optional[float]) – Minimal duration in seconds of a segment used for filtering out spurious detections. Defaults to None.

  • segment_fillgap (Optional[float]) – Gap in seconds between adjacent segments to be filled. Useful for correcting brief lapses. Defaults to None.
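
The default output path described for save_filename can be illustrated with a minimal sketch. It only restates the rule given above (strip the extension, append ‘_das.h5’) and is not taken from the library; the helper name default_save_filename is hypothetical.

```python
import os


def default_save_filename(recording_filename: str) -> str:
    """Hypothetical helper: strip the extension and append '_das.h5'."""
    trunk, _ext = os.path.splitext(recording_filename)
    return trunk + "_das.h5"


print(default_save_filename("recordings/fly_01.wav"))  # recordings/fly_01_das.h5
```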

das.predict.labels_from_probabilities(probabilities, threshold=None)

Convert class-wise probabilities into labels.

Parameters
  • probabilities (np.array) – [samples, classes] or [samples, ]

  • threshold (float, Optional) – Defaults to None: for 2D input, the label is the argmax over all classes (corresponds to a threshold of 1/nb_classes); for 1D input, a threshold of 0.5 is used. If float, each class probability is compared to the threshold and the first class to cross the threshold wins. If no class crosses the threshold, the label defaults to the first class.

Returns

labels [samples,] - index of “winning” dimension for each sample
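
The two modes of threshold can be sketched in plain Python for the [samples, classes] case. This is an illustrative re-implementation of the rule described above, not the library's code; the function name labels_sketch and the use of a strict ">" comparison are assumptions.

```python
from typing import List, Optional, Sequence


def labels_sketch(probabilities: Sequence[Sequence[float]],
                  threshold: Optional[float] = None) -> List[int]:
    """Illustrative only: handles [samples, classes] input."""
    labels = []
    for row in probabilities:
        if threshold is None:
            # Argmax over classes; ties go to the lowest class index.
            labels.append(max(range(len(row)), key=lambda k: row[k]))
        else:
            # First class above the threshold wins; fall back to class 0.
            # The strict '>' is an assumption of this sketch.
            crossing = [k for k, p in enumerate(row) if p > threshold]
            labels.append(crossing[0] if crossing else 0)
    return labels


probs = [[0.1, 0.7, 0.2], [0.5, 0.1, 0.4], [0.2, 0.3, 0.5]]
print(labels_sketch(probs))                  # [1, 0, 2]
print(labels_sketch(probs, threshold=0.25))  # [1, 0, 1]
print(labels_sketch(probs, threshold=0.9))   # [0, 0, 0]
```

Note how the last sample changes label with threshold=0.25: class 1 crosses the threshold before class 2 is considered, even though class 2 has the higher probability.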

das.predict.predict(x: numpy.array, model_save_name: Optional[str] = None, verbose: int = 1, batch_size: Optional[int] = None, model: Optional[tensorflow.python.keras.engine.training.Model] = None, params: Optional[dict] = None, event_thres: float = 0.5, event_dist: float = 0.01, event_dist_min: float = 0, event_dist_max: Optional[float] = None, segment_thres: float = 0.5, segment_minlen: Optional[float] = None, segment_fillgap: Optional[float] = None, pad: bool = True, prepend_data_padding: bool = True)

Predict song labels from audio data.

Usage: Calling predict with the path to the model will load the model and the associated params and run inference: das.predict.predict(x=data, model_save_name='tata')

To re-use the same model with multiple recordings, load the model and params once and pass them to predict:

```
my_model, my_params = das.utils.load_model_and_params(model_save_name)
for data in data_list:
    das.predict.predict(x=data, model=my_model, params=my_params)
```

Parameters
  • x (np.array) – Audio data [samples, channels]

  • model_save_name (str) – path with the trunk name of the model. Defaults to None.

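  • model (tf.keras.Model) – Previously loaded model, used instead of loading one from model_save_name. Defaults to None.

  • params (dict) – Parameters associated with the model. Defaults to None.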

  • verbose (int) – display progress bar during prediction. Defaults to 1.

  • batch_size (int) – number of chunks processed at once. Defaults to None (the default used during training). Larger batches lead to faster inference but are limited by memory size, in particular on GPUs, which typically have 8GB. Large batch sizes lead to loss of samples since only complete batches are used.

  • pad (bool) – Append zeros to fill up batch. Otherwise the end can be cut. Defaults to True.

  • event_thres (float) – Confidence threshold for detecting peaks. Range 0..1. Defaults to 0.5.

  • event_dist (float) – Minimal distance between adjacent events during thresholding. Prevents detecting duplicate events when the confidence trace is a little noisy. Defaults to 0.01.

  • event_dist_min (float) – Minimal inter-event interval for the event filter run during post processing. Defaults to 0.

  • event_dist_max (float) – Maximal inter-event interval for the event filter run during post processing. Defaults to None (no upper limit).

  • segment_thres (float) – Confidence threshold for detecting segments. Range 0..1. Defaults to 0.5.

  • segment_minlen (float) – Minimal duration in seconds of a segment used for filtering out spurious detections. Defaults to None.

  • segment_fillgap (float) – Gap in seconds between adjacent segments to be filled. Useful for correcting brief lapses. Defaults to None.

  • pad – prepend values (repeat last sample value) to fill the last batch. Otherwise, the end of the data will not be annotated because the last, non-full batch will be skipped.

  • prepend_data_padding (bool, optional) – Restores samples that are ignored in the beginning of the first and the end of the last chunk because of “ignore_boundaries”. Defaults to True.

Raises

ValueError – [description]

Returns

  • events – detected events

  • segments – detected segments

  • class_probabilities (np.array) – [T, nb_classes]

  • class_names (List[str]) – [nb_classes]
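
The note on batch_size and pad (only complete batches are used) can be made concrete with some arithmetic. The sketch below assumes non-overlapping chunks of chunk_len samples, which is a simplification and may not match how the library actually chunks the data; the function name batch_coverage and the parameter chunk_len are hypothetical.

```python
from typing import Tuple


def batch_coverage(n_samples: int, chunk_len: int, batch_size: int) -> Tuple[int, int, int]:
    """Return (covered, dropped, padding) in samples.

    Assumes non-overlapping chunks (a simplification for illustration).
    """
    n_chunks = n_samples // chunk_len            # complete chunks in the data
    n_full_batches = n_chunks // batch_size      # only complete batches are used
    covered = n_full_batches * batch_size * chunk_len
    dropped = n_samples - covered                # samples left without annotation
    batch_len = batch_size * chunk_len
    padding = (-n_samples) % batch_len           # samples needed to fill the last batch
    return covered, dropped, padding


print(batch_coverage(90_000, 1024, 32))  # (65536, 24464, 8304)
print(batch_coverage(90_000, 1024, 8))   # (81920, 8080, 112)
```

Under these assumptions, the larger batch size leaves more samples at the end unannotated unless the data is padded.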

das.predict.predict_events(class_probabilities, samplerate: float = 1.0, event_dims: Optional[int] = None, event_names: Optional[List[str]] = None, event_thres: float = 0.5, events_offset: float = 0, event_dist: float = 100, event_dist_min: float = 0, event_dist_max: float = inf)

Detect events as peaks in the class probabilities.

Parameters
  • class_probabilities (np.array) – [samples, classes]

  • samplerate (float, optional) – Hz

  • event_dims (Optional[int], optional) – indices into class_probabilities that correspond to events. Defaults to range(nb_classes).

  • event_names (Optional[List[str]], optional) – names of the events. Defaults to event_dims.

  • event_thres (float, optional) – Confidence threshold for detecting peaks. Range 0..1. Defaults to 0.5.

  • events_offset (float, optional) – Defaults to 0 seconds.

  • event_dist (float, optional) – minimal distance between events for detection (in seconds). Defaults to 100 seconds.

  • event_dist_min (float, optional) – minimal distance to nearest event for post detection interval filter (in seconds). Defaults to 0 seconds.

  • event_dist_max (float, optional) – maximal distance to nearest event for post detection interval filter (in seconds). Defaults to inf (no upper limit).

Raises

ValueError – [description]

Returns

dict[index/names/sequence/seconds/probabilities]
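
How event_thres, event_dist, event_dist_min and event_dist_max interact can be sketched for a single confidence trace. This is a simplified illustration, not the library's implementation: the function names detect_events and filter_by_interval are hypothetical, and keeping the highest peak when two peaks are too close, as well as the handling of a lone event, are assumptions of this sketch.

```python
from typing import List, Optional


def detect_events(trace: List[float], samplerate: float = 1.0,
                  thres: float = 0.5, dist: float = 0.01) -> List[float]:
    """Times (s) of local maxima above thres that are at least dist seconds apart."""
    min_gap = dist * samplerate  # minimal distance in samples
    # Candidate peaks: local maxima above the threshold.
    peaks = [i for i in range(1, len(trace) - 1)
             if trace[i] > thres and trace[i - 1] < trace[i] >= trace[i + 1]]
    # Visit peaks from highest to lowest; drop peaks too close to a kept one.
    kept: List[int] = []
    for i in sorted(peaks, key=lambda k: trace[k], reverse=True):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return [i / samplerate for i in sorted(kept)]


def filter_by_interval(times: List[float], dist_min: float = 0.0,
                       dist_max: Optional[float] = None) -> List[float]:
    """Keep events whose distance to the nearest other event is in [dist_min, dist_max]."""
    upper = float("inf") if dist_max is None else dist_max
    out = []
    for k, t in enumerate(times):
        gaps = [abs(t - u) for m, u in enumerate(times) if m != k]
        nearest = min(gaps) if gaps else float("inf")  # lone event: infinite distance
        if dist_min <= nearest <= upper:
            out.append(t)
    return out


trace = [0.0, 0.2, 0.9, 0.3, 0.8, 0.1, 0.0, 0.0, 0.7, 0.1]
events = detect_events(trace, samplerate=10.0, thres=0.5, dist=0.25)
print(events)                                                 # [0.2, 0.8]
print(filter_by_interval([0.2, 0.8, 0.85], dist_max=0.5))     # [0.8, 0.85]
```

In the example, the peak at sample 4 is suppressed because it lies within 0.25 s of the higher peak at sample 2, and the interval filter then removes the event at 0.2 s because its nearest neighbour is more than 0.5 s away.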

das.predict.predict_probabililties(x, model, params, verbose=None, prepend_data_padding: bool = True)

Compute the output of the network for each sample.

Parameters
  • x (np.array) – Audio data [samples, ...]

  • model (tf.keras.Model) – Model used for prediction.

  • params (dict) – Parameters associated with the model.

  • verbose (int, optional) – Verbose level for predict_generator (see tf.keras docs). Defaults to None.

  • prepend_data_padding (bool, optional) – Restores samples that are ignored in the beginning of the first and the end of the last chunk because of “ignore_boundaries”. Defaults to True.

Returns

y_pred - output of network for each sample [samples, nb_classes]

das.predict.predict_segments(class_probabilities: numpy.array, samplerate: float = 1.0, segment_dims: Optional[List[int]] = None, segment_names: Optional[List[str]] = None, segment_ref_onsets: Optional[List[float]] = None, segment_ref_offsets: Optional[List[float]] = None, segment_thres: float = 0.5, segment_minlen: Optional[float] = None, segment_fillgap: Optional[float] = None, segment_labels_by_majority: bool = True) → Dict

Predict segments from class probabilities.

TODO: document different approaches for single-type vs. multi-type segment detection

Parameters
  • class_probabilities (np.array) – [T, nb_classes] with probabilities for each class and sample or [T,] with integer entries as class labels

  • samplerate (float, optional) – Hz. Defaults to 1.0.

  • segment_dims (Optional[List[int]], optional) – set of indices into class_probabilities corresponding to segment-like song types. Needs to include the noise dim. Defaults to None.

  • segment_names (Optional[List[str]], optional) – names of the segments. Defaults to None.

  • segment_ref_onsets (Optional[List[float]], optional) – Use onsets (in seconds) as reference for estimating labels. Defaults to None (will use onsets estimated from class_probabilities as reference).

  • segment_ref_offsets (Optional[List[float]], optional) – Use offsets (in seconds) as reference for estimating labels. Defaults to None (will use offsets estimated from class_probabilities as reference).

  • segment_thres (float, optional) – Confidence threshold for detecting segments. Range 0..1. Defaults to 0.5.

  • segment_minlen (Optional[float], optional) – Minimal duration in seconds of a segment used for filtering out spurious detections. Defaults to None.

  • segment_fillgap (Optional[float], optional) – Gap in seconds between adjacent segments to be filled. Useful for correcting brief lapses. Defaults to None.

  • segment_labels_by_majority (bool, optional) – Segment labels given by majority of label values within on- and offsets. Defaults to True.

Returns

dict[‘segmentnames’][‘denselabels-samples’/’onsets’/’offsets’/’probabilities’]
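
The roles of segment_thres, segment_fillgap and segment_minlen can be sketched for a single-class probability trace. This is a simplified illustration, not the library's implementation: the names runs and segment_sketch are hypothetical, and filling gaps before removing short segments is an assumption of this sketch.

```python
from typing import List, Optional, Tuple


def runs(binary: List[int], value: int) -> List[Tuple[int, int]]:
    """[start, stop) index pairs of consecutive samples equal to value."""
    out, start = [], None
    for i, b in enumerate(binary + [1 - value]):  # sentinel closes the last run
        if b == value and start is None:
            start = i
        elif b != value and start is not None:
            out.append((start, i))
            start = None
    return out


def segment_sketch(prob: List[float], samplerate: float = 1.0, thres: float = 0.5,
                   minlen: Optional[float] = None, fillgap: Optional[float] = None):
    """Threshold, fill short gaps, drop short segments; return labels, onsets, offsets."""
    labels = [1 if p > thres else 0 for p in prob]
    if fillgap is not None:
        # Fill gaps shorter than fillgap that have a segment on both sides.
        for start, stop in runs(labels, 0):
            interior = start > 0 and stop < len(labels)
            if interior and (stop - start) / samplerate < fillgap:
                labels[start:stop] = [1] * (stop - start)
    if minlen is not None:
        # Remove segments shorter than minlen.
        for start, stop in runs(labels, 1):
            if (stop - start) / samplerate < minlen:
                labels[start:stop] = [0] * (stop - start)
    segments = runs(labels, 1)
    onsets = [s / samplerate for s, _ in segments]
    offsets = [e / samplerate for _, e in segments]
    return labels, onsets, offsets


prob = [0.1, 0.9, 0.8, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1, 0.7, 0.1]
print(segment_sketch(prob)[1:])                        # ([1.0, 4.0, 9.0], [3.0, 6.0, 10.0])
print(segment_sketch(prob, minlen=2, fillgap=2)[1:])   # ([1.0], [6.0])
```

With post-processing enabled, the one-sample gap at index 3 is filled, merging the first two segments, and the one-sample segment at index 9 is discarded as spurious.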