das.predict

Code for training and evaluating networks.

das.predict.cli_predict(path: str, model_save_name: str, *, save_filename: Optional[str] = None, save_format: str = 'csv', verbose: int = 1, batch_size: Optional[int] = None, event_thres: float = 0.5, event_dist: float = 0.01, event_dist_min: float = 0, event_dist_max: float = inf, segment_thres: float = 0.5, segment_use_optimized: bool = True, segment_minlen: Optional[float] = None, segment_fillgap: Optional[float] = None, bandpass_low_freq: float = None, bandpass_up_freq: float = None, resample: bool = True)

Predict song labels for a wav file or a folder of wav files.

Saves either HDF5 files with the keys events, segments, and class_probabilities, or CSV files with the columns label, start_seconds, and stop_seconds.

Parameters
  • path (str) – Path to a single WAV file with the audio data or to a folder with WAV files.

  • model_save_name (str) – Stem of the path for the model (and parameters). File to load will be MODEL_SAVE_NAME + _model.h5.

  • save_filename (Optional[str]) – Path to save annotations to. If omitted, will construct save_filename by stripping the extension from recording_filename and adding ‘_das.h5’ or ‘_annotations.csv’. Will be ignored if path is a folder.

  • save_format (str) – ‘csv’ or ‘h5’. csv: tabular text file with label, start and end seconds for each predicted song. h5: same information as in csv plus confidence values for each sample and song type. Defaults to ‘csv’.

  • verbose (int) – Display progress bar during prediction. Defaults to 1.

  • batch_size (Optional[int]) – Number of chunks processed at once. Defaults to None (the default used during training).

  • event_thres (float) – Confidence threshold for detecting events. Range 0..1. Defaults to 0.5.

  • event_dist (float) – Minimal distance between adjacent events during thresholding. Prevents detecting duplicate events when the confidence trace is a little noisy. Defaults to 0.01.

  • event_dist_min (float) – Minimal inter-event interval for the event filter run during post-processing. Defaults to 0.

  • event_dist_max (float) – Maximal inter-event interval for the event filter run during post-processing. Defaults to np.inf.

  • segment_thres (float) – Confidence threshold for detecting segments. Range 0..1. Defaults to 0.5.

  • segment_use_optimized (bool) – Use minlen and fillgap values from param file if they exist. If segment_minlen and segment_fillgap are provided, then they will override the values from the param file. Defaults to True.

  • segment_minlen (Optional[float]) – Minimal duration of a segment used for filtering out spurious detections. Defaults to None (keep all segments).

  • segment_fillgap (Optional[float]) – Gap between adjacent segments to be filled. Useful for correcting brief lapses. Defaults to None (do not fill gaps).

  • bandpass_low_freq (float) – Lower cutoff frequency in Hz for bandpass filtering audio data. Defaults to 1.0.

  • bandpass_up_freq (float) – Upper cutoff frequency in Hz for bandpass filtering audio data. Defaults to samplingrate / 2.

  • resample (bool) – Resample audio data to the rate expected by the model. Defaults to True.

Raises

ValueError on unknown save_format
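
The default output name described for save_filename can be sketched as follows. This is a minimal illustration that assumes only the two suffixes documented above; default_save_filename is a hypothetical helper and not part of das.

```python
import os


def default_save_filename(recording_filename: str, save_format: str = "csv") -> str:
    """Hypothetical helper: derive the annotation file name from the recording name."""
    suffixes = {"csv": "_annotations.csv", "h5": "_das.h5"}
    if save_format not in suffixes:
        # Mirrors the documented ValueError for an unknown save_format.
        raise ValueError(f"Unknown save_format: {save_format!r}")
    # Strip the extension, then append the format-specific suffix.
    stem, _ = os.path.splitext(recording_filename)
    return stem + suffixes[save_format]


print(default_save_filename("recordings/fly01.wav"))        # recordings/fly01_annotations.csv
print(default_save_filename("recordings/fly01.wav", "h5"))  # recordings/fly01_das.h5
```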

das.predict.labels_from_probabilities(probabilities, threshold: Optional[float] = None, indices: Optional[Union[Sequence[int], slice]] = None) → numpy.ndarray

Convert class-wise probabilities into labels.

Parameters
  • probabilities (np.ndarray) – Class probabilities with shape [samples, classes] or [samples,].

  • threshold (float, Optional) – Defaults to None, in which case the label is the argmax over all classes for 2D input (this corresponds to a threshold of 1/nb_classes) or the result of thresholding at 0.5 for 1D input. If a float is given, each class probability is compared to the threshold and the first class to cross it wins. If no class crosses the threshold, the label defaults to the first class.

  • indices (List[int], Optional) – List of indices into axis 1 for which to compute the labels. Defaults to None (use all indices).

Returns

labels [samples,] - index of “winning” dimension for each sample
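
The two labeling modes described above (argmax versus first class to cross the threshold) can be illustrated with a small stand-alone sketch. It is not the das implementation: labels_sketch is a hypothetical name, plain lists are used instead of NumPy arrays to keep it dependency-free, only 2D input is handled, and "crossing" is assumed to mean strictly greater than the threshold.

```python
from typing import List, Optional, Sequence


def labels_sketch(
    probabilities: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
) -> List[int]:
    """Illustrative only: label each sample from [samples, classes] probabilities."""
    labels = []
    for row in probabilities:
        if threshold is None:
            # Default: index of the most probable class (argmax).
            labels.append(max(range(len(row)), key=lambda i: row[i]))
        else:
            # First class whose probability exceeds the threshold wins;
            # fall back to the first class (index 0) if none does.
            crossed = [i for i, p in enumerate(row) if p > threshold]
            labels.append(crossed[0] if crossed else 0)
    return labels


probs = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.1, 0.3, 0.6]]
print(labels_sketch(probs))                 # [1, 2, 2]
print(labels_sketch(probs, threshold=0.5))  # [1, 0, 2]
```

The second sample shows the difference: argmax picks class 2, whereas with a threshold of 0.5 no class qualifies and the label falls back to the first class.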

das.predict.predict(x: numpy.ndarray, model_save_name: str = None, verbose: int = 1, batch_size: int = None, model: keras.src.engine.training.Model = None, params: Dict = None, event_thres: float = 0.5, event_dist: float = 0.01, event_dist_min: float = 0, event_dist_max: float = inf, segment_thres: float = 0.5, segment_use_optimized: bool = True, segment_minlen: float = None, segment_fillgap: float = None, pad: bool = True, prepend_data_padding: bool = True, save_memory: bool = False, bandpass_low_freq: float = None, bandpass_up_freq: float = None, resample: bool = True, fs_audio: Optional[float] = None)

Run inference on audio data and return events, segments, class probabilities, and class names.

Usage: Calling predict with the path to the model will load the model and the associated params and run inference: das.predict.predict(x=data, model_save_name='tata')

To re-use the same model with multiple recordings, load the model and params once and pass them to predict:

```
my_model, my_params = das.utils.load_model_and_params(model_save_name)
for data in data_list:
    das.predict.predict(x=data, model=my_model, params=my_params)
```

Parameters
  • x (np.array) – Audio data [samples, channels]

  • model_save_name (str) – path with the trunk name of the model. Defaults to None.

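  • model (keras.Model) – Previously loaded model to use instead of loading it from model_save_name. Defaults to None.

  • params (dict) – Parameters associated with the model. Defaults to None.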

  • verbose (int) – display progress bar during prediction. Defaults to 1.

  • batch_size (int) – Number of chunks processed at once. Defaults to None (the default used during training). Larger batches lead to faster inference but are limited by memory size, in particular on GPUs, which typically have 8 GB. Large batch sizes can lead to loss of samples since only complete batches are used.

  • pad (bool) – Append zeros to fill up the last batch. Otherwise, the end of the data can be cut off. Defaults to True.

  • event_thres (float) – Confidence threshold for detecting peaks. Range 0..1. Defaults to 0.5.

  • event_dist (float) – Minimal distance between adjacent events during thresholding. Prevents detecting duplicate events when the confidence trace is a little noisy. Defaults to 0.01.

  • event_dist_min (float) – Minimal inter-event interval for the event filter run during post-processing. Defaults to 0.

  • event_dist_max (float) – Maximal inter-event interval for the event filter run during post-processing. Defaults to np.inf.

  • segment_thres (float) – Confidence threshold for detecting segments. Range 0..1. Defaults to 0.5.

  • segment_use_optimized (bool) – Use minlen and fillgap values from param file if they exist. If segment_minlen and segment_fillgap are provided, then they will override the values from the param file. Defaults to True.

  • segment_minlen (float) – Minimal duration in seconds of a segment used for filtering out spurious detections. Defaults to None.

  • segment_fillgap (float) – Gap in seconds between adjacent segments to be filled. Useful for correcting brief lapses. Defaults to None.

  • pad – prepend values (repeat last sample value) to fill the last batch. Otherwise, the end of the data will not be annotated because the last, non-full batch will be skipped.

  • prepend_data_padding (bool, optional) – Restores samples that are ignored in the beginning of the first and the end of the last chunk because of “ignore_boundaries”. Defaults to True.

  • save_memory (bool) – If true, will return memmaped dask.arrays that reside on disk for chunked computations. Convert to np.arrays via the array’s compute() function. Defaults to False.

Raises

ValueError – [description]

Returns
  • events – [description]

  • segments – [description]

  • class_probabilities (np.array) – [T, nb_classes]

  • class_names (List[str]) – [nb_classes]
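
The interaction between batch_size and pad can be made concrete with a little arithmetic. The sketch below is a simplification, not das code: it assumes non-overlapping chunks of a fixed, hypothetical length chunk_len, whereas the real chunking is determined by the model parameters. batch_coverage is a hypothetical helper name.

```python
from typing import Tuple


def batch_coverage(n_samples: int, chunk_len: int, batch_size: int) -> Tuple[int, int]:
    """Illustrative only.

    Returns (samples dropped at the end without padding,
             zeros that must be appended to complete the last batch).
    Assumes non-overlapping chunks of `chunk_len` samples.
    """
    samples_per_batch = chunk_len * batch_size
    dropped_without_pad = n_samples % samples_per_batch
    zeros_to_append = (-n_samples) % samples_per_batch
    return dropped_without_pad, zeros_to_append


# Same recording length, two batch sizes (all numbers are made up).
for batch_size in (8, 32):
    print(batch_size, batch_coverage(120_000, chunk_len=1024, batch_size=batch_size))
```

Under these assumptions, the larger batch size leaves a longer unannotated tail when padding is disabled, which is the sample loss mentioned for batch_size.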

das.predict.predict_events(class_probabilities: numpy.ndarray, samplerate: float = 1.0, event_dims: Optional[Iterable[int]] = None, event_names: Optional[Iterable[str]] = None, event_thres: float = 0.5, events_offset: float = 0, event_dist: float = 100, event_dist_min: float = 0, event_dist_max: float = inf) → Dict[str, Any]

Detect events from class probabilities.

Parameters
  • class_probabilities (np.ndarray) – [samples, classes]

  • samplerate (float, optional) – Sampling rate in Hz. Defaults to 1.0.

  • event_dims (Iterable[int], optional) – Indices into class_probabilities corresponding to event-like song types. Defaults to np.arange(1, nb_classes).

  • event_names (Iterable[str], optional) – Names for event-like classes. Defaults to event_dims.

  • event_thres (float, optional) – Confidence threshold for detecting events. Range 0..1. Defaults to 0.5.

  • events_offset (float, optional) – Defaults to 0 seconds.

  • event_dist (float, optional) – minimal distance between events for detection (in seconds). Defaults to 100 seconds.

  • event_dist_min (float, optional) – minimal distance to nearest event for post detection interval filter (in seconds). Defaults to 0 seconds.

  • event_dist_max (float, optional) – maximal distance to nearest event for post detection interval filter (in seconds). Defaults to np.inf (no upper limit).

Raises

ValueError – [description]

Returns

Dict[str, Any]
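
The roles of event_thres, event_dist, and the event_dist_min/event_dist_max interval filter can be illustrated with a dependency-free sketch. This is not the das implementation: detect_events and interval_filter are hypothetical helpers, peaks are taken to be simple local maxima, and the minimum-distance rule is assumed to keep the strongest peak among close neighbors.

```python
from typing import List


def detect_events(prob: List[float], samplerate: float, thres: float, dist: float) -> List[float]:
    """Event times (s): local maxima above `thres`, at least `dist` seconds apart."""
    peaks = [i for i in range(1, len(prob) - 1)
             if prob[i] > thres and prob[i - 1] < prob[i] >= prob[i + 1]]
    kept: List[int] = []
    # Visit the strongest peaks first so that weaker duplicates nearby are dropped.
    for i in sorted(peaks, key=lambda i: prob[i], reverse=True):
        if all(abs(i - j) / samplerate >= dist for j in kept):
            kept.append(i)
    return sorted(i / samplerate for i in kept)


def interval_filter(times: List[float], dist_min: float = 0.0,
                    dist_max: float = float("inf")) -> List[float]:
    """Keep events whose distance to the nearest other event lies in [dist_min, dist_max]."""
    out = []
    for k, t in enumerate(times):
        others = [abs(t - u) for m, u in enumerate(times) if m != k]
        nearest = min(others) if others else float("inf")
        if dist_min <= nearest <= dist_max:
            out.append(t)
    return out


trace = [0.0, 0.2, 0.9, 0.3, 0.8, 0.1, 0.0, 0.0, 0.7, 0.1]  # confidence at 10 Hz
print(detect_events(trace, samplerate=10.0, thres=0.5, dist=0.3))  # [0.2, 0.8]
print(interval_filter([0.2, 0.8, 5.0], dist_max=1.0))              # [0.2, 0.8]
```

In the first call, the peak at 0.4 s is suppressed because it lies within 0.3 s of the stronger peak at 0.2 s. In the second, the isolated event at 5.0 s is removed because its nearest neighbor is more than 1 s away.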

das.predict.predict_probabilities(x: numpy.ndarray, model: keras.src.engine.training.Model, params: Dict[str, Any], verbose: Optional[int] = 1, prepend_data_padding: bool = True)

Compute the network output (class probabilities) for each sample of the input.

Parameters
  • x (np.ndarray) – Input data [samples, ...]

  • model (tf.keras.Model) – Network used for prediction.

  • params (Dict[str, Any]) – Parameters associated with the model.

  • verbose (int, optional) – Verbose level for predict() (see keras docs). Defaults to 1.

  • prepend_data_padding (bool, optional) – Restores samples that are ignored in the beginning of the first and the end of the last chunk because of “ignore_boundaries”. Defaults to True.

Returns

y_pred - output of network for each sample [samples, nb_classes]

das.predict.predict_segments(class_probabilities: numpy.ndarray, samplerate: float = 1.0, segment_dims: Optional[Sequence[int]] = None, segment_names: Optional[Sequence[str]] = None, segment_ref_onsets: Optional[List[float]] = None, segment_ref_offsets: Optional[List[float]] = None, segment_thres: float = 0.5, segment_minlen: Optional[float] = None, segment_fillgap: Optional[float] = None, segment_labels_by_majority: bool = True) → Dict

Detect segments from class probabilities.

TODO: document different approaches for single-type vs. multi-type segment detection

Parameters
  • class_probabilities (np.ndarray) – [T, nb_classes] with probabilities for each class and sample or [T,] with integer entries as class labels

  • samplerate (float, optional) – Hz. Defaults to 1.0.

  • segment_dims (Optional[List[int]], optional) – set of indices into class_probabilities corresponding to segment-like song types. Needs to include the noise dim. Required to ignore event-like song types. Defaults to None (all classes are considered segment-like).

  • segment_names (Optional[List[str]], optional) – Names for segment-like classes. Defaults to None (use indices of segment-like classes).

  • segment_ref_onsets (Optional[List[float]], optional) – Syllable onsets (in seconds) to use for estimating labels. Defaults to None (use onsets estimated from class_probabilities as reference).

  • segment_ref_offsets (Optional[List[float]], optional) – Syllable offsets (in seconds) to use for estimating labels. Defaults to None (use offsets estimated from class_probabilities as reference).

  • segment_thres (float, optional) – Confidence threshold for detecting segments. Range 0..1. Defaults to 0.5.

  • segment_minlen (Optional[float], optional) – Minimal duration of a segment in seconds, used for filtering out spurious detections. Defaults to None.

  • segment_fillgap (Optional[float], optional) – Gap in seconds between adjacent segments to be filled. Defaults to None.

  • segment_labels_by_majority (bool, optional) – Segment labels given by majority of label values within on- and offsets. Defaults to True.

Returns

dict[‘segmentnames’][‘denselabels-samples’/’onsets’/’offsets’/’probabilities’]
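
The effect of segment_thres, segment_fillgap, and segment_minlen can be illustrated for a single segment type with a dependency-free sketch. It is not the das implementation: runs and postprocess are hypothetical helpers, the input is one confidence trace rather than a [T, nb_classes] array, and the order (fill gaps first, then remove short segments) is an assumption.

```python
from typing import List, Optional, Tuple


def runs(labels: List[int], value: int) -> List[Tuple[int, int]]:
    """(start, stop) sample indices of consecutive runs equal to `value` (0 or 1)."""
    out, start = [], None
    for i, lab in enumerate(labels + [1 - value]):  # sentinel closes a trailing run
        if lab == value and start is None:
            start = i
        elif lab != value and start is not None:
            out.append((start, i))
            start = None
    return out


def postprocess(prob: List[float], samplerate: float, thres: float = 0.5,
                fillgap: Optional[float] = None, minlen: Optional[float] = None):
    """Threshold, fill short gaps, drop short segments; returns (labels, segments in s)."""
    labels = [int(p > thres) for p in prob]
    if fillgap is not None:
        for start, stop in runs(labels, 0):
            # Only gaps enclosed by segments on both sides are filled.
            interior = start > 0 and stop < len(labels)
            if interior and (stop - start) / samplerate <= fillgap:
                labels[start:stop] = [1] * (stop - start)
    if minlen is not None:
        for start, stop in runs(labels, 1):
            if (stop - start) / samplerate < minlen:
                labels[start:stop] = [0] * (stop - start)
    return labels, [(a / samplerate, b / samplerate) for a, b in runs(labels, 1)]


prob = [0.1, 0.9, 0.9, 0.2, 0.8, 0.9, 0.1, 0.1, 0.1, 0.7, 0.1]  # confidence at 10 Hz
print(postprocess(prob, samplerate=10.0)[1])                            # three raw segments
print(postprocess(prob, samplerate=10.0, fillgap=0.1, minlen=0.2)[1])   # [(0.1, 0.6)]
```

With post-processing enabled, the one-sample lapse at 0.3 s is filled, merging the first two segments, and the isolated 0.1 s detection at 0.9 s is discarded as too short.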