das.data

Utils for loading and manipulating data for training and prediction.

class das.data.AudioSequence(x, y=None, batch_size=32, shuffle=True, nb_hist=1, y_offset=None, stride=1, cut_trailing_dim=False, with_y_hist=False, data_padding=0, first_sample=0, last_sample=None, output_stride=1, nb_repeats=1, shuffle_subset=None, unpack_channels=False, mask_input=None, **kwargs)[source]

Data generator that yields batches of windowed audio data (x) and, optionally, annotations (y) and sample weights, for training and prediction.

x and y can be mem-mapped numpy arrays or lazily loaded hdf5 (zarr, xarray) datasets. Dask arrays do not work since they are immutable.

Parameters
  • x (np.ndarray) – [nb_samples, …]

  • y (np.ndarray, optional) – [nb_samples, nb_classes], class probabilities; the sum over classes for each sample should be 1.0. If None, getitem will only return x batches, neither y nor sample weights. Defaults to None.

  • batch_size (int, optional) – number of windows per batch. Defaults to 32.

  • shuffle (bool, optional) – randomize order of batches. Defaults to True.

  • nb_hist (int, optional) – number of time steps (audio samples) in each window. Defaults to 1.

  • y_offset (int, optional) – time offset between x and y. If None, defaults to nb_hist/2 so that the central sample of each window is predicted. Defaults to None.

  • stride (int, optional) – number of time steps between consecutive windows. Defaults to 1.

  • cut_trailing_dim (bool, optional) – Remove trailing dimension. Defaults to False.

  • with_y_hist (bool, optional) – return y as the central value of the x_hist window (False) or as the full sequence covering the x_hist window (True). Defaults to False.

  • data_padding (int, optional) – if > 0, sets the sample weights of that many samples at the start and end of each nb_hist window to zero. Defaults to 0.

  • first_sample (int) – index of the first sample in x to use. Defaults to 0.

  • last_sample (int) – index of the last sample in x to use. If None, uses all samples to the end of x. Defaults to None.

  • output_stride (int) – Take every Nth sample as output. Useful in combination with a “downsampling frontend”. Defaults to 1 (every sample).

  • nb_repeats (int) – Number of repeats before the dataset runs out of data. Defaults to 1 (no repeats).

  • shuffle_subset (float) – fraction of batches to use. Only has an effect if shuffle=True. Defaults to None.

  • unpack_channels (bool) – for multi-channel models with single-channel preprocessing, unpack [nb_hist, nb_channels] inputs into nb_channels separate [nb_hist, 1] inputs. Defaults to False.

  • mask_input (int) – halfwidth of the number of central samples to mask. Defaults to None (no masking).
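
A minimal usage sketch (the data shapes, class count, and the structure of the returned batch are assumptions for illustration, inferred from the parameter descriptions above):

>>> import numpy as np
>>> from das.data import AudioSequence
>>> x = np.random.randn(10000, 1).astype(np.float32)   # [nb_samples, nb_channels]
>>> y = np.zeros((10000, 2), dtype=np.float32)          # [nb_samples, nb_classes]
>>> y[:, 0] = 1.0                                       # class probabilities sum to 1.0 per sample
>>> seq = AudioSequence(x, y, batch_size=32, nb_hist=256, with_y_hist=True, data_padding=16)
>>> batch_x, batch_y, weights = seq[0]                  # first batch: windows, targets, sample weights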

unroll(return_x=True, merge_batches=True)[source]

Iterate over all batches in the sequence and return them, optionally merged into single continuous arrays.

Parameters
  • return_x (bool, optional) – also return the x (input) data. Defaults to True.

  • merge_batches (bool, optional) – concatenate the per-batch arrays into a single array. Defaults to True.

Returns

The unrolled data as (xx, yy), (None, yy) or (xx,), depending on return_x and whether y is available.

Return type

tuple
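
For example, given an AudioSequence seq constructed with both x and y (a sketch; the unpacking follows the return values listed above):

>>> xx, yy = seq.unroll(return_x=True, merge_batches=True)   # merged arrays covering the full sequence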

das.data.sub_range(data_len, fraction: float, min_nb_samples: int = 0, seed=None)[source]

Randomly select a contiguous sub-range covering a fraction of the data.

Parameters
  • data_len (int) – total length of the data

  • fraction (float) – fraction of data_len to use

  • min_nb_samples (int, optional) – minimum number of samples in the sub-range. Defaults to 0.

  • seed (int, optional) – seed for the random number generator, for reproducible sub-range selection. Defaults to None.

Returns

first_sample (int), last_sample (int)
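
Example of selecting a reproducible sub-range covering 10% of the data (values are illustrative):

>>> from das.data import sub_range
>>> first_sample, last_sample = sub_range(data_len=100000, fraction=0.1, seed=1)
>>> # first_sample and last_sample delimit a contiguous block spanning ~10% of the data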

das.data.unpack_batches(x, padding=0)[source]

Flatten batched, windowed data back into a continuous array, removing padding from each window.

Parameters
  • x (np.ndarray) – batched (windowed) data, e.g. network output of shape [nb_windows, nb_hist, …]

  • padding (int, optional) – number of samples to remove from the start and end of each window before concatenation. Defaults to 0.

Returns

The unpacked, concatenated data.

Return type

np.ndarray
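
A hedged sketch, assuming x holds windowed data such as network predictions and padding matches the data_padding used when creating the windows:

>>> import numpy as np
>>> from das.data import unpack_batches
>>> preds = np.random.rand(10, 256, 2)          # hypothetical windowed predictions [nb_windows, nb_hist, nb_classes]
>>> flat = unpack_batches(preds, padding=16)    # drop the padded edges of each window and concatenate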