Utils for loading and manipulating data for training and prediction.
- class das.data.AudioSequence(x, y=None, batch_size=32, shuffle=True, nb_hist=1, y_offset=None, stride=1, cut_trailing_dim=False, with_y_hist=False, data_padding=0, first_sample=0, last_sample=None, output_stride=1, nb_repeats=1, shuffle_subset=None, unpack_channels=False, mask_input=None, **kwargs)
x and y can be mem-mapped numpy arrays or lazily loaded hdf5 (zarr, xarray) datasets. Dask arrays do not work since they are immutable.
x (np.ndarray) – [nb_samples, …]
y (optional) – [nb_samples, nb_classes] class probabilities, so the sum over classes for each sample should be 1.0. Defaults to None. If None, __getitem__ will only return x batches, neither y nor sample weights.
batch_size (int, optional) – number of samples per batch. Defaults to 32.
shuffle (bool, optional) – randomize order of batches. Defaults to True.
nb_hist (int, optional) – nb of time steps per batch. Defaults to 1.
y_offset ([type], optional) – time offset between x and y. nb_hist/2 if None (predict central sample in each batch). Defaults to None.
stride (int, optional) – nb of time steps between batches. Defaults to 1.
cut_trailing_dim (bool, optional) – Remove trailing dimension. Defaults to False.
with_y_hist (bool, optional) – y as central value of the x_hist window (False) or the full sequence covering the x_hist window (True). Defaults to False.
data_padding (int, optional) – if > 0, sets the weight of that many samples at the start and at the end of the nb_hist window to zero. Defaults to 0.
first_sample (int) – first sample in x to use. Defaults to 0.
last_sample (int) – last sample in x to use. Defaults to None (use the last sample in x).
output_stride (int) – Take every Nth sample as output. Useful in combination with a “downsampling frontend”. Defaults to 1 (every sample).
nb_repeats (int) – Number of repeats before the dataset runs out of data. Defaults to 1 (no repeats).
shuffle_subset (float) – Fraction of batches to use - only works if shuffle=True
unpack_channels (bool) – For multi-channel models with single-channel preprocessing - unpack [nb_hist, nb_channels] -> [nb_channels * [nb_hist, 1]]
mask_input (int) – halfwidth of the number of central samples to mask. Defaults to None (no masking).
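The interplay of nb_hist, stride, y_offset, first_sample, last_sample and batch_size can be illustrated with a small index calculation. This is a minimal sketch, not the code used by das.data.AudioSequence: the helper names window_indices and group_into_batches are hypothetical, and it assumes that each window covers nb_hist consecutive samples, that consecutive windows start stride samples apart, and that last_sample is exclusive.

```python
def window_indices(nb_samples, nb_hist=1, stride=1, y_offset=None,
                   first_sample=0, last_sample=None):
    """Return (start, stop, target) index triples for sliding windows.

    Hypothetical helper for illustration only; the real class may index
    its data differently.
    """
    if last_sample is None:
        last_sample = nb_samples  # assumption: use all samples in x
    if y_offset is None:
        y_offset = nb_hist // 2  # predict the central sample of each window
    windows = []
    # Only keep windows that fit completely inside [first_sample, last_sample).
    for start in range(first_sample, last_sample - nb_hist + 1, stride):
        windows.append((start, start + nb_hist, start + y_offset))
    return windows


def group_into_batches(windows, batch_size):
    """Group consecutive windows into batches of at most batch_size windows."""
    return [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]


windows = window_indices(nb_samples=10, nb_hist=4, stride=2)
batches = group_into_batches(windows, batch_size=3)
for batch in batches:
    print(batch)
```

Under these assumptions, x[start:stop] would be the input window and y[target] the label when with_y_hist=False.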
- unroll(return_x=True, merge_batches=True)
- Returns: [description]
- Return type: (xx, yy), (None, yy) or (xx,)
- das.data.sub_range(data_len, fraction: float, min_nb_samples: int = 0, seed=None)
data_len (int) – total length of data
fraction (float) – fraction of data_len to use
seed (float) – seed random number generator for reproducible subset selection
- Returns: first_sample (int), last_sample (int)
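One way to select such a sub-range is sketched below. This is not the implementation in das.data: the function name sub_range_sketch is hypothetical, and it assumes that the subset is a single contiguous block at a random position, that min_nb_samples acts as a lower bound on the block length, and that last_sample is exclusive.

```python
import random


def sub_range_sketch(data_len, fraction, min_nb_samples=0, seed=None):
    """Pick a random contiguous block covering roughly `fraction` of the data.

    Hypothetical stand-in for illustration; behavior of the real
    das.data.sub_range may differ.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    # Assumption: min_nb_samples is a lower bound on the block length.
    sub_len = max(min_nb_samples, int(fraction * data_len))
    sub_len = min(sub_len, data_len)  # never request more than is available
    first_sample = rng.randint(0, data_len - sub_len)  # both bounds inclusive
    last_sample = first_sample + sub_len  # exclusive end index
    return first_sample, last_sample


first, last = sub_range_sketch(data_len=1000, fraction=0.1, seed=1)
print(first, last, last - first)
```

Passing the same seed yields the same block, which keeps the subset selection reproducible across runs.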
- das.data.unpack_batches(x, padding=0)
x ([type]) – [description]
padding (int, optional) – [description]. Defaults to 0.