Inspect dataset

%config InlineBackend.figure_format = 'jpg'  # smaller mem footprint for page

import dss.npy_dir
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ncb.mplstyle')
ds = dss.npy_dir.load('tutorial_dataset.npy')
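Before plotting, it can help to print what each split contains. The sketch below assumes the layout used by the plotting code on this page: one entry per split ('train', 'val', 'test'), each with arrays under 'x' and 'y', plus sample rates in the attributes. The summarize helper, the synthetic stand-in data, and the class names 'noise' and 'pulse' are hypothetical and only there to make the example self-contained.

```python
import numpy as np


def summarize(splits, attrs, names=('train', 'val', 'test')):
    """Return one row per split: name, x shape, y shape, x and y durations in seconds."""
    rows = []
    for name in names:
        x = splits[name]['x']
        y = splits[name]['y']
        dur_x = x.shape[0] / attrs['samplerate_x_Hz']
        dur_y = y.shape[0] / attrs['samplerate_y_Hz']
        rows.append((name, x.shape, y.shape, dur_x, dur_y))
    return rows


# Synthetic stand-in for a loaded dataset (assumed layout, not real data).
rng = np.random.default_rng(0)
attrs = {'samplerate_x_Hz': 10_000, 'samplerate_y_Hz': 10_000,
         'class_names': ['noise', 'pulse']}
splits = {name: {'x': rng.normal(size=(n, 1)), 'y': np.zeros((n, 2))}
          for name, n in [('train', 60_000), ('val', 20_000), ('test', 20_000)]}

for name, x_shape, y_shape, dur_x, dur_y in summarize(splits, attrs):
    print(f"{name:5s} x{x_shape} y{y_shape} {dur_x:.1f} s audio, {dur_y:.1f} s targets")
```

With the dataset loaded above, the equivalent call would presumably be summarize(ds, ds.attrs), provided ds supports the same indexing as in the plotting code.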

Plot audio and annotations

For each of the train, validation, and test sets, plot:

  • the audio

  • the training targets as line plots

  • the training targets as a pseudo-color plot (black: high probability, white: low probability).

Adjust the start and duration of the plotted segment, start_seconds and duration_seconds, so that it covers a part of the recording that should contain annotated song.
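The audio and the training targets can have different sample rates, so the same time window maps to different sample indices for each. A minimal sketch of that conversion, using the same arithmetic as the plotting code on this page; the helper name seconds_to_samples and the two sample rates are assumptions chosen for illustration:

```python
def seconds_to_samples(start_seconds, duration_seconds, samplerate_hz):
    """Convert a time window in seconds to (first, last + 1) sample indices."""
    t0 = int(start_seconds * samplerate_hz)
    t1 = int(t0 + duration_seconds * samplerate_hz)
    return t0, t1


# The same 2 s to 6 s window at two hypothetical sample rates.
print(seconds_to_samples(2, 4, 10_000))  # (20000, 60000)
print(seconds_to_samples(2, 4, 1_000))   # (2000, 6000)
```

Dividing the resulting indices by the respective sample rate gives time axes in seconds, which is what lets the audio and target panels share one x-axis.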

Things to check:

  • alignment between audio x and training targets y

  • width of events is appropriate

  • completeness of annotations
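Some of these checks can be complemented with simple numbers, although they do not replace looking at the plots. The sketch below compares the duration of audio and targets (a coarse alignment check) and reports the fraction of time each class is active (a class that is never active in a split may point to missing annotations). The check_targets helper, the threshold of 0.5, and the synthetic arrays are assumptions for illustration only.

```python
import numpy as np


def check_targets(x, y, samplerate_x_hz, samplerate_y_hz, threshold=0.5):
    """Return the x/y duration mismatch in seconds and, per class,
    the fraction of samples whose target exceeds the threshold."""
    dur_x = x.shape[0] / samplerate_x_hz
    dur_y = y.shape[0] / samplerate_y_hz
    active_fraction = (np.asarray(y) > threshold).mean(axis=0)
    return dur_x - dur_y, active_fraction


# Synthetic stand-in: 4 s of audio and two target classes at 10 kHz.
x = np.zeros((40_000, 1))
y = np.zeros((40_000, 2))
y[:, 0] = 1                # first class active by default
y[10_000:14_000, 1] = 1    # second class active for 0.4 s
y[10_000:14_000, 0] = 0

mismatch, active_fraction = check_targets(x, y, 10_000, 10_000)
print(f"duration mismatch: {mismatch:.3f} s")
print("active fraction per class:", active_fraction)
```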

In the example below, the test set is incompletely annotated around 3.5 seconds.

start_seconds = 2
duration_seconds = 4
for typ in ['train', 'val', 'test']:
    # audio: convert the time window to sample indices at the audio sample rate
    t0_samples = int(start_seconds * ds.attrs['samplerate_x_Hz'])
    t1_samples = int(t0_samples + duration_seconds * ds.attrs['samplerate_x_Hz'])
    tx = np.arange(t0_samples, t1_samples) / ds.attrs['samplerate_x_Hz']

    plt.figure(figsize=(30, 6))
    ax = plt.subplot(311)
    plt.plot(tx, ds[typ]['x'][t0_samples:t1_samples], 'k')

    # training targets: same time window, at the target sample rate
    t0_samples = int(start_seconds * ds.attrs['samplerate_y_Hz'])
    t1_samples = int(t0_samples + duration_seconds * ds.attrs['samplerate_y_Hz'])
    ty = np.arange(t0_samples, t1_samples) / ds.attrs['samplerate_y_Hz']
    plt.subplot(312, sharex=ax)
    plt.plot(ty, ds[typ]['y'][t0_samples:t1_samples, :])
    plt.ylim(0, 1.1)

    # pseudo-color plot in its own panel; extent maps image columns to seconds
    plt.subplot(313, sharex=ax)
    y_img = ds[typ]['y'][t0_samples:t1_samples, :].T.astype(float)  # np.float was removed from NumPy
    plt.imshow(y_img, cmap='Greys', aspect='auto',
               extent=(ty[0], ty[-1], y_img.shape[0] - 0.5, -0.5))
    plt.yticks(range(len(ds.attrs['class_names'])), labels=ds.attrs['class_names'])
[Figures: audio, training targets as line plots, and training targets as a pseudo-color plot, one figure each for the train, validation, and test sets.]