Kapre backend functions

Some backend functions that mainly use numpy.
Functions that use the Keras backend are in backend_keras.py.


  • Don’t forget to cast arrays to K.floatx()! Otherwise numpy defaults to float64.

  • Some functions are copied and pasted from librosa (to reduce dependencies), but later I realised it’d be better to just use librosa directly.

  • TODO: remove copied code and use librosa.
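The dtype point in the first bullet can be illustrated with plain numpy. This is a minimal sketch: `FLOATX` is a hypothetical stand-in for the string that Keras reports as its default float type (typically "float32"), used here so the example runs without Keras installed.

```python
import numpy as np

# Hypothetical stand-in for the Keras default float type (usually "float32").
FLOATX = "float32"

# numpy allocates float64 unless told otherwise.
kernel = np.zeros((4, 8))
print(kernel.dtype)  # float64

# Cast explicitly before handing the array to Keras.
kernel_cast = kernel.astype(FLOATX)
print(kernel_cast.dtype)  # float32
```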

das.kapre.backend.filterbank_log(sr, n_freq, n_bins=84, bins_per_octave=12, fmin=None, spread=0.125)

[np] Approximate a constant-Q filter bank for a fixed-window STFT.

Each filter is a log-normal window centered at the corresponding frequency.

Note: this is logfrequency from librosa 0.4 (deprecated there), copied and pasted with tuning removed and n_freq used in place of n_fft.

Parameters

  • sr (number > 0 [scalar]) – audio sampling rate

  • n_freq (int > 0 [scalar]) – number of frequency bins

  • n_bins (int > 0 [scalar]) – Number of bins. Defaults to 84 (7 octaves).

  • bins_per_octave (int > 0 [scalar]) – Number of bins per octave. Defaults to 12 (semitones).

  • fmin (float > 0 [scalar]) – Minimum frequency, in Hz. Defaults to C1 ≈ 32.70 Hz.

  • spread (float > 0 [scalar]) – Spread of each filter, as a fraction of a bin.


Returns

C – log-frequency filter bank.

Return type

np.ndarray [shape=(n_bins, 1 + n_fft/2)]
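The description above can be sketched in numpy as follows. This is an illustration under stated assumptions, not the package's own code: `log_filterbank_sketch` is a hypothetical name, each filter is modelled as a Gaussian in log2-frequency (one simple way to obtain a log-normal window), STFT bin frequencies are assumed to run linearly from 0 Hz to sr/2, and each filter is L1-normalised.

```python
import numpy as np

def log_filterbank_sketch(sr, n_freq, n_bins=84, bins_per_octave=12,
                          fmin=32.70319566, spread=0.125):
    """Illustrative log-frequency filter bank; not the package's own code."""
    # Filter width in octaves (log2-frequency units).
    sigma = float(spread) / bins_per_octave

    # Assumed centre frequencies of the STFT bins, from 0 Hz to Nyquist.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_freq)
    # Skip the DC bin: log2(0) is undefined.
    log_freqs = np.log2(fft_freqs[1:])

    basis = np.zeros((n_bins, n_freq))
    for i in range(n_bins):
        # Centre frequency of filter i, spaced geometrically from fmin.
        c_freq = fmin * 2.0 ** (i / bins_per_octave)
        z = (log_freqs - np.log2(c_freq)) / sigma
        basis[i, 1:] = np.exp(-0.5 * z ** 2)

    # L1-normalise each filter; guard against all-zero rows.
    norms = basis.sum(axis=1, keepdims=True)
    basis /= np.maximum(norms, np.finfo(float).tiny)
    return basis.astype(np.float32)

fb = log_filterbank_sketch(sr=22050, n_freq=1025)
print(fb.shape, fb.dtype)  # (84, 1025) float32
```

With the default spread the filters are much narrower than the STFT bin spacing at low frequencies, so the lowest rows effectively select a single bin.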

das.kapre.backend.filterbank_mel(sr, n_freq, n_mels=128, fmin=0.0, fmax=None, htk=False, norm=1)


[np] Return DFT kernels for the real and imaginary parts, assuming the input is real.

An asymmetric Hann window is used (scipy.signal.hann).

Parameters

  • n_dft (int > 0 and power of 2 [scalar]) – Number of DFT components.

Returns

  • dft_real_kernels (np.ndarray [shape=(nb_filter, 1, 1, n_win)])

  • dft_imag_kernels (np.ndarray [shape=(nb_filter, 1, 1, n_win)])

  • nb_filter = n_dft/2 + 1

  • n_win = n_dft
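As a rough illustration of the shapes listed above, windowed DFT kernels for real input can be built with numpy alone. This is a sketch under assumptions: `dft_kernels_sketch` is a hypothetical name, the Hann window is written out by hand in its periodic (asymmetric) form rather than taken from scipy, and the sign of the imaginary kernel follows the `np.fft` convention.

```python
import numpy as np

def dft_kernels_sketch(n_dft):
    """Illustrative windowed DFT kernels for real input; names are hypothetical."""
    # Require a power of two, as the parameter description states.
    assert n_dft > 1 and (n_dft & (n_dft - 1)) == 0, "n_dft must be a power of 2"
    nb_filter = n_dft // 2 + 1
    n_win = n_dft

    timesteps = np.arange(n_win)
    # Angular frequency of each non-negative DFT bin.
    w_ks = 2.0 * np.pi * np.arange(nb_filter) / n_dft
    phase = w_ks[:, None] * timesteps[None, :]

    # Periodic (asymmetric) Hann window.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * timesteps / n_win)

    real = (np.cos(phase) * window).astype(np.float32)
    imag = (-np.sin(phase) * window).astype(np.float32)

    # Insert two singleton axes: (nb_filter, 1, 1, n_win).
    return real[:, None, None, :], imag[:, None, None, :]

real_k, imag_k = dft_kernels_sketch(16)
print(real_k.shape)  # (9, 1, 1, 16)
```

Projecting a frame of n_dft samples onto these kernels gives, up to float32 precision, the real and imaginary parts of `np.fft.rfft` applied to the windowed frame.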

das.kapre.backend.mel(sr, n_dft, n_mels=128, fmin=0.0, fmax=None, htk=False, norm=1)

[np] Create a filterbank matrix that combines STFT bins into mel-frequency bins, using the Slaney formulation (as in librosa).

  • n_mels – number of mel bands

  • fmin – lowest frequency [Hz]

  • fmax – highest frequency [Hz]. If None, use sr / 2.0
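A mel filterbank of this kind can be sketched in a few lines of numpy. The code below is an assumption-laden illustration rather than the package's implementation: `hz_to_mel`, `mel_to_hz` and `mel_filterbank_sketch` are hypothetical names, the mel scale is the Slaney-style one (linear below 1000 Hz, logarithmic above), filters are triangular with area normalisation, and the htk and norm arguments are left out.

```python
import numpy as np

F_SP = 200.0 / 3                  # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0               # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # mel value at 1000 Hz
LOGSTEP = np.log(6.4) / 27.0      # log-region step size

def hz_to_mel(f):
    f = np.asarray(f, dtype=float)
    log_part = MIN_LOG_MEL + np.log(np.maximum(f, MIN_LOG_HZ) / MIN_LOG_HZ) / LOGSTEP
    return np.where(f >= MIN_LOG_HZ, log_part, f / F_SP)

def mel_to_hz(m):
    m = np.asarray(m, dtype=float)
    log_part = MIN_LOG_HZ * np.exp(LOGSTEP * (np.maximum(m, MIN_LOG_MEL) - MIN_LOG_MEL))
    return np.where(m >= MIN_LOG_MEL, log_part, m * F_SP)

def mel_filterbank_sketch(sr, n_dft, n_mels=128, fmin=0.0, fmax=None):
    if fmax is None:
        fmax = sr / 2.0  # default: Nyquist frequency
    # Assumed STFT bin frequencies, from 0 Hz to Nyquist.
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_dft // 2)
    # n_mels + 2 band edges, evenly spaced on the mel scale.
    mel_pts = np.linspace(float(hz_to_mel(fmin)), float(hz_to_mel(fmax)), n_mels + 2)
    edges = mel_to_hz(mel_pts)
    fdiff = np.diff(edges)
    ramps = edges[:, None] - fft_freqs[None, :]
    lower = -ramps[:-2] / fdiff[:-1, None]   # rising slope of each triangle
    upper = ramps[2:] / fdiff[1:, None]      # falling slope of each triangle
    weights = np.maximum(0.0, np.minimum(lower, upper))
    # Area-normalise each triangle by its bandwidth.
    weights *= (2.0 / (edges[2:] - edges[:-2]))[:, None]
    return weights.astype(np.float32)

mel_fb = mel_filterbank_sketch(sr=16000, n_dft=512, n_mels=40)
print(mel_fb.shape)  # (40, 257)
```

Multiplying this matrix by a power spectrogram of shape (1 + n_dft/2, n_frames) yields a mel spectrogram of shape (n_mels, n_frames).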