das.kapre.time_frequency

class das.kapre.time_frequency.Melspectrogram(*args, **kwargs)[source]

### Melspectrogram

```python
kapre.time_frequency.Melspectrogram(sr=22050, n_mels=128, fmin=0.0, fmax=None,
                                    power_melgram=1.0, return_decibel_melgram=False,
                                    trainable_fb=False, **kwargs)
```

Mel-spectrogram layer that outputs mel-spectrogram(s) in 2D image format.

Its base class is Spectrogram.

The mel-spectrogram is an efficient representation that exploits a property of the human auditory system by compressing the frequency axis onto the mel scale.

#### Parameters
  • sr: integer > 0 [scalar] - sampling rate of the input audio signal. - Default: 22050

  • n_mels: int > 0 [scalar] - The number of mel bands. - Default: 128

  • fmin: float >= 0 [scalar] - Minimum frequency to include in the mel-spectrogram. - Default: 0.0

  • fmax: float > fmin [scalar] - Maximum frequency to include in Mel-spectrogram. - If None, it is inferred as sr / 2. - Default: None

  • power_melgram: float [scalar] - Exponent of the mel-spectrogram: 2.0 for a power spectrogram, 1.0 for an amplitude spectrogram. - Default: 1.0

  • return_decibel_melgram: bool - Whether to return the mel-spectrogram in decibels, i.e. returns log10(amplitude spectrogram) if True. - Using True is recommended, although it is not the default. - Default: False

  • trainable_fb: bool - Whether the spectrogram -> mel-spectrogram filterbanks are trainable. - If True, the frequency-to-mel matrix is initialised with mel frequencies but trainable. - If False, it is initialised and then frozen. - Default: False

  • htk: bool - See the htk option of librosa's mel filterbank (librosa.filters.mel).

  • norm: float [scalar] - See the norm option of librosa's mel filterbank (librosa.filters.mel).

  • **kwargs: - The keyword arguments of Spectrogram, such as n_dft, n_hop, padding, trainable_kernel, image_data_format.
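To make the role of n_mels, fmin and fmax more concrete, the following minimal sketch places n_mels band centres evenly on a mel axis between fmin and fmax. It assumes the HTK-style conversion formula, which is only one of the conventions librosa offers, and the function names are hypothetical helpers that are not part of this module. The filterbank the layer actually builds may differ.

```python
import math


def hz_to_mel_htk(freq_hz):
    """Convert Hz to mel with the HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(sr=22050, n_mels=128, fmin=0.0, fmax=None):
    """Return n_mels centre frequencies in Hz, evenly spaced on the mel scale."""
    if fmax is None:
        fmax = sr / 2.0  # documented behaviour: fmax is inferred as sr / 2
    mel_lo, mel_hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    # n_mels + 2 evenly spaced mel points; the inner n_mels are band centres.
    step = (mel_hi - mel_lo) / (n_mels + 1)
    return [mel_to_hz_htk(mel_lo + step * i) for i in range(1, n_mels + 1)]


centres = mel_band_centres()
# The gap between neighbouring bands grows with frequency, which is how
# the mel axis compresses the upper part of the spectrum.
low_gap = centres[1] - centres[0]
high_gap = centres[-1] - centres[-2]
```

With these assumptions, high frequencies are covered by fewer, wider bands than low frequencies.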

#### Notes
  • The input should be a 2D array, (audio_channel, audio_length). E.g., (1, 44100) for a mono signal, (2, 44100) for a stereo signal.

  • It supports multichannel signal input, so audio_channel can be any positive integer.

  • The input shape does not depend on the Keras image_data_format() config.
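The (audio_channel, audio_length) layout described in these notes can be illustrated with a small, dependency-free sketch. The helper name and the two accepted input layouts are hypothetical and chosen only for illustration; in practice the same rearrangement would usually be done on an array.

```python
def to_channel_first(signal):
    """Return the signal as a list of channels: (audio_channel, audio_length).

    Accepts either a flat sequence of samples (mono) or a sequence of
    per-sample frames such as [(left, right), ...].
    """
    if not signal:
        raise ValueError("signal must not be empty")
    if not isinstance(signal[0], (list, tuple)):
        return [list(signal)]  # mono: wrap the samples into a single channel
    # Frames are laid out as (audio_length, audio_channel): transpose them.
    return [list(channel) for channel in zip(*signal)]


mono = to_channel_first([0.0, 0.1, 0.2, 0.3])
stereo = to_channel_first([(0.0, 1.0), (0.1, 0.9), (0.2, 0.8)])

shape_mono = (len(mono), len(mono[0]))        # (1, 4)
shape_stereo = (len(stereo), len(stereo[0]))  # (2, 3)
```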

#### Returns

A Keras layer that outputs abs(mel-spectrogram) in 2D image format, with shape:
  • (None, n_channel, n_mels, n_time) if channels_first,

  • (None, n_mels, n_time, n_channel) if channels_last.
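The two layouts above can be worked through with a short sketch. It assumes that the number of frames follows the usual convolution output-length rule for "same" and "valid" padding, and it uses n_hop=256 as an illustrative value rather than a documented default. The function name is hypothetical.

```python
def melgram_output_shape(input_shape, n_mels=128, n_dft=512, n_hop=256,
                         padding="same", data_format="channels_first"):
    """Expected output shape for an (audio_channel, audio_length) input."""
    n_channel, n_samples = input_shape
    if padding == "same":
        length = n_samples
    elif padding == "valid":
        length = n_samples - n_dft + 1
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    n_time = (length + n_hop - 1) // n_hop  # ceiling division by the hop
    if data_format == "channels_first":
        return (None, n_channel, n_mels, n_time)
    if data_format == "channels_last":
        return (None, n_mels, n_time, n_channel)
    raise ValueError("data_format must be 'channels_first' or 'channels_last'")


mono_first = melgram_output_shape((1, 44100))
stereo_last = melgram_output_shape((2, 44100), data_format="channels_last")
```

The leading None stands for the batch dimension, as in the shapes listed above.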


Parameters
  • n_dft (int, optional) – The number of DFT points. Best if power of 2. Defaults to 512.

  • n_hop (Optional[int], optional) – Hop length between frames, in samples. Best if <= n_dft. Defaults to None.

  • padding (str, optional) – Pads signal boundaries (same or valid). Defaults to ‘same’.

  • power_spectrogram (float, optional) – 2.0 for power, 1.0 for amplitude spectrogram. Defaults to 2.0 (power).

  • return_decibel_spectrogram (bool, optional) – Convert spectrogram values to dB. Recommended. Defaults to False.

  • trainable_kernel (bool, optional) – If True, kernels will be optimized during training. Defaults to False.

  • image_data_format (str, optional) – channels_first or channels_last or keras’ default. Defaults to ‘default’.

Notes

  • The input should be a 2D array, (audio_channel, audio_length). E.g., (1, 44100) for mono signals, (2, 44100) for stereo signals.

  • Supports multichannel inputs, so audio_channel can be any positive integer.

  • The input shape does not depend on the Keras image_data_format() config.

Returns

Keras layer computing the spectrogram
  • if channels_first: (None, n_channel, n_time, n_freq)

  • if channels_last: (None, n_time, n_freq, n_channel)

Return type

Layer

build(input_shape)[source]

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(x)[source]

This is where the layer’s logic lives.

Note that the call() method in tf.keras is a little different from the Keras API. In the Keras API, masking support for layers can be passed as additional arguments, whereas tf.keras has a compute_mask() method to support masking.

Parameters
  • inputs – Input tensor, or list/tuple of input tensors.

  • *args – Additional positional arguments. Currently unused.

  • **kwargs – Additional keyword arguments. Currently unused.

Returns

A tensor or list/tuple of tensors.

compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

If the layer has not been built, this method will call build on the layer. This assumes that the layer will later be used with inputs that match the input shape provided here.

Parameters

input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.

Returns

An output shape tuple.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
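The advice about copying the returned dict can be illustrated with a stand-in class. TinyLayer below is hypothetical and is not a Keras layer; it only mimics a get_config() that hands back the same dict object on every call, which is the case the note warns about.

```python
import copy


class TinyLayer:
    """Hypothetical stand-in for a layer; not part of Keras or this module."""

    def __init__(self, n_dft=512, n_hop=256):
        self._config = {"n_dft": n_dft, "n_hop": n_hop}

    def get_config(self):
        # Returns the internal dict itself, not a fresh copy.
        return self._config

    @classmethod
    def from_config(cls, config):
        return cls(**config)


layer = TinyLayer()

# Copy first, then modify: the original layer's config stays untouched.
config = copy.deepcopy(layer.get_config())
config["n_hop"] = 128
clone = TinyLayer.from_config(config)
```

Modifying the returned dict in place would instead have changed the configuration reported by the original layer.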

Returns

Python dictionary.

class das.kapre.time_frequency.Spectrogram(*args, **kwargs)[source]

Spectrogram layer returns spectrogram(s).

Examples

>>> kapre.time_frequency.Spectrogram(
...     n_dft=512, n_hop=None, padding='same',
...     power_spectrogram=2.0, return_decibel_spectrogram=False,
...     trainable_kernel=False, image_data_format='default'
... )


Parameters
  • n_dft (int, optional) – The number of DFT points. Best if power of 2. Defaults to 512.

  • n_hop (Optional[int], optional) – Hop length between frames, in samples. Best if <= n_dft. Defaults to None.

  • padding (str, optional) – Pads signal boundaries (same or valid). Defaults to ‘same’.

  • power_spectrogram (float, optional) – 2.0 for power, 1.0 for amplitude spectrogram. Defaults to 2.0 (power).

  • return_decibel_spectrogram (bool, optional) – Convert spectrogram values to dB. Recommended. Defaults to False.

  • trainable_kernel (bool, optional) – If True, kernels will be optimized during training. Defaults to False.

  • image_data_format (str, optional) – channels_first or channels_last or keras’ default. Defaults to ‘default’.
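The relationship between the power_spectrogram exponent and a decibel scale can be sketched on a single magnitude value. This is a generic illustration that assumes a reference level of 1.0 and a small floor (amin) to avoid log(0). The helper names are hypothetical, and the conversion the layer actually applies (reference level, clipping) may differ.

```python
import math


def to_spectrogram_value(magnitude, power=2.0):
    """Raise an STFT magnitude to the chosen exponent (2.0 power, 1.0 amplitude)."""
    return magnitude ** power


def to_decibel(value, power=2.0, amin=1e-10):
    """Convert a spectrogram value to dB relative to 1.0.

    Uses 10 * log10 for power values and 20 * log10 for amplitude values,
    so both exponents describe the same level in dB.
    """
    scale = 20.0 / power
    return scale * math.log10(max(value, amin))


magnitude = 10.0
db_from_power = to_decibel(to_spectrogram_value(magnitude, 2.0), power=2.0)
db_from_amplitude = to_decibel(to_spectrogram_value(magnitude, 1.0), power=1.0)
```

Under these assumptions both paths give the same level, so the exponent changes the linear values but not the level expressed in dB.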

Notes

  • The input should be a 2D array, (audio_channel, audio_length). E.g., (1, 44100) for mono signals, (2, 44100) for stereo signals.

  • Supports multichannel inputs, so audio_channel can be any positive integer.

  • The input shape does not depend on the Keras image_data_format() config.

Returns

Keras layer computing the spectrogram
  • if channels_first: (None, n_channel, n_time, n_freq)

  • if channels_last: (None, n_time, n_freq, n_channel)

Return type

Layer

build(input_shape)[source]

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(x)[source]

This is where the layer’s logic lives.

Note that the call() method in tf.keras is a little different from the Keras API. In the Keras API, masking support for layers can be passed as additional arguments, whereas tf.keras has a compute_mask() method to support masking.

Parameters
  • inputs – Input tensor, or list/tuple of input tensors.

  • *args – Additional positional arguments. Currently unused.

  • **kwargs – Additional keyword arguments. Currently unused.

Returns

A tensor or list/tuple of tensors.

compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

If the layer has not been built, this method will call build on the layer. This assumes that the layer will later be used with inputs that match the input shape provided here.

Parameters

input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.

Returns

An output shape tuple.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns

Python dictionary.

das.kapre.time_frequency.conv_output_length(input_length, filter_size, padding, stride, dilation=1)[source]

Determines the output length of a convolution given the input length.

Parameters
  • input_length – integer.

  • filter_size – integer.

  • padding – one of "same", "valid", "full".

  • stride – integer.

  • dilation – dilation rate, integer.

Returns

The output length (integer).
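As a rough guide to what this function computes, the sketch below follows the commonly used convolution output-length rule for the three documented padding modes. It is written under that assumption and named conv_output_length_sketch to make clear that it is not the source of this module's function.

```python
def conv_output_length_sketch(input_length, filter_size, padding, stride,
                              dilation=1):
    """Output length of a 1D convolution (illustrative sketch)."""
    if padding not in ("same", "valid", "full"):
        raise ValueError("padding must be 'same', 'valid' or 'full'")
    # A dilated filter spans more input samples than it has taps.
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding == "same":
        length = input_length
    elif padding == "valid":
        length = input_length - dilated_filter_size + 1
    else:  # "full"
        length = input_length + dilated_filter_size - 1
    return (length + stride - 1) // stride  # ceiling division by the stride


# One second of audio at 44.1 kHz, 512-point window, hop of 256 samples.
n_same = conv_output_length_sketch(44100, 512, "same", 256)
n_valid = conv_output_length_sketch(44100, 512, "valid", 256)
n_full = conv_output_length_sketch(44100, 512, "full", 256)
```

In a spectrogram setting, filter_size corresponds to n_dft and stride to n_hop, so the result is the number of time frames.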