das.kapre.time_frequency
- class das.kapre.time_frequency.Melspectrogram(*args, **kwargs)
### Melspectrogram
```python
kapre.time_frequency.Melspectrogram(sr=22050, n_mels=128, fmin=0.0, fmax=None, power_melgram=1.0, return_decibel_melgram=False, trainable_fb=False, **kwargs)
```
Mel-spectrogram layer that outputs mel-spectrogram(s) in 2D image format.
Its base class is Spectrogram. A mel-spectrogram is an efficient representation that exploits a property of the human auditory system: it compresses the frequency axis into a mel-scale axis.
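To make the mel-scale compression concrete, here is a minimal sketch of the two Hz-to-mel conversions that Librosa provides: the HTK formula (selected with htk=True) and the Slaney-style formula (Librosa's default, linear below 1 kHz and logarithmic above). Which one this layer ends up using depends on the htk option it forwards to Librosa; the function names below are illustrative and not part of das.kapre.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    """HTK mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_mel_slaney(f_hz: float) -> float:
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # 15 mels
    logstep = math.log(6.4) / 27.0    # step size in the logarithmic region
    if f_hz < min_log_hz:
        return f_hz / f_sp
    return min_log_mel + math.log(f_hz / min_log_hz) / logstep


for f in (100.0, 1000.0, 4000.0, 11025.0):
    print(f"{f:8.1f} Hz -> HTK {hz_to_mel_htk(f):7.1f} mel, "
          f"Slaney {hz_to_mel_slaney(f):5.1f} mel")
```

Both curves grow much more slowly at high frequencies, which is why a fixed number of mel bands spends more resolution on low frequencies than a linear-frequency spectrogram does.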
- #### Parameters
- sr: integer > 0 [scalar]
  - Sampling rate of the input audio signal.
  - Default: 22050
- n_mels: int > 0 [scalar]
  - The number of mel bands.
  - Default: 128
- fmin: float >= 0 [scalar]
  - Minimum frequency to include in the mel-spectrogram.
  - Default: 0.0
- fmax: float > fmin [scalar]
  - Maximum frequency to include in the mel-spectrogram.
  - If None, it is inferred as sr / 2.
  - Default: None
- power_melgram: float [scalar]
  - Exponent applied to the magnitude: 2.0 for a power spectrogram, 1.0 for an amplitude spectrogram.
  - Default: 1.0
- return_decibel_melgram: bool
  - Whether to return the output in decibels, i.e. log10(amplitude spectrogram), if True.
  - Using True is recommended, although it is not the default.
  - Default: False
- trainable_fb: bool
  - Whether the spectrogram-to-mel-spectrogram filterbanks are trainable.
  - If True, the frequency-to-mel matrix is initialised with mel frequencies but remains trainable.
  - If False, it is initialised and then frozen.
  - Default: False
- htk: bool
  - See the htk option of Librosa's mel-spectrogram and mel filterbank functions.
- norm: float [scalar]
  - See the norm option of Librosa's mel-spectrogram and mel filterbank functions.
- **kwargs:
  - The keyword arguments of Spectrogram, such as n_dft, n_hop, padding, trainable_kernel, image_data_format.
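The fmin, fmax and n_mels parameters determine where the mel bands sit. The sketch below assumes the Librosa-style construction, in which n_mels + 2 points are spaced evenly on the mel scale between fmin and fmax and each triangular filter spans three consecutive points. It uses the HTK formula for brevity and shows fmax=None falling back to sr / 2; the helper name mel_band_edges is hypothetical.

```python
import math
from typing import List, Optional


def hz_to_mel(f_hz: float) -> float:
    # HTK formula, used here for brevity
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of the HTK formula
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(sr: int = 22050, n_mels: int = 128, fmin: float = 0.0,
                   fmax: Optional[float] = None) -> List[float]:
    """Return n_mels + 2 frequencies (Hz) evenly spaced on the mel scale.

    Hypothetical helper: band i rises from edges[i], peaks at edges[i + 1]
    and falls back to zero at edges[i + 2].
    """
    if fmax is None:
        fmax = sr / 2.0  # same fallback as documented for fmax
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(sr=16000, n_mels=8)
print([round(f) for f in edges])  # edges get wider as frequency increases
```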
- #### Notes
- The input should be a 2D array, (audio_channel, audio_length).
- E.g., (1, 44100) for a mono signal, (2, 44100) for a stereo signal.
- It supports multichannel signal input, so audio_channel can be any positive integer.
- The input shape is not related to the Keras image_data_format() config.
- #### Returns
- A Keras layer that outputs abs(mel-spectrogram) as 2D data, i.e.,
  - (None, n_channel, n_mels, n_time) if channels_first,
  - (None, n_mels, n_time, n_channel) if channels_last.
- Parameters
n_dft (int, optional) – The number of DFT points. Best if a power of 2. Defaults to 512.
n_hop (Optional[int], optional) – Hop length between frames, in samples. Best if <= n_dft. Defaults to None.
padding (str, optional) – How to pad the signal boundaries ('same' or 'valid'). Defaults to 'same'.
power_spectrogram (float, optional) – 2.0 for a power spectrogram, 1.0 for an amplitude spectrogram. Defaults to 2.0 (power).
return_decibel_spectrogram (bool, optional) – Convert spectrogram values to dB. Recommended. Defaults to False.
trainable_kernel (bool, optional) – If True, kernels will be optimized during training. Defaults to False.
image_data_format (str, optional) – 'channels_first', 'channels_last', or 'default' (use the Keras default). Defaults to 'default'.
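power_spectrogram and return_decibel_spectrogram control how magnitudes are scaled. As a rough sketch of the usual decibel convention (the exact constants used inside this layer are not documented here, so amin and top_db below are assumptions): a power value p maps to 10 * log10(p) dB and an amplitude value a to 20 * log10(a) dB, with a small floor amin to avoid log(0), values referenced to the maximum (as Librosa's power_to_db does when called with ref=np.max), and clipping to a top_db dynamic range. The function name to_decibel is illustrative.

```python
import math
from typing import List


def to_decibel(values: List[float], power: float = 2.0,
               amin: float = 1e-10, top_db: float = 80.0) -> List[float]:
    """Convert |X|**power spectrogram values to dB relative to their maximum.

    Since 20 * log10(|X|) == (20 / power) * log10(|X|**power), the factor is
    10 for a power spectrogram (power=2.0) and 20 for amplitude (power=1.0).
    """
    factor = 20.0 / power
    db = [factor * math.log10(max(v, amin)) for v in values]
    ref = max(db)
    # Reference to the maximum, then clip to a fixed dynamic range
    return [max(d - ref, -top_db) for d in db]


print(to_decibel([1.0, 0.1, 0.0], power=2.0))  # 0, -10 and the -80 dB floor
print(to_decibel([1.0, 0.1], power=1.0))       # 0 and -20 dB
```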
Notes
The input should be a 2D array, (audio_channel, audio_length). E.g., (1, 44100) for mono signals, (2, 44100) for stereo signals. Multichannel inputs are supported, so audio_channel can be any positive integer. The input shape is not related to the Keras image_data_format() config.
- Returns
- Keras layer computing the spectrogram, with output shape
  - (None, n_channel, n_time, n_freq) if channels_first,
  - (None, n_time, n_freq, n_channel) if channels_last.
- Return type
Layer
- build(input_shape)
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in between layer instantiation and layer call. It is invoked automatically before the first execution of call(). This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
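The build-before-first-call contract described above can be illustrated without TensorFlow. The toy class below is hypothetical and only mimics the pattern (weights created lazily from the first input's shape, build() run exactly once, call() creating no state); it is not Keras's Layer implementation.

```python
from typing import List, Optional, Sequence


class ToyLayer:
    """Hypothetical stand-in for a Keras layer, showing lazy building."""

    def __init__(self, units: int):
        self.units = units
        self.built = False
        self.weights: Optional[List[List[float]]] = None

    def build(self, input_shape: Sequence[int]) -> None:
        # Create state whose size depends on the input (last axis here)
        n_in = input_shape[-1]
        self.weights = [[0.0] * self.units for _ in range(n_in)]
        self.built = True

    def __call__(self, x: List[List[float]]) -> List[List[float]]:
        # Mirror the Keras contract: build once, on the first call
        if not self.built:
            self.build((len(x), len(x[0])))
        return self.call(x)

    def call(self, x: List[List[float]]) -> List[List[float]]:
        # Plain matrix product x @ weights; no state is created here
        assert self.weights is not None
        return [[sum(row[i] * self.weights[i][j] for i in range(len(row)))
                 for j in range(self.units)] for row in x]


layer = ToyLayer(units=3)
print(layer.built)                      # False: no weights yet
out = layer([[1.0, 2.0]])               # first call triggers build()
print(layer.built, len(layer.weights))  # True 2
```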
- call(x)
This is where the layer's logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters
inputs – Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
  - inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
  - NumPy array or Python scalar values in inputs get cast as tensors.
  - Keras mask metadata is only collected from inputs.
  - Layers are built (build(input_shape) method) using shape info from inputs only.
  - input_spec compatibility is only checked against inputs.
  - Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
  - The SavedModel input specification is generated using inputs only.
  - Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
  - training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
  - mask: Boolean input mask. If the layer's call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns
A tensor or list/tuple of tensors.
- compute_output_shape(input_shape)
Computes the output shape of the layer.
This method will cause the layer's state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.
- Parameters
input_shape – Shape tuple (tuple of integers) or tf.TensorShape, or structure of shape tuples / tf.TensorShape instances (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- Returns
A tf.TensorShape instance or structure of tf.TensorShape instances.
- get_config()
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns
Python dictionary.
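The get_config() contract (a serializable dict from which an equivalent, untrained layer can be rebuilt, and which callers should copy before editing) can be sketched in plain Python. The class below is hypothetical; its from_config() classmethod mirrors the cls(**config) behaviour of Keras's default Layer.from_config.

```python
import copy
from typing import Any, Dict


class ConfigurableToy:
    """Hypothetical layer-like class that only stores constructor arguments."""

    def __init__(self, n_dft: int = 512, n_hop: int = 256, padding: str = "same"):
        self._config = {"n_dft": n_dft, "n_hop": n_hop, "padding": padding}

    def get_config(self) -> Dict[str, Any]:
        # Like Keras, this may hand back the same dict object on every call
        return self._config

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ConfigurableToy":
        # Rebuild an equivalent (untrained) instance from its config
        return cls(**config)


layer = ConfigurableToy(n_dft=1024)
config = copy.deepcopy(layer.get_config())  # copy before modifying
config["n_hop"] = 128
clone = ConfigurableToy.from_config(config)
print(layer.get_config()["n_hop"], clone.get_config()["n_hop"])  # 256 128
```

Copying first keeps the original layer's config untouched; mutating the returned dict directly would silently change it.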
- class das.kapre.time_frequency.Spectrogram(*args, **kwargs)
Spectrogram layer that returns spectrogram(s).
Examples
>>> kapre.time_frequency.Spectrogram(n_dft=512, n_hop=None, padding='same', power_spectrogram=2.0, return_decibel_spectrogram=False, trainable_kernel=False, image_data_format='default')
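To show what such a layer computes, here is a naive, pure-Python short-time Fourier transform that reuses the parameter names n_dft, n_hop and power_spectrogram. It is a sketch under stated assumptions, not the layer's implementation: it uses 'valid'-style framing (no boundary padding), a periodic Hann window, n_hop defaulting to n_dft // 2, and n_dft // 2 + 1 frequency bins, and the layer may differ in any of these. The function name naive_spectrogram is hypothetical.

```python
import cmath
import math
from typing import List, Optional


def naive_spectrogram(signal: List[float], n_dft: int = 512,
                      n_hop: Optional[int] = None,
                      power_spectrogram: float = 2.0) -> List[List[float]]:
    """Return |STFT|**power as a list of frames, each with n_dft // 2 + 1 bins."""
    if n_hop is None:
        n_hop = n_dft // 2  # assumed default
    # Periodic Hann window (assumption)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_dft) for n in range(n_dft)]
    n_freq = n_dft // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - n_dft + 1, n_hop):  # 'valid' framing
        seg = [signal[start + n] * window[n] for n in range(n_dft)]
        frame = []
        for k in range(n_freq):
            # Direct O(n_dft) DFT sum for bin k
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_dft)
                        for n in range(n_dft))
            frame.append(abs(coeff) ** power_spectrogram)
        frames.append(frame)
    return frames


# A cosine with exactly 4 cycles per 32-sample frame peaks in bin 4
sig = [math.cos(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = naive_spectrogram(sig, n_dft=32)
print(len(spec), len(spec[0]))                   # 7 17
print(max(range(17), key=lambda k: spec[0][k]))  # 4
```

A real layer would use an FFT or, as the trainable_kernel option suggests, learnable convolution kernels instead of this O(n_dft²) loop; the sketch only makes the framing and the power exponent explicit.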
- Parameters
n_dft (int, optional) – The number of DFT points. Best if a power of 2. Defaults to 512.
n_hop (Optional[int], optional) – Hop length between frames, in samples. Best if <= n_dft. Defaults to None.
padding (str, optional) – How to pad the signal boundaries ('same' or 'valid'). Defaults to 'same'.
power_spectrogram (float, optional) – 2.0 for a power spectrogram, 1.0 for an amplitude spectrogram. Defaults to 2.0 (power).
return_decibel_spectrogram (bool, optional) – Convert spectrogram values to dB. Recommended. Defaults to False.
trainable_kernel (bool, optional) – If True, kernels will be optimized during training. Defaults to False.
image_data_format (str, optional) – 'channels_first', 'channels_last', or 'default' (use the Keras default). Defaults to 'default'.
Notes
The input should be a 2D array, (audio_channel, audio_length). E.g., (1, 44100) for mono signals, (2, 44100) for stereo signals. Multichannel inputs are supported, so audio_channel can be any positive integer. The input shape is not related to the Keras image_data_format() config.
- Returns
- Keras layer computing the spectrogram, with output shape
  - (None, n_channel, n_time, n_freq) if channels_first,
  - (None, n_time, n_freq, n_channel) if channels_last.
- Return type
Layer
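The documented output layouts can be summarised in a small helper. This is a hypothetical sketch that assumes n_freq = n_dft // 2 + 1, n_hop defaulting to n_dft // 2 when None, and Keras-style frame counting (ceil(audio_length / n_hop) for 'same', floor((audio_length - n_dft) / n_hop) + 1 for 'valid'). None of these details are stated above, so check them against the layer's own compute_output_shape(). The helper accepts only the explicit 'channels_first' and 'channels_last' values, so 'default' must first be resolved to the Keras setting.

```python
from typing import Optional, Tuple


def spectrogram_output_shape(n_channel: int, audio_length: int, n_dft: int = 512,
                             n_hop: Optional[int] = None, padding: str = "same",
                             image_data_format: str = "channels_last",
                             ) -> Tuple[Optional[int], ...]:
    """Hypothetical helper returning (batch, ...) in the documented axis order."""
    if n_hop is None:
        n_hop = n_dft // 2  # assumed default
    if padding == "same":
        n_time = -(-audio_length // n_hop)  # ceil division
    elif padding == "valid":
        n_time = (audio_length - n_dft) // n_hop + 1
    else:
        raise ValueError(f"unsupported padding: {padding!r}")
    n_freq = n_dft // 2 + 1  # assumed number of non-negative frequency bins
    if image_data_format == "channels_first":
        return (None, n_channel, n_time, n_freq)
    if image_data_format == "channels_last":
        return (None, n_time, n_freq, n_channel)
    raise ValueError(f"unsupported image_data_format: {image_data_format!r}")


print(spectrogram_output_shape(2, 44100))  # (None, 173, 257, 2)
print(spectrogram_output_shape(1, 44100, padding="valid",
                               image_data_format="channels_first"))  # (None, 1, 171, 257)
```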
- build(input_shape)
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in between layer instantiation and layer call. It is invoked automatically before the first execution of call(). This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(x)
This is where the layer's logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
- Parameters
inputs – Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
  - inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
  - NumPy array or Python scalar values in inputs get cast as tensors.
  - Keras mask metadata is only collected from inputs.
  - Layers are built (build(input_shape) method) using shape info from inputs only.
  - input_spec compatibility is only checked against inputs.
  - Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
  - The SavedModel input specification is generated using inputs only.
  - Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
  - training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
  - mask: Boolean input mask. If the layer's call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns
A tensor or list/tuple of tensors.
- compute_output_shape(input_shape)
Computes the output shape of the layer.
This method will cause the layer's state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.
- Parameters
input_shape – Shape tuple (tuple of integers) or tf.TensorShape, or structure of shape tuples / tf.TensorShape instances (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- Returns
A tf.TensorShape instance or structure of tf.TensorShape instances.
- get_config()
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns
Python dictionary.
- das.kapre.time_frequency.conv_output_length(input_length, filter_size, padding, stride, dilation=1)
Determines the output length of a convolution given the input length.
- Parameters
input_length: integer.
filter_size: integer.
padding: one of "same", "valid", "full".
stride: integer.
dilation: dilation rate, integer.
- Returns
The output length (integer).
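The rule behind this function can be sketched as follows. The sketch mirrors the conv_output_length helper from Keras's convolution utilities (compute the dilated filter size, apply the padding-dependent length, then ceil-divide by the stride); that the das.kapre copy behaves identically is an assumption, and the name conv_output_length_sketch is illustrative.

```python
from typing import Optional


def conv_output_length_sketch(input_length: Optional[int], filter_size: int,
                              padding: str, stride: int,
                              dilation: int = 1) -> Optional[int]:
    """Output length of a 1D convolution (sketch of the Keras-style rule)."""
    if input_length is None:
        return None  # unknown length stays unknown
    if padding not in ("same", "valid", "full"):
        raise ValueError(f"unsupported padding: {padding!r}")
    # A dilated filter spans filter_size + (filter_size - 1) * (dilation - 1) samples
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding == "same":
        output_length = input_length
    elif padding == "valid":
        output_length = input_length - dilated_filter_size + 1
    else:  # "full"
        output_length = input_length + dilated_filter_size - 1
    # Ceil division by the stride
    return (output_length + stride - 1) // stride


# STFT-style framing: 1 s of 44.1 kHz audio, 512-point DFT, hop of 256 samples
for pad in ("same", "valid", "full"):
    print(pad, conv_output_length_sketch(44100, 512, pad, 256))
# same 173, valid 171, full 175
```

Treating the STFT as a strided convolution (filter_size = n_dft, stride = n_hop) is what makes this helper relevant to the number of time frames.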