das.kapre.time_frequency
- class das.kapre.time_frequency.Melspectrogram(*args, **kwargs)
### Melspectrogram
`kapre.time_frequency.Melspectrogram(sr=22050, n_mels=128, fmin=0.0, fmax=None, power_melgram=1.0, return_decibel_melgram=False, trainable_fb=False, **kwargs)`
Mel-spectrogram layer that outputs mel-spectrogram(s) in 2D image format.
Its base class is `Spectrogram`.
A mel-spectrogram is an efficient representation that exploits a property of the human auditory system by compressing the frequency axis into a mel-scale axis.
#### Parameters
- sr: integer > 0 [scalar]
  - Sampling rate of the input audio signal.
  - Default: `22050`
- n_mels: int > 0 [scalar]
  - The number of mel bands.
  - Default: `128`
- fmin: float >= 0 [scalar]
  - Minimum frequency to include in the mel-spectrogram.
  - Default: `0.0`
- fmax: float > `fmin` [scalar]
  - Maximum frequency to include in the mel-spectrogram.
  - If `None`, it is inferred as `sr / 2`.
  - Default: `None`
- power_melgram: float [scalar]
  - `2.0` for a power spectrogram, `1.0` for an amplitude spectrogram.
  - Default: `1.0`
- return_decibel_melgram: bool
  - Whether to return the result in decibels, i.e. returns log10(amplitude spectrogram) if `True`.
  - Using `True` is recommended, although it is not the default.
  - Default: `False`
- trainable_fb: bool
  - Whether the spectrogram -> mel-spectrogram filterbanks are trainable.
  - If `True`, the frequency-to-mel matrix is initialised with mel frequencies but remains trainable.
  - If `False`, it is initialised and then frozen.
  - Default: `False`
- htk: bool
  - See the corresponding option of Librosa's `mel-spectrogram` or `mel`.
- norm: float [scalar]
  - See the corresponding option of Librosa's `mel-spectrogram` or `mel`.
- `**kwargs`:
  - The keyword arguments of `Spectrogram`, such as `n_dft`, `n_hop`, `padding`, `trainable_kernel`, and `image_data_format`.
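As a rough illustration of how `sr`, `n_mels`, `fmin`, and `fmax` interact, the sketch below spaces band centres evenly on the mel scale. It assumes the HTK formula `mel = 2595 * log10(1 + f / 700)`. The helpers `hz_to_mel_htk`, `mel_to_hz_htk`, and `mel_band_centres` are hypothetical names and not part of das or kapre, and the layer's actual filterbank may be built differently (for example when `htk=False`).

```python
import numpy as np


def hz_to_mel_htk(freq_hz):
    """Convert Hz to mel with the HTK formula (an assumption of this sketch)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_band_centres(sr=22050, n_mels=128, fmin=0.0, fmax=None):
    """Return n_mels centre frequencies (Hz), evenly spaced on the mel axis."""
    if fmax is None:
        fmax = sr / 2  # same inference rule as documented for fmax=None
    # n_mels + 2 points: the two outer points are band edges, not centres.
    mel_points = np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels + 2)
    return mel_to_hz_htk(mel_points)[1:-1]


centres = mel_band_centres(sr=22050, n_mels=128)
```

The centres are dense at low frequencies and sparse at high frequencies, which is the compression of the frequency axis described above.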
#### Notes
- The input should be a 2D array, `(audio_channel, audio_length)`. E.g., `(1, 44100)` for a mono signal, `(2, 44100)` for a stereo signal.
- It supports multichannel signal input, so `audio_channel` can be any positive integer.
- The input shape is not related to the Keras `image_data_format()` config.
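The input layout described in these notes can be sketched with NumPy as follows. The 440 Hz test tone and the variable names are illustrative only. The leading batch axis in `batch` is an assumption based on the `None` dimension in the documented output shapes.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr                      # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)        # illustrative 440 Hz tone

mono = tone[np.newaxis, :]                  # (audio_channel, audio_length) = (1, 44100)
stereo = np.stack([tone, 0.5 * tone])       # (2, 44100)

# Assumed batched form: (batch, audio_channel, audio_length)
batch = np.stack([stereo, stereo, stereo])  # (3, 2, 44100)
```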
#### Returns
A Keras layer
- abs(mel-spectrogram) in a shape of 2D data, i.e.,
- `(None, n_channel, n_mels, n_time)` if `channels_first`,
- `(None, n_mels, n_time, n_channel)` if `channels_last`.
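To make the two output layouts concrete, the following sketch converts a placeholder array from the `channels_first` layout to the `channels_last` layout and back with `np.transpose`. The sizes (4 items, 2 channels, 128 mel bands, 173 frames) are arbitrary example values, not values produced by the layer.

```python
import numpy as np

batch, n_channel, n_mels, n_time = 4, 2, 128, 173  # arbitrary example sizes

# channels_first layout: (batch, n_channel, n_mels, n_time)
melgram_cf = np.zeros((batch, n_channel, n_mels, n_time))

# channels_last layout: (batch, n_mels, n_time, n_channel)
melgram_cl = np.transpose(melgram_cf, (0, 2, 3, 1))

# And back again to channels_first.
melgram_back = np.transpose(melgram_cl, (0, 3, 1, 2))
```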
- Parameters
n_dft (int, optional) – The number of DFT points. Best if power of 2. Defaults to 512.
n_hop (Optional[int], optional) – Hop length between frames, in samples. Best if <= `n_dft`. Defaults to None.
padding (str, optional) – Pads signal boundaries (`same` or `valid`). Defaults to ‘same’.
power_spectrogram (float, optional) – `2.0` for power, `1.0` for amplitude spectrogram. Defaults to 2.0 (power).
return_decibel_spectrogram (bool, optional) – Convert spectrogram values to dB. Recommended. Defaults to False.
trainable_kernel (bool, optional) – If True, kernels will be optimized during training. Defaults to False.
image_data_format (str, optional) – `channels_first`, `channels_last`, or Keras’ `default`. Defaults to ‘default’.
Notes
The input should be a 2D array, `(audio_channel, audio_length)`. E.g., `(1, 44100)` for mono signals, `(2, 44100)` for stereo signals.
Supports multichannel inputs, so `audio_channel` can be any positive integer.
The input shape is not related to the Keras `image_data_format()` config.
- Returns
Keras layer computing the spectrogram.
  - if `channels_first`: `(None, n_channel, n_time, n_freq)`
  - if `channels_last`: `(None, n_time, n_freq, n_channel)`
- Return type
Layer
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of `Layer` or `Model` can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of `call()`.
This is typically used to create the weights of `Layer` subclasses (at the discretion of the subclass implementer).
- Parameters
input_shape – Instance of `TensorShape`, or list of instances of `TensorShape` if the layer expects a list of inputs (one instance per input).
- call(x)
This is where the layer’s logic lives.
The `call()` method may not create state (except in its first invocation, wrapping the creation of variables or other resources in `tf.init_scope()`). It is recommended to create state in `__init__()`, or the `build()` method that is called automatically before `call()` executes the first time.
- Parameters
inputs (named `x` in this signature) – Input tensor, or dict/list/tuple of input tensors. The first positional `inputs` argument is subject to special rules:
  - `inputs` must be explicitly passed. A layer cannot have zero arguments, and `inputs` cannot be provided via the default value of a keyword argument.
  - NumPy array or Python scalar values in `inputs` get cast as tensors.
  - Keras mask metadata is only collected from `inputs`.
  - Layers are built (`build(input_shape)` method) using shape info from `inputs` only.
  - `input_spec` compatibility is only checked against `inputs`.
  - Mixed precision input casting is only applied to `inputs`. If a layer has tensor arguments in `*args` or `**kwargs`, their casting behavior in mixed precision should be handled manually.
  - The SavedModel input specification is generated using `inputs` only.
  - Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for `inputs` and not for tensors in positional and keyword arguments.
`*args` – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
`**kwargs` – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
  - `training`: Boolean scalar tensor or Python boolean indicating whether the `call` is meant for training or inference.
  - `mask`: Boolean input mask. If the layer’s `call()` method takes a `mask` argument, its default value will be set to the mask generated for `inputs` by the previous layer (if `inputs` came from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns
A tensor or list/tuple of tensors.
- compute_output_shape(input_shape)
Computes the output shape of the layer.
This method will cause the layer’s state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.
- Parameters
input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- Returns
An output shape tuple.
- get_config()
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by `Network` (one layer of abstraction above).
Note that `get_config()` does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns
Python dictionary.
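The advice to copy the returned dict before modifying it can be illustrated without Keras. The `StubLayer` class below is a hypothetical stand-in, not part of das, kapre, or Keras. It deliberately returns the same dict object on every call to mimic the "no fresh copy" caveat.

```python
import copy


class StubLayer:
    """Hypothetical stand-in for a layer; returns its internal config dict."""

    def __init__(self, **config):
        self._config = dict(config)

    def get_config(self):
        # No fresh copy is made here, mirroring the caveat above.
        return self._config


layer = StubLayer(n_dft=512, n_hop=None, padding="same")

# Copy first, then modify: the layer's own config stays untouched.
config = copy.deepcopy(layer.get_config())
config["n_dft"] = 1024

rebuilt = StubLayer(**config)
```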
- class das.kapre.time_frequency.Spectrogram(*args, **kwargs)
Spectrogram layer returns spectrogram(s).
Examples
>>> kapre.time_frequency.Spectrogram(n_dft=512, n_hop=None, padding='same', power_spectrogram=2.0, return_decibel_spectrogram=False, trainable_kernel=False, image_data_format='default')
- Parameters
n_dft (int, optional) – The number of DFT points. Best if power of 2. Defaults to 512.
n_hop (Optional[int], optional) – Hop length between frames, in samples. Best if <= `n_dft`. Defaults to None.
padding (str, optional) – Pads signal boundaries (`same` or `valid`). Defaults to ‘same’.
power_spectrogram (float, optional) – `2.0` for power, `1.0` for amplitude spectrogram. Defaults to 2.0 (power).
return_decibel_spectrogram (bool, optional) – Convert spectrogram values to dB. Recommended. Defaults to False.
trainable_kernel (bool, optional) – If True, kernels will be optimized during training. Defaults to False.
image_data_format (str, optional) – `channels_first`, `channels_last`, or Keras’ `default`. Defaults to ‘default’.
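For readers unfamiliar with the dB conversion that `return_decibel_spectrogram` refers to, here is a minimal sketch for a power spectrogram. It assumes the common definition `10 * log10(power)` with a small floor `amin` to avoid `log10(0)`. The function name `power_to_decibel` and the `amin` value are assumptions, and the layer's own conversion may additionally normalise or clip the dynamic range.

```python
import numpy as np


def power_to_decibel(power_spec, amin=1e-10):
    """Convert power values to dB; amin is a hypothetical floor, not a das setting."""
    power_spec = np.asarray(power_spec, dtype=float)
    # Clamp to amin so that silent bins map to a finite value.
    return 10.0 * np.log10(np.maximum(power_spec, amin))


db = power_to_decibel(np.array([1.0, 100.0, 0.0]))
```

For an amplitude spectrogram (`power_spectrogram=1.0`) the equivalent factor would be 20 rather than 10.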
Notes
The input should be a 2D array, `(audio_channel, audio_length)`. E.g., `(1, 44100)` for mono signals, `(2, 44100)` for stereo signals.
Supports multichannel inputs, so `audio_channel` can be any positive integer.
The input shape is not related to the Keras `image_data_format()` config.
- Returns
Keras layer computing the spectrogram.
  - if `channels_first`: `(None, n_channel, n_time, n_freq)`
  - if `channels_last`: `(None, n_time, n_freq, n_channel)`
- Return type
Layer
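The computation this layer performs can be approximated in NumPy to show how `n_dft`, `n_hop`, and `power_spectrogram` shape the result. This is a sketch under stated assumptions, not the layer's implementation: it uses a Hann window, no boundary padding (closest to `padding='valid'`), a fallback of `n_hop = n_dft // 2` when `n_hop` is None, and `n_freq = n_dft // 2 + 1`. The function name `spectrogram_sketch` is hypothetical.

```python
import numpy as np


def spectrogram_sketch(audio, n_dft=512, n_hop=None, power=2.0):
    """Return an array of shape (n_channel, n_time, n_freq) for 2D input audio."""
    audio = np.asarray(audio, dtype=float)
    if n_hop is None:
        n_hop = n_dft // 2  # assumed fallback, see lead-in
    n_channel, n_samples = audio.shape
    n_time = 1 + (n_samples - n_dft) // n_hop  # frames that fit without padding
    window = np.hanning(n_dft)
    out = np.empty((n_channel, n_time, n_dft // 2 + 1))
    for t in range(n_time):
        start = t * n_hop
        frame = audio[:, start:start + n_dft] * window
        # Magnitude of the one-sided DFT, raised to the requested power.
        out[:, t, :] = np.abs(np.fft.rfft(frame, axis=-1)) ** power
    return out


n = np.arange(2048)
tone = np.sin(2 * np.pi * 32 * n / 512)  # exactly 32 cycles per 512-sample frame
spec = spectrogram_sketch(np.stack([tone, tone]), n_dft=512)
```

With this test tone the energy concentrates in frequency bin 32 of every frame, and the result has the `channels_first` layout without the batch axis.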
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of `Layer` or `Model` can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of `call()`.
This is typically used to create the weights of `Layer` subclasses (at the discretion of the subclass implementer).
- Parameters
input_shape – Instance of `TensorShape`, or list of instances of `TensorShape` if the layer expects a list of inputs (one instance per input).
- call(x)
This is where the layer’s logic lives.
The `call()` method may not create state (except in its first invocation, wrapping the creation of variables or other resources in `tf.init_scope()`). It is recommended to create state in `__init__()`, or the `build()` method that is called automatically before `call()` executes the first time.
- Parameters
inputs (named `x` in this signature) – Input tensor, or dict/list/tuple of input tensors. The first positional `inputs` argument is subject to special rules:
  - `inputs` must be explicitly passed. A layer cannot have zero arguments, and `inputs` cannot be provided via the default value of a keyword argument.
  - NumPy array or Python scalar values in `inputs` get cast as tensors.
  - Keras mask metadata is only collected from `inputs`.
  - Layers are built (`build(input_shape)` method) using shape info from `inputs` only.
  - `input_spec` compatibility is only checked against `inputs`.
  - Mixed precision input casting is only applied to `inputs`. If a layer has tensor arguments in `*args` or `**kwargs`, their casting behavior in mixed precision should be handled manually.
  - The SavedModel input specification is generated using `inputs` only.
  - Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for `inputs` and not for tensors in positional and keyword arguments.
`*args` – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
`**kwargs` – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
  - `training`: Boolean scalar tensor or Python boolean indicating whether the `call` is meant for training or inference.
  - `mask`: Boolean input mask. If the layer’s `call()` method takes a `mask` argument, its default value will be set to the mask generated for `inputs` by the previous layer (if `inputs` came from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
- Returns
A tensor or list/tuple of tensors.
- compute_output_shape(input_shape)
Computes the output shape of the layer.
This method will cause the layer’s state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.
- Parameters
input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- Returns
An output shape tuple.
- get_config()
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by `Network` (one layer of abstraction above).
Note that `get_config()` does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
- Returns
Python dictionary.
- das.kapre.time_frequency.conv_output_length(input_length, filter_size, padding, stride, dilation=1)
Determines the output length of a convolution given the input length.
- Arguments
input_length: integer.
filter_size: integer.
padding: one of "same", "valid", "full".
stride: integer.
dilation: dilation rate, integer.
- Returns
The output length (integer).
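The arithmetic behind such a helper can be sketched as follows. This follows the convention commonly used for convolution output lengths in Keras-style utilities and is an assumption about the behaviour, not a copy of the das source. The name `conv_output_length_sketch` is hypothetical.

```python
def conv_output_length_sketch(input_length, filter_size, padding, stride, dilation=1):
    """Output length of a 1D convolution (sketch under assumed conventions)."""
    if input_length is None:
        return None  # unknown input length stays unknown
    if padding not in {"same", "valid", "full"}:
        raise ValueError(f"Unsupported padding: {padding!r}")
    # Effective filter span once gaps from dilation are included.
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding == "same":
        output_length = input_length
    elif padding == "valid":
        output_length = input_length - dilated_filter_size + 1
    else:  # "full"
        output_length = input_length + dilated_filter_size - 1
    # Ceiling division by the stride.
    return (output_length + stride - 1) // stride


n_frames_same = conv_output_length_sketch(44100, 512, "same", 256)
n_frames_valid = conv_output_length_sketch(44100, 512, "valid", 256)
n_frames_full = conv_output_length_sketch(44100, 512, "full", 256)
```

Under these assumptions, one second of 44.1 kHz audio with a 512-sample filter and a stride of 256 gives 173, 171, and 175 positions for "same", "valid", and "full" respectively.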