Abstract: The wideband signal detection framework which applies deep learning-based object detection networks to wideband spectrograms for joint signal detection, classification, and time-frequency ...
Abstract: In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation of frequency (y-axis) with time ...
This module provides CLI scripts for training, inference, and dataset preparation for bioacoustics classification. The core functionality (models, datasets, utilities) is provided by the ...
frame_rate (int): The frame rate per second of the video. Default: 30.
sample_rate (int): The sample rate for audio sampling. Default: 16000.
num_mels (int): Number of channels of the melspectrogram.
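
As a rough illustration of how these parameters relate, the sketch below groups them in a small container and computes how many audio samples fall within one video frame at the documented defaults. The class name `AudioVideoConfig` and the helper `samples_per_video_frame` are hypothetical and are not taken from the library being described. The default for `num_mels` is not shown in the excerpt, so the sketch requires the caller to supply it.

```python
from dataclasses import dataclass


@dataclass
class AudioVideoConfig:  # hypothetical name, for illustration only
    # The default for num_mels is not given in the excerpt, so it is required here.
    num_mels: int
    frame_rate: int = 30       # video frames per second (documented default)
    sample_rate: int = 16000   # audio samples per second (documented default)

    def samples_per_video_frame(self) -> float:
        """Audio samples spanned by one video frame (hypothetical helper)."""
        return self.sample_rate / self.frame_rate


# num_mels=80 is an arbitrary example value, not a documented default.
cfg = AudioVideoConfig(num_mels=80)
print(cfg.samples_per_video_frame())
```

With the documented defaults, one video frame covers 16000 / 30 audio samples, which is not a whole number, so any code aligning audio to video frames has to choose a rounding policy.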