Abstract: This paper introduces V2Coder, a non-autoregressive vocoder based on hierarchical variational autoencoders (VAEs). The hierarchical VAE with hierarchically extended prior and approximate ...
Abstract: We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder ...
Despite significant advances in neural vocoders using diffusion models and their variants, these methods, unfortunately, inherently suffer from a performance-inference dilemma, which stems from the ...
GLM-TTS is a high-quality text-to-speech (TTS) synthesis system based on large language models, supporting zero-shot voice cloning and streaming inference. This system adopts a two-stage architecture: ...