In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis ...