Each sequence is drawn as an arc or custom track whose length is proportional to the sequence length. Colored ribbons connect the aligned regions between sequences and can be colored by similarity or by source.
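The proportional mapping can be illustrated with a minimal sketch. It assumes a circular layout measured in degrees and an optional fixed gap between arcs. The function name `arc_spans`, the `gap_deg` parameter, and the example sequence names are hypothetical and do not come from any particular tool.

```python
def arc_spans(lengths, gap_deg=2.0):
    """Map each sequence to a (start, end) angle pair in degrees.

    lengths: dict of sequence name -> sequence length (hypothetical input format).
    gap_deg: assumed fixed spacing inserted after every arc.
    """
    total = sum(lengths.values())
    # Degrees left for the arcs once the gaps are reserved.
    usable = 360.0 - gap_deg * len(lengths)
    spans = {}
    angle = 0.0
    for name, length in lengths.items():
        # Each arc sweeps a share of the circle proportional to its length.
        sweep = usable * length / total
        spans[name] = (angle, angle + sweep)
        angle += sweep + gap_deg
    return spans


spans = arc_spans({"seqA": 3000, "seqB": 1000}, gap_deg=0.0)
```

With no gap, a sequence three times as long as another receives three times the angular sweep. A ribbon endpoint would be placed inside an arc by the same linear interpolation applied to the aligned coordinates.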
Abstract: We present SegINR, a novel approach to neural Text-to-Speech (TTS) that eliminates the need for either an auxiliary duration predictor or autoregressive (AR) sequence modeling for alignment.