Existing solutions rely heavily on sequential processing pipelines that convert speech into text and then pass it to language ...