Blender Tutorial Audio Visual Using Particles

Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation

Abstract: In this paper, we explore the cross-modal adaptation of pre-trained Vision Transformers (ViTs) for the audio-visual domain by incorporating a limited set of trainable parameters. To this end ...
