Abstract: Aiming to achieve class-agnostic perception in visual object tracking, current trackers commonly formulate tracking as a one-shot detection problem with the template-matching architecture.
Abstract: Referring Multi-Object Tracking (RMOT) aims to dynamically track an arbitrary number of referred targets in a video sequence according to the language expression. Previous methods mainly ...