Abstract: Visual grounding, i.e., localizing objects in images ac-cording to natural language queries, is an important topic in visual language understanding. The most effective approaches for this ...