Abstract: Vision-and-language understanding is one of the most fundamental and difficult tasks in Multimedia Intelligence. Visual Question Answering (VQA), in turn, is even more challenging, since ...