Abstract: In recent years, vision-language tracking has drawn emerging attention in the tracking field. The critical challenge for the task is to fuse semantic representations of language information ...
Abstract: Significant progress in video question answering (VideoQA) have been made thanks to thriving large image-language pretraining frameworks. Although image-language models can efficiently ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results