Abstract: Visual Language Tracking (VLT) enables machines to perform tracking in the real world through human-like language descriptions. However, existing VLT methods are limited to 2D spatial tracking ...