Abstract: Cross-scene classification aims to transfer knowledge acquired from a label-rich source domain to an unlabeled target domain with a distribution shift. With the large amount of available ...
Abstract: Vision-Language (VL) alignment across image and text modalities is a challenging task due to the inherent semantic ambiguity of data with multiple possible meanings. Existing methods ...