Abstract: Recently, ViTAE-RVSA, the first large-scale Vision Transformer (ViT) tailored for remote sensing, has demonstrated the potential of ViTs by integrating window attention with a convolutional ...