A Patch-Level Region-Aware Module with a Multi-Label Framework for Remote Sensing Image Captioning
Recent Transformer-based works can generate high-quality captions for remote sensing images (RSIs). However, these methods generally feed global or grid visual features to a Transformer-based captioning model for associating cross-modal information, which limits performance. In this work, we investigate unexplored ideas for remote s