Vision application

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias of locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by …