16. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)