-
New Biggest Vision Transformer
Google’s new ViT-22B is the largest Vision Transformer model by far, with 22 billion parameters. It has achieved SOTA in numerous benchmarks such as depth estimation, image classification, and semantic segmentation. ViT-22B has been trained on four billion images and can be used for all kinds of computer vision tasks. This result shows that further — read more