Microsoft released the paper “Language Is Not All You Need: Aligning Perception with Language Models “, where they introduce their multimodal large language model KOSMOS-1. KOSMOS-1 is still a language model at its core, but it can also use other training data, like images. It shows impressive results in a number of tasks, such as image transcription. It is, therefore, a much more general model than a simple language model and I think this is a step in the right direction for AGI since I believe that language alone is not enough for AGI.