Microsoft released the paper “Language Is Not All You Need: Aligning Perception with Language Models”, which introduces their multimodal large language model, KOSMOS-1. KOSMOS-1 is still a language model at its core, but it can also be trained on other kinds of data, such as images. It shows impressive results on a number of tasks, such as image transcription. That makes it a much more general model than a plain language model. I think this is a step in the right direction for AGI, since I believe that language alone is not enough to get there.

