PaLM-E has 562B parameters, making it one of the largest models to date. It combines sensory data from a robot with text and image data: the model is based on PaLM and was fine-tuned on input and scene representations for different sensor modalities, so continuous observations can be fed into the language model alongside ordinary text tokens. General models of this kind point the way to more capable and intelligent systems that will assist us in the coming years.
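
To make the core idea concrete, here is a minimal sketch of how sensor observations can be injected into a language model's token stream: continuous features are encoded, projected to the width of the word embeddings, and concatenated with the text tokens so the decoder treats them like ordinary tokens. All names, dimensions, and modules below are illustrative stand-ins, not PaLM-E's actual components (which use a ViT encoder and the PaLM decoder).

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
D_VISION = 512    # output width of a stand-in vision encoder
D_MODEL = 1024    # embedding width of a stand-in language model
VOCAB = 32000

vision_encoder = nn.Linear(D_VISION, D_VISION)   # placeholder for a ViT
projector = nn.Linear(D_VISION, D_MODEL)         # maps vision features into LM space
token_embedding = nn.Embedding(VOCAB, D_MODEL)   # placeholder for the LM's embeddings

def build_multimodal_sequence(image_features, prompt_ids):
    """Interleave projected image embeddings with text token embeddings.

    The decoder then attends over the combined sequence, treating the
    projected sensor features exactly like word embeddings.
    """
    img_tokens = projector(vision_encoder(image_features))  # (n_img, D_MODEL)
    txt_tokens = token_embedding(prompt_ids)                # (n_txt, D_MODEL)
    return torch.cat([img_tokens, txt_tokens], dim=0)       # (n_img + n_txt, D_MODEL)

# Usage: 4 "image patch" features followed by a 6-token text prompt.
seq = build_multimodal_sequence(torch.randn(4, D_VISION),
                                torch.randint(0, VOCAB, (6,)))
print(seq.shape)  # torch.Size([10, 1024])
```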
