Google has released the Universal Speech Model (USM), which can transcribe over 300 languages. It outperforms Whisper, the current state-of-the-art model, on the 18 languages used for the comparison (those that Whisper can decode with a word error rate below 40%). The release is part of Google’s plan to support the 1,000 most spoken languages. With 2B parameters, the model is slightly bigger than Whisper, and it was pre-trained mostly on unlabeled data.