In this episode, Florian and I talk about the new o1 model and what makes it special. We also discuss the hardware market, Alpha Proteo, and US politics.
Episode 52: Anniversary Episode! One Year of Words of the Future
Words of the Future turns one year old! Florian and I look back on the past year and make predictions for the next one. This is also the last episode in the weekly format. Season 2 starts now, which means less frequent episodes (roughly once a month), but longer ones, covering only the absolute highlights and most exciting topics, hopefully with more guests.
Episode 46: 1-Bit LLMs, Stable Diffusion 3, and Mistral Models
In this episode, Florian and Max talk about the Klarna assistant, Stable Diffusion 3, and the new Mistral models. Sorry for the delayed release of this episode.
Google DeepMind just released their new Gemini models. They come in three sizes: Nano will be used on devices like the Pixel phones, Pro will be used in their products such as Bard, and Ultra is going to be released at the beginning of next year. The models are multimodal and can take audio, video, text, images, and code as input.
They outperform current state-of-the-art models not only in text-based tasks but also in other modalities.
You can test the Pro version now in Bard.
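If you would rather try it programmatically than through Bard, a minimal sketch with Google's generative AI Python SDK might look like this; the model name "gemini-pro" and the API-key setup are assumptions on my part, not details from the announcement:

```python
# Minimal sketch: query Gemini Pro via the google-generativeai SDK.
# Model name and availability are assumptions, not from the post.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumed setup step

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarize the Gemini model family in one sentence.")
print(response.text)
```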
LLMs are powerful tools, but they often struggle with tasks that require logical and algorithmic reasoning, such as arithmetic. A team of researchers from Google has developed a new technique to teach LLMs how to perform arithmetic operations by using in-context learning and algorithmic prompting. Algorithmic prompting means that the model is given detailed explanations of each step of the algorithm, such as addition or multiplication. The researchers showed that this technique can improve the performance of LLMs on arithmetic problems that are much harder than those seen in the examples. They also demonstrated that LLMs can use algorithmic reasoning to solve complex word problems by interacting with other models that have different skills. This work suggests that LLMs can learn algorithmic reasoning as a skill and apply it to various tasks.
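To make the idea concrete, here is a toy sketch of what an algorithmic prompt for addition could look like; the wording is my own paraphrase of the approach, not a prompt taken from the paper:

```python
# Few-shot prompt that spells out every step of the addition algorithm,
# instead of only showing input-output pairs (toy paraphrase, not from
# the paper).
prompt = """\
Problem: 128 + 367.
Explanation: Add digit by digit from right to left, carrying when needed.
Step 1: ones digits are 8 and 7. 8 + 7 = 15, so write 5 and carry 1.
Step 2: tens digits are 2 and 6, plus the carry 1. 2 + 6 + 1 = 9, so write 9.
Step 3: hundreds digits are 1 and 3. 1 + 3 = 4, so write 4.
Answer: 495.

Problem: 4519 + 2783.
Explanation:"""

# completion = llm.generate(prompt)  # hypothetical LLM call; the model
# is expected to imitate the spelled-out procedure on the harder problem.
```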
Meta recently released their new Llama models. The new models come in sizes from 7 to 70 billion parameters and are released as base models and chat models, which are fine-tuned with two separate reward models for safety and helpfulness. While the models are only a small improvement over the old Llama models, the most important change is the license which now allows commercial use.
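For anyone who wants to try the chat variant, a minimal sketch with Hugging Face transformers could look like this; the hub id and the gated access via the accepted license are assumptions about the release:

```python
# Minimal sketch: load and query the 7B chat model via transformers.
# The hub id "meta-llama/Llama-2-7b-chat-hf" is an assumption; access
# presumably requires accepting Meta's license on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What does the new Llama license allow?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```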
Researchers at Microsoft have unveiled Kosmos-2, the successor of Kosmos-1, a Multimodal Large Language Model (MLLM) that integrates the capability of perceiving object descriptions and grounding text in the visual world. By representing referring expressions as links in Markdown format, Kosmos-2 achieves the vital task of grounding text to visual elements, enabling multimodal grounding, referring-expression comprehension and generation, perception-language tasks, and language understanding and generation. This milestone in the development of artificial general intelligence lays the foundation for Embodiment AI and the convergence of language, multimodal perception, action, and world modeling, bringing us closer to bridging the gap between humans and machines and revolutionizing various domains where AI interacts with the real world. With just 1.6B parameters, the model is quite small, and it is openly available on GitHub.
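To give a feel for the Markdown-link idea, here is a toy sketch; the caption, boxes, and rendering below are invented for illustration, and the actual model encodes boxes as discrete location tokens rather than plain coordinates:

```python
# Toy illustration of grounding-as-Markdown-links: each grounded phrase
# is paired with a bounding box and rendered as [phrase](box). All data
# here is made up for illustration.
caption = "a snowman next to a campfire"
groundings = {
    "a snowman": (0.12, 0.20, 0.48, 0.85),   # normalized (x1, y1, x2, y2)
    "a campfire": (0.55, 0.60, 0.90, 0.95),
}

def to_markdown_links(text: str, boxes: dict) -> str:
    """Render each grounded phrase as a [phrase](x1,y1,x2,y2) link."""
    for phrase, box in boxes.items():
        text = text.replace(phrase, f"[{phrase}]({','.join(map(str, box))})")
    return text

print(to_markdown_links(caption, groundings))
# [a snowman](0.12,0.2,0.48,0.85) next to [a campfire](0.55,0.6,0.9,0.95)
```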
DeepMind published a new blog post presenting their newest AI, which builds on their previous work, Gato. RoboCat is a self-improving AI agent for robotics that learns to perform a variety of tasks across different arms and then self-generates new training data to improve its technique. It is the first agent to solve and adapt to multiple tasks and do so across different real robots. RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws on a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.
Voicebox is a new generative AI for speech that can generalize, with state-of-the-art performance, to speech-generation tasks it was not specifically trained to accomplish. It can create outputs in a vast variety of styles, from scratch or from a sample, and it can modify any part of a given sample. It can also perform tasks such as:
In-context text-to-speech synthesis: Given a short audio segment, it can match the segment's style and generate speech from text in that style.
Cross-lingual style transfer: Given a sample of speech and a passage of text in one of six languages, it can produce a reading of the text in that language.
Speech denoising and editing: It can resynthesize or replace corrupted segments within audio recordings.
Diverse speech sampling: It can generate speech that is more representative of how people talk in the real world.
Voicebox uses a new approach called Flow Matching, which learns from raw audio and transcription without requiring specific training for each task. It also uses a highly effective classifier to distinguish between authentic speech and audio generated with Voicebox. Voicebox outperforms the current state-of-the-art English model VALL-E on zero-shot text-to-speech and cross-lingual style transfer and achieves new state-of-the-art results on word error rate and audio similarity. Voicebox is not publicly available because of the potential risks of misuse, but the researchers have shared audio samples and a research paper detailing the approach and results. They hope to see a similar impact for speech as for other generative AI domains in the future.
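For intuition, here is a minimal, generic flow-matching training step in PyTorch using a linear interpolation path; this is my sketch of the general technique, not Meta's implementation, and the tiny MLP stands in for the real audio model:

```python
import torch
import torch.nn as nn

dim = 80  # e.g. mel-spectrogram channels (assumed)
velocity_net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))

def flow_matching_loss(x1: torch.Tensor) -> torch.Tensor:
    """Regress the velocity of the straight path x_t = (1-t)*x0 + t*x1
    that flows from noise x0 to a real sample x1."""
    x0 = torch.randn_like(x1)            # noise endpoint
    t = torch.rand(x1.shape[0], 1)       # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1           # point on the interpolation path
    target = x1 - x0                     # velocity of the straight path
    pred = velocity_net(torch.cat([xt, t], dim=-1))
    return ((pred - target) ** 2).mean()

loss = flow_matching_loss(torch.randn(16, dim))  # dummy batch of "audio" features
loss.backward()
```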
After earlier experiments on mice, it is now possible to create human embryo models out of stem cells. This makes it possible to create human life without sperm or eggs. Since the experiments are limited by ethical concerns, the researchers stopped the growth of the embryos at an early stage. This research could lead to a better understanding of early development and could someday allow us to design our successor species.