-
Text-to-Speech is reaching a critical point
Today, Microsoft published a paper called “NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers”. In this paper, they present a new text-to-speech model that can imitate not only human speech but also singing. The model uses a latent diffusion model and a neural audio codec to synthesize high-quality, expressive — read more
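To make the two-stage idea a little more concrete, here is a heavily simplified, hypothetical sketch of text-conditioned latent diffusion followed by a neural-codec decoder. All function names, shapes, and numbers are placeholders for illustration, not NaturalSpeech 2’s actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, text_cond, t):
    """Placeholder for one reverse-diffusion step of a learned denoiser.
    A real model runs a neural network conditioned on text/phonemes."""
    predicted_noise = 0.1 * latent + 0.05 * text_cond   # stand-in for a network
    return latent - predicted_noise * (t / 50)

def codec_decode(latent):
    """Placeholder for a neural audio codec decoder (latents -> waveform)."""
    return np.tanh(latent @ rng.standard_normal((64, 240))).reshape(-1)

# Text conditioning (e.g. phoneme embeddings) and an initial noisy latent.
text_cond = rng.standard_normal((100, 64))   # 100 frames, 64-dim condition
latent = rng.standard_normal((100, 64))      # start from Gaussian noise

# Reverse diffusion: iteratively refine the latent, conditioned on the text.
for t in range(50, 0, -1):
    latent = denoise_step(latent, text_cond, t)

waveform = codec_decode(latent)              # decode latents to audio samples
print(waveform.shape)
```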
-
MiniGPT-4 is an Open-Source Multimodal Model
MiniGPT-4, is an open-source multimodal model similar to the version of GPT-4 that was shown during OpenAI’s presentation. It combines a Visual encoder with an LLM. They used Vicuna which is a fine-tuned version of LLaMA. In the future, I hope more teams try to add new ideas to their models instead of creating more — read more
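As a rough, hypothetical sketch of the general recipe (a frozen visual encoder whose features are projected into the LLM’s embedding space and prepended to the text tokens), here is a toy version. The dimensions, names, and prompt are made up for illustration and do not reflect MiniGPT-4’s actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
D_VIS, D_LLM, VOCAB = 64, 128, 1000   # toy sizes; real models are far larger

def visual_encoder(image):
    """Stand-in for a frozen vision encoder (e.g. a ViT) returning patch features."""
    return rng.standard_normal((32, D_VIS))        # 32 visual tokens

embedding_table = rng.standard_normal((VOCAB, D_LLM))

def llm_embed(token_ids):
    """Stand-in for the LLM's token-embedding lookup."""
    return embedding_table[token_ids]

# The main trained piece in this scheme: a projection from vision space to LLM space.
proj = rng.standard_normal((D_VIS, D_LLM)) * 0.01

image = np.zeros((224, 224, 3))                    # dummy input image
prompt_ids = np.array([1, 523, 88, 999])           # dummy "describe this image" prompt

visual_tokens = visual_encoder(image) @ proj       # (32, D_LLM)
text_tokens = llm_embed(prompt_ids)                # (4, D_LLM)

# Prepend the projected image tokens to the text embeddings and feed the LLM as usual.
llm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(llm_input.shape)                             # (36, 128)
```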
-
OpenAssistant is here
OpenAssistant is an open-source project to build a personal assistant. They just released their first model, and you can try it out here. While the progress on smaller models by the open-source community is impressive, there are a few things I want to mention. Many advertise these models as local alternatives to ChatGPT or even compare — read more
-
Zip-NeRF: the next step towards the Metaverse
Neural Radiance Fields (NeRFs) are a class of generative models used to synthesize high-quality images of 3D scenes. They learn to represent a scene as a continuous volumetric function, mapping 3D spatial coordinates to RGB color and volumetric density. Grid-based NeRF representations use a discretized grid to approximate this continuous function, which allows for — read more
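To illustrate that mapping, here is a tiny, hypothetical NeRF-style scene function together with the volume-rendering step that turns its outputs into a pixel color. This is a generic NeRF sketch with random weights, not Zip-NeRF’s grid-based method.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 64)) * 0.1
W2 = rng.standard_normal((64, 4)) * 0.1

def radiance_field(xyz):
    """Toy continuous scene function: 3D point -> (RGB color, volume density)."""
    h = np.tanh(xyz @ W1)
    out = h @ W2
    rgb = 1 / (1 + np.exp(-out[..., :3]))    # colors in [0, 1]
    sigma = np.log1p(np.exp(out[..., 3]))    # non-negative density (softplus)
    return rgb, sigma

def render_ray(origin, direction, n_samples=64, near=0.0, far=4.0):
    """Volume rendering: accumulate color along a ray, weighted by opacity."""
    ts = np.linspace(near, far, n_samples)
    points = origin + ts[:, None] * direction
    rgb, sigma = radiance_field(points)
    delta = (far - near) / n_samples
    alpha = 1 - np.exp(-sigma * delta)                            # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1 - alpha[:-1]]))   # transmittance
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)                   # final pixel color

pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(pixel)   # three RGB values for this ray
```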
-
New Image generation approach
OpenAI developed a new approach to image generation called consistency models. Current models, like DALL-E 2 or Stable Diffusion, iteratively denoise their way to the result. This new approach maps noise straight to the final result, which makes generation much faster and cheaper. While not yet as good as some diffusion models, they will likely improve and become an — read more
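Here is a schematic, hypothetical comparison of the two sampling styles: an iterative diffusion-style loop versus a consistency-style single step that maps noise directly to a sample. The “models” are trivial stand-ins; the point is only the difference in the number of network calls.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(64)          # starting point: pure noise

def diffusion_denoiser(x, t):
    """Stand-in for a diffusion model's denoising network at step t."""
    return x * (1 - 1 / t)               # pretend each call removes some noise

def consistency_model(x):
    """Stand-in for a consistency model: noise -> sample in one call."""
    return x * 0.01

# Diffusion-style sampling: many sequential network calls.
x = noise.copy()
for t in range(50, 1, -1):
    x = diffusion_denoiser(x, t)
diffusion_sample = x                     # ~50 forward passes

# Consistency-style sampling: a single network call.
consistency_sample = consistency_model(noise)   # 1 forward pass

print(np.abs(diffusion_sample).mean(), np.abs(consistency_sample).mean())
```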
-
Stanford and Google let AI roleplay
In a new research paper, Google and Stanford University created a sandbox world where they let 25 AI agents role-play. The agents are based on ChatGPT (GPT-3.5), and evaluators rated their behavior as more believable than that of humans role-playing the same characters. Future agents based on GPT-4 will likely act even more realistically and intelligently. This could not only mean that we — read more
-
The New Wave of GPT Agents
Since the GPT-3.5 and GPT-4 APIs became available, many companies and start-ups have integrated them into their products. Now developers have started to do it the other way around: they build systems around GPT-4 that enable it to search, use APIs, execute code, and interact with itself. Examples include HuggingGPT and AutoGPT. They are based on — read more
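The common pattern behind these systems can be sketched as a simple loop: the LLM proposes an action (search, call an API, run code), the wrapper executes it, and the result is fed back into the prompt. The sketch below is hypothetical and uses stub functions; it is not HuggingGPT’s or AutoGPT’s actual code.

```python
# Hypothetical agent loop; call_llm and the tools are stubs for illustration.

def call_llm(prompt: str) -> str:
    """Stand-in for an API call to GPT-4; returns an 'action: argument' string."""
    canned = ["search: latest NeRF papers", "code: print(2 + 2)", "finish: done"]
    return canned[prompt.count("Observation")]   # pretend the model picks a step

def web_search(query: str) -> str:
    return f"(fake search results for '{query}')"

def run_code(snippet: str) -> str:
    return "(code output: 4)"                    # a real system would sandbox this

TOOLS = {"search": web_search, "code": run_code}

prompt = "Goal: summarize recent NeRF work.\n"
for _ in range(5):                               # cap the number of steps
    action = call_llm(prompt)
    name, _, arg = action.partition(": ")
    if name == "finish":
        break
    observation = TOOLS[name](arg)               # execute the chosen tool
    prompt += f"Action: {action}\nObservation: {observation}\n"

print(prompt)
```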
-
DeepMind follows OpenAI
Similar to OpenAI, DeepMind has started working with other companies to build more commercial products. In a recent blog post they explained how they developed a new video codec and improved auto-chapters for YouTube. If this trend continues, we will see more products for other Alphabet companies developed by DeepMind. — read more
-
The AI Index Report 2023
Stanford released the new AI Index Report. Sadly, it does not include GPT-4 and other newer results, but I still highly recommend looking into it. They did a great job capturing some key trends in a very clear and visual way. For example, the following graph shows the — read more
-
New Largest Vision Transformer
Google’s new ViT-22B is by far the largest Vision Transformer model, with 22 billion parameters. It has achieved SOTA results on numerous benchmarks such as depth estimation, image classification, and semantic segmentation. ViT-22B was trained on four billion images and can be used for all kinds of computer vision tasks. This result shows that further — read more
-
Open Letter to pause bigger AI models
A group of researchers and notable people released an open letter in which they call for a six-month pause on developing models more advanced than GPT-4. Some of the notable names are researchers from competing companies like DeepMind, Google, and Stability AI, such as Victoria Krakovna, Noam Shazeer, and Emad Mostaque. But also — read more
-
Cerebras releases 7 open LLMs
Cerebras, a hardware company that produces large chips designed for machine learning, released 7 open models ranging from 111 million to 13 billion parameters. All of them are trained following the Chinchilla scaling laws and are fully open, unlike the LLaMA models by Meta. While this is mostly a marketing stunt to show the efficiency of their chips, it is — read more
-
Listen to OpenAI
Many people saw the new episode of the Lex Fridman Podcast with Sam Altman, where he talks about some of the social and political implications of GPT-4. But fewer people saw the podcast with Ilya Sutskever, the Chief Scientist at OpenAI, which is far more technical and, in my opinion, even more exciting and enjoyable. I really — read more
-
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Microsoft researchers investigated an early version of OpenAI’s GPT-4 and found that it exhibits more general intelligence than previous AI models. The model can solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology, and more without needing any special prompting. Furthermore, in all of these tasks, — read more
-
Learning to Grow Pretrained Models for Efficient Transformer Training
A new research paper proposes a method called the Linear Growth Operator (LiGO) to accelerate the training of large-scale transformers. By using the parameters of smaller, pretrained models to initialize larger models, LiGO can save up to 50% of the computational cost of training from scratch while achieving better performance. This approach could have important — read more
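A rough, hypothetical illustration of the core idea: the larger layer’s weights are initialized as a linear transformation of the smaller pretrained layer’s weights. Here the expansion matrices simply duplicate units; in the actual paper these growth operators are learned and factored across width and depth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained "small" layer: maps 4 features to 4 features.
W_small = rng.standard_normal((4, 4))

# Growth operators mapping small dimensions to large ones (8 in this toy).
# LiGO learns such operators; here we simply duplicate each small unit.
A = np.repeat(np.eye(4), 2, axis=0)        # (8, 4) expands the output dimension
B = np.repeat(np.eye(4), 2, axis=1) / 2    # (4, 8) expands the input dimension

# The large layer's initialization is a linear function of the small weights.
W_large = A @ W_small @ B                  # (8, 8)

# Sanity check: the grown layer reproduces the small layer's function
# when the inputs are duplicated in the same way.
x_small = rng.standard_normal(4)
x_large = np.repeat(x_small, 2)            # duplicated input features
y_small = W_small @ x_small
y_large = W_large @ x_large
print(np.allclose(np.repeat(y_small, 2), y_large))   # True: function preserved
```

Starting the big model from such an initialization, rather than from random weights, is what lets training continue instead of restarting from scratch.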
-
ChatGPT’s biggest update yet
OpenAI announced that they will introduce plugins to ChatGPT. Two of them, developed by OpenAI themselves, allow the model to search the web for information and run generated Python code. Other third-party plugins, like Wolfram, allow the model to use external APIs to perform certain tasks. The future capabilities of a model enhanced this way — read more
-
GitHub announced Copilot X
After the original Copilot fell behind GPT-4, GitHub finally announced a set of new features based on GPT-4, such as generated pull requests, answering questions about code and documentation, and help with coding. — read more
-
Google opens Bard
Google’s GPT alternative Bard is now available in the US and UK. Early testers already speak out in favor of Bing, which also launched image generation this week. Bard is based on LaMDA, an older language model that is not as capable as GPT-4. — read more
-
Nvidia goes big in AI
GTC 2023 is currently underway, and Nvidia showed off some of their newest steps in AI, including this amazing intro. They introduced cuLitho, a new library that accelerates computational lithography, a key step in chip manufacturing. This was a complicated process that took weeks to compute and can now be done in a few hours. Speeding — read more