-
Microsoft Improves Bing Chat Again
Microsoft announced that Bing Chat is now available to everyone and that it will get new features such as image search and more ways to present visual information. They are also adding the ability to summarise PDFs and other types of content. But the biggest news is that they are bringing plugins to… — read more
-
DeepFloyd is finally here
Stability AI finally released DeepFloyd, a new text-to-image model that is capable of putting text in images and has much better spatial awareness. It was trained on a new version of the LAION-A dataset. Test it out here — read more
-
Study Extends BERT’s Context Length to 2 Million Tokens
Researchers have made a breakthrough in the field of artificial intelligence, successfully extending the context length of BERT, a Transformer-based natural language processing model, to two million tokens. The team achieved this feat by incorporating a recurrent memory into BERT using the Recurrent Memory Transformer (RMT) architecture. The researchers’ method increases the model’s effective context… — read more
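The core mechanism can be sketched in a few lines: the long input is split into segments, and a small block of memory vectors produced while reading one segment is fed back in together with the next. The "transformer" below is a toy averaging step, not the actual BERT/RMT implementation; all sizes and names are illustrative.

```python
# Toy sketch of the Recurrent Memory Transformer (RMT) idea: process a
# long sequence segment by segment, carrying a few memory vectors across.
import numpy as np

def toy_segment_step(memory, segment):
    """Stand-in for one transformer pass over [memory; segment].
    Returns updated memory summarizing everything seen so far."""
    combined = np.concatenate([memory, segment], axis=0)
    # "Attention" stand-in: new memory is an average of all inputs.
    return combined.mean(axis=0, keepdims=True).repeat(len(memory), axis=0)

def rmt_process(tokens, segment_len=4, num_memory=2, dim=8):
    memory = np.zeros((num_memory, dim))
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        memory = toy_segment_step(memory, segment)  # recurrence across segments
    return memory  # carries information about the full sequence

rng = np.random.default_rng(0)
long_sequence = rng.normal(size=(20, 8))  # 20 "tokens", far beyond one segment
final_memory = rmt_process(long_sequence)
print(final_memory.shape)  # (2, 8)
```

The point of the recurrence is that each segment only ever attends over `segment_len + num_memory` vectors, so the effective context can grow far beyond what a single attention window could hold.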
-
Google and DeepMind Team Up
Google and DeepMind just announced that they will unite Google Brain and DeepMind into Google DeepMind. This is a good step for both sides, since DeepMind really needs the computing power of Google to make further progress on AGI, and Google needs the manpower and knowledge of the DeepMind team to quickly catch up to… — read more
-
The next open-source LLM
Stability AI finally released their own open-source language model. It is trained from scratch and can be used commercially. The first two models are 3B and 7B parameters in size, which is comparable to many other open-source models. What I am more excited about are their planned 65B and 175B parameter models, which are bigger than… — read more
-
NVIDIA improves text-to-video yet again
NVIDIA’s newest model, VideoLDM, can generate videos with resolutions up to 1280 x 2048. They achieve this by training a diffusion model in a compressed latent space, introducing a temporal dimension to the latent space, and fine-tuning on encoded image sequences while temporally aligning diffusion model upsamplers. It is visibly better than previous models and… — read more
-
Text-to-Speech is reaching a critical point
Today, Microsoft published a paper called “NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers”. In this paper, they present a new text-to-speech model that can reproduce not only human speech but also singing. The model uses a latent diffusion model and a neural audio codec to synthesize high-quality, expressive… — read more
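The two-stage structure is easy to sketch: text conditions a latent diffusion process, and a neural audio codec decodes the resulting latents into a waveform. Every component below is a toy stand-in (the "diffusion" is a few nudges toward a target, the "codec" a random linear decoder); shapes and names are illustrative, not the paper's.

```python
# Pipeline sketch of the NaturalSpeech 2 idea: text -> latents -> audio.
import numpy as np

LATENT_DIM = 8
rng = np.random.default_rng(0)

def text_to_latents(text, steps=10):
    """Toy 'latent diffusion': start from noise and nudge toward a
    text-derived target (a learned, conditioned model in reality)."""
    target = rng.normal(size=(len(text.split()), LATENT_DIM))
    z = np.random.default_rng(1).normal(size=target.shape)
    for _ in range(steps):
        z = z + 0.3 * (target - z)  # one small denoising step
    return z

def codec_decode(latents, samples_per_latent=4):
    """Toy neural-codec decoder: upsample latents to a waveform in [-1, 1]."""
    weights = rng.normal(size=(LATENT_DIM, samples_per_latent))
    return np.tanh(latents @ weights).ravel()

audio = codec_decode(text_to_latents("hello singing world"))
print(audio.shape)  # (12,)
```

Working in the codec's latent space rather than on raw audio is what keeps the diffusion stage cheap: three "words" here become only twelve samples, while real audio needs tens of thousands per second.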
-
MiniGPT-4 is an Open-Source Multimodal Model
MiniGPT-4 is an open-source multimodal model similar to the version of GPT-4 that was shown during OpenAI’s presentation. It combines a visual encoder with an LLM, using Vicuna, a fine-tuned version of LLaMA. In the future, I hope more teams try to add new ideas to their models instead of creating more… — read more
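The wiring behind this kind of model can be sketched in a few lines: a frozen visual encoder produces image features, a small trained projection maps them into the LLM's embedding space, and the projected features are prepended to the text embeddings. Everything below is a random stand-in for illustration, not the real ViT or Vicuna components.

```python
# Sketch of the visual-encoder + LLM wiring used by MiniGPT-4-style models.
import numpy as np

IMG_DIM, LLM_DIM = 32, 16  # illustrative sizes
rng = np.random.default_rng(0)
projection = rng.normal(size=(IMG_DIM, LLM_DIM))  # the small trained part

def visual_encoder(image):
    """Stand-in for a frozen image encoder: 4 'patch' features."""
    return rng.normal(size=(4, IMG_DIM))

def embed_text(tokens):
    """Stand-in for the LLM's token embedding table."""
    return rng.normal(size=(len(tokens), LLM_DIM))

image_feats = visual_encoder("photo.jpg")
img_embeds = image_feats @ projection             # map into LLM space
text_embeds = embed_text(["describe", "this"])
llm_input = np.concatenate([img_embeds, text_embeds], axis=0)
print(llm_input.shape)  # (6, 16): image tokens followed by text tokens
```

Because only the projection is trained while encoder and LLM stay frozen, this recipe is cheap enough for small teams to reproduce, which is a big part of why such models appear so quickly.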
-
A Close-Up of the Brain
Researchers at Duke’s Center for In Vivo Microscopy, in collaboration with other institutions, have achieved a breakthrough in magnetic resonance imaging (MRI) technology, capturing the highest-resolution images ever of a mouse brain. Using an incredibly powerful 9.4 Tesla magnet, gradient coils 100 times stronger than those used in clinical MRIs, and a high-performance computer,… — read more
-
OpenAssistant is here
OpenAssistant is an open-source project to build a personal assistant. They just released their first model. You can try it out here. While the progress on smaller models by the open-source community is impressive, there are a few things I want to mention. Many advertise these models as local alternatives to ChatGPT or even compare… — read more
-
Zip-NeRF: the next step towards the Metaverse
Neural Radiance Fields (NeRFs), which are used for synthesizing high-quality images of 3D scenes, are a class of generative models that learn to represent scenes as continuous volumetric functions, mapping 3D spatial coordinates to RGB colors and volumetric density. Grid-based representations of NeRFs use a discretized grid to approximate this continuous function, which allows for… — read more
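That continuous volumetric function is just a small network from 3D coordinates to color and density. A minimal sketch with random weights, purely to show the input/output contract (no training, no positional encoding, no volume rendering):

```python
# Toy NeRF-style field: map 3D points to (r, g, b, density).
import numpy as np

def tiny_nerf(xyz, W1, W2):
    h = np.maximum(xyz @ W1, 0.0)         # hidden layer with ReLU
    out = h @ W2                          # 4 raw outputs per point
    rgb = 1 / (1 + np.exp(-out[:, :3]))   # sigmoid squashes colors to (0, 1)
    density = np.log1p(np.exp(out[:, 3])) # softplus keeps density >= 0
    return rgb, density

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16, 4))
points = rng.uniform(-1, 1, size=(5, 3))  # 5 query points in the scene
rgb, density = tiny_nerf(points, W1, W2)
print(rgb.shape, density.shape)  # (5, 3) (5,)
```

A grid-based NeRF replaces calls to this network with lookups into a discretized grid of such values, trading memory for much faster queries.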
-
New Image generation approach
OpenAI developed a new approach to image generation called consistency models. Current models, like DALL-E 2 or Stable Diffusion, iteratively denoise their way to the result. This new approach maps straight to the final result, which makes the process much faster and cheaper. While not as good as some diffusion models yet, they will likely improve and become an… — read more
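The difference can be caricatured in a few lines: a diffusion sampler takes many small denoising steps, while a consistency model maps noise to the result in a single call. Both "models" below are hand-written stand-ins that pull toward a fixed target; the real models are learned networks.

```python
# Toy contrast: iterative diffusion sampling vs. one-step consistency model.
import numpy as np

TARGET = np.array([1.0, -2.0, 0.5])  # pretend "clean image"

def diffusion_sample(x, steps=50):
    # Many small denoising moves toward the target.
    for _ in range(steps):
        x = x + 0.1 * (TARGET - x)
    return x

def consistency_sample(x):
    # One call maps noise straight to the result
    # (a real consistency model is a learned function of x).
    return TARGET.copy()

noise = np.array([5.0, 5.0, 5.0])
many_steps = diffusion_sample(noise.copy())  # 50 model evaluations
one_step = consistency_sample(noise.copy())  # 1 model evaluation
```

Fifty network evaluations versus one is exactly where the speed and cost advantage comes from, and it is why consistency models currently trade away some sample quality.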
-
Stanford and Google let AI roleplay
In a new research paper, Google and Stanford University created a sandbox world where they let 25 AI agents role-play. The agents are based on ChatGPT (GPT-3.5) and were rated as behaving more believably than human-controlled characters. Future agents based on GPT-4 will be able to act even more realistically and intelligently. This could not only mean that we… — read more
-
The New Wave of GPT Agents
Since the GPT-3.5 and GPT-4 APIs became available, many companies and start-ups have implemented them into their products. Now developers have started to do it the other way around: they build systems around GPT-4 to enable it to search, use APIs, execute code, and interact with itself. Examples are HuggingGPT or AutoGPT. They are based on… — read more
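The pattern these systems share is a loop: the model's reply either names a tool to call (whose output is fed back into the conversation) or gives a final answer. A minimal sketch with a scripted mock in place of the real GPT-4 API; the `TOOL:`/`ANSWER:` protocol is invented for illustration.

```python
# Minimal agent loop in the style of AutoGPT/HuggingGPT-like systems.

def mock_llm(history):
    """Scripted stand-in for an LLM: ask for a search, then answer."""
    if "search result" not in " ".join(history):
        return "TOOL:search:latest AI news"
    return "ANSWER:summarized the search result"

TOOLS = {"search": lambda query: f"search result for '{query}'"}

def run_agent(max_steps=5):
    history = []
    for _ in range(max_steps):
        reply = mock_llm(history)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):]
        _, tool, arg = reply.split(":", 2)   # parse "TOOL:name:argument"
        history.append(TOOLS[tool](arg))     # feed tool output back in
    return "gave up"

print(run_agent())
```

Real systems replace `mock_llm` with an API call and add many tools, memory, and retry logic, but the control flow is essentially this loop.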
-
Deepmind follows OpenAI
Similar to OpenAI, DeepMind has started to work together with other companies to build more commercial products. In a recent blog post, they explained how they developed a new video codec and improved auto chapters for YouTube. If this trend continues, we will see more products for other Alphabet companies developed by DeepMind. — read more
-
The AI Index Report 2023
Stanford released the new AI Index Report. Sadly, the report does not include GPT-4 and other newer results, but I still highly recommend looking into it. They did a great job capturing some key trends in a very clear and visual way. For example, the following graph shows the… — read more
-
New Biggest Vision Transformer
Google’s new ViT-22B is the largest Vision Transformer model by far, with 22 billion parameters. It has achieved SOTA in numerous benchmarks such as depth estimation, image classification, and semantic segmentation. ViT-22B has been trained on four billion images and can be used for all kinds of computer vision tasks. This result shows that further… — read more
-
Open Letter to pause bigger AI models
A group of researchers and notable people released an open letter in which they call for a six-month pause on developing models more advanced than GPT-4. Some of the notable names are researchers from competing companies like DeepMind, Google, and Stability AI, such as Victoria Krakovna, Noam Shazeer, and Emad Mostaque. But also… — read more