Maximilian Kannen • The Future is Now

Author: Maximilian Kannen

Meta

Meta published the Segment Anything Model (SAM) last week, and it is open source. It can “cut out” any object in an image and find it with a simple text prompt. SAM could be used in future AR software or as part of a bigger AI system with vision capabilities. The new dataset that they… — read more

Apr 11, 2023
The New Wave of GPT Agents

Since the GPT-3.5 and GPT-4 APIs became available, many companies and start-ups have integrated them into their products. Now developers have started to do it the other way around: they build systems around GPT-4 that enable it to search, use APIs, execute code, and interact with itself. Examples are HuggingGPT and AutoGPT. They are based on… — read more

Apr 4, 2023
DeepMind follows OpenAI

Similar to OpenAI, DeepMind has started working with other companies to build more commercial products. In a recent blog post, they explained how they developed a new video codec and improved auto chapters for YouTube. If this trend continues, we will see DeepMind develop more products for other Alphabet companies. — read more

Apr 4, 2023
The AI Index Report 2023

Stanford released the new AI Index Report. Some of the key takeaways are: The report sadly does not include GPT-4 and other newer results. I still highly recommend looking into the report. They did a great job capturing some key trends in a very clear and visual way. For example, the following graph shows the… — read more

Apr 4, 2023
New Biggest Vision Transformer

Google’s new ViT-22B is the largest Vision Transformer model by far, with 22 billion parameters. It has achieved state-of-the-art results on numerous benchmarks, such as depth estimation, image classification, and semantic segmentation. ViT-22B has been trained on four billion images and can be used for all kinds of computer vision tasks. This result shows that further… — read more

Apr 1, 2023
Giving AI a Body

Meta announced two major advancements toward general-purpose embodied AI agents capable of performing challenging sensorimotor skills. The first advancement is an artificial visual cortex (called VC-1) that supports a diverse range of sensorimotor skills, environments, and embodiments. VC-1 is trained on videos of people performing everyday tasks from the Ego4D dataset. VC-1 matches or outperforms… — read more

Apr 1, 2023
Open Letter to pause bigger AI models

A group of researchers and notable people released an open letter in which they call for a six-month pause on developing models that are more advanced than GPT-4. Notable signatories include researchers from competing companies such as DeepMind, Google, and Stability AI, among them Victoria Krakovna, Noam Shazeer, and Emad Mostaque. But also… — read more

Mar 29, 2023
Cerebras releases 7 open LLMs

Cerebras, a hardware company that produces large chips designed for machine learning, released 7 open models ranging from 111 million to 13 billion parameters. All of them are Chinchilla-aligned and fully open, unlike the LLaMA models by Meta. While this is mostly a marketing stunt to show the efficiency of their chips, it is… — read more

Mar 29, 2023
Listen to OpenAI

Many people saw the new episode of the Lex Fridman Podcast with Sam Altman, where he talks about some social and political implications of GPT-4. But fewer people saw the podcast with Ilya Sutskever, the Chief Scientist at OpenAI, which is far more technical and, in my opinion, even more exciting and enjoyable. I really… — read more

Mar 28, 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4

Microsoft researchers have conducted an investigation on an early version of OpenAI’s GPT-4, and they have found that it exhibits more general intelligence than previous AI models. The model can solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology, and more, without needing any special prompting. Furthermore, in all of these tasks,… — read more

Mar 25, 2023
Learning to Grow Pretrained Models for Efficient Transformer Training

A new research paper proposes a method to accelerate the training of large-scale transformers, called the Linear Growth Operator (LiGO). By utilizing the parameters of smaller, pre-trained models to initialize larger models, LiGO can save up to 50% of the computational cost of training from scratch while achieving better performance. This approach could have important… — read more

Mar 24, 2023
ChatGPT’s biggest update yet

OpenAI announced that they will introduce plugins to ChatGPT. Two of them, developed by OpenAI itself, allow the model to search the web for information and run generated Python code. Third-party plugins like Wolfram allow the model to use other APIs to perform certain tasks. The future capabilities of a model enhanced this way… — read more

Mar 23, 2023
GitHub announced Copilot X

After Copilot became inferior to GPT-4, GitHub finally announced a set of new functionalities based on GPT-4, such as generated pull requests, answering questions about code or documentation, and helping with coding. — read more

Mar 22, 2023
Google opens Bard

Google’s GPT alternative Bard is now available in the US and UK. Early testers already favor Bing, which also launched image generation this week. Bard is based on LaMDA, an older language model that is not as capable as GPT-4. — read more

Mar 22, 2023
Nvidia goes big in AI

GTC 2023 is underway, and Nvidia showed off some of its newest steps in AI, including this amazing intro. They introduced cuLitho, a new tool to optimize the design of processors. This was a complicated process that took weeks to calculate and can now be done in a few hours. Speeding… — read more

Mar 22, 2023
From GPT-4 to Proto-AGI

Artificial General Intelligence (AGI) is the ultimate goal of many AI researchers and enthusiasts. It refers to the ability of a machine to perform any intellectual task that a human can do, such as reasoning, learning, creativity, and generalization. However, we are still far from achieving AGI with our current AI systems. One… — read more

Mar 20, 2023
GPTs are GPTs: How Large Language Models Could Transform the U.S. Labor Market

A new study by OpenAI and the University of Pennsylvania investigates the potential impact of Generative Pre-trained Transformer (GPT) models on the U.S. labor market. The paper, titled “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models,” assesses occupations based on their correspondence with GPT capabilities, using both… — read more

Mar 20, 2023
FlexGen Enables High-Throughput Inference of Large Language Models on Single GPUs

FlexGen is a new generation engine that enables high-throughput inference of large language models on a single commodity GPU. It uses a linear programming optimizer to efficiently store and access tensors and compresses weights and attention cache to 4 bits. FlexGen achieves significantly higher throughput than state-of-the-art offloading systems, reaching a generation throughput of 1… — read more

Mar 20, 2023
New Transformer Model CoLT5 Processes Long Documents Faster and More Efficiently than Previous Models

Researchers at Google Research have developed a new transformer model that can process long documents faster and more efficiently than previous models. The team’s paper, titled “CoLT5: Faster Long-Range Transformers with Conditional Computation,” describes a transformer model that uses conditional computation to devote more resources… — read more

Mar 20, 2023
New Speech Recognition Model by AssemblyAI

AssemblyAI added a new speech recognition model to their products. Conformer-1 is “a state-of-the-art speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data.” It combines convolutional networks with transformers to achieve unprecedented scores on various recognition tasks. — read more

Mar 18, 2023