The Future is Now


OpenAssistant is here

OpenAssistant is an open-source project to build a personal assistant. They just released their first model, which you can try out here.

announcement video

While the progress on smaller models by the open-source community is impressive, there are a few things I want to mention. Many advertise these models as local alternatives to ChatGPT or even compare them to GPT-4. This is sadly not true: it is not yet possible to replicate the capabilities of a model like GPT-4 on a local machine. This does not mean that they are not good. Many of them are able to generate good answers or even use APIs like ChatGPT does.

Zip-NeRF: the next step towards the Metaverse

Neural Radiance Fields (NeRFs), which are used for synthesizing high-quality images of 3D scenes, are a class of generative models that learn to represent scenes as continuous volumetric functions, mapping 3D spatial coordinates to RGB colors and volumetric density. Grid-based representations of NeRFs use a discretized grid to approximate this continuous function, which allows for efficient training and rendering. However, these grid-based approaches often suffer from aliasing artifacts, such as jaggies or missing scene content, due to the lack of an explicit understanding of scale.
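To make the "continuous volumetric function" concrete, here is a minimal sketch of the volume rendering quadrature NeRFs use to turn per-sample densities and colors into one pixel color. The densities and colors are hand-picked toy values standing in for the learned network's outputs:

```python
import math

def render_ray(sigmas, colors, deltas):
    """Numerical volume rendering along one ray (the NeRF quadrature).

    sigmas: volumetric density at each sampled point
    colors: RGB color at each sample, (r, g, b) tuples in [0, 1]
    deltas: distance between consecutive samples
    Returns the accumulated RGB color for the ray.
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return rgb

# A ray passing through empty space, then hitting a dense red surface:
sigmas = [0.0, 0.0, 50.0]
colors = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = [0.1, 0.1, 0.1]
print(render_ray(sigmas, colors, deltas))  # mostly red: ~[0.993, 0.0, 0.0]
```

Grid-based NeRFs approximate the density/color lookup with interpolation on a voxel grid instead of a network query, which is what makes them fast and what introduces the scale-dependent aliasing Zip-NeRF addresses.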

This new paper proposes a novel technique called Zip-NeRF that combines ideas from rendering and signal processing to address the aliasing issue in grid-based NeRFs. This allows for anti-aliasing in grid-based NeRFs, resulting in significantly lower error rates compared to previous techniques. Moreover, Zip-NeRF achieves faster training times, being 22 times faster than current approaches.

This makes them applicable for VR and AR applications and allows for high-quality 3D scenes. As hardware improves over the next year, we will see some very high-quality VR experiences.

New Image generation approach

OpenAI developed a new approach to image generation called consistency models. Current models, like DALL-E 2 or Stable Diffusion, iteratively denoise the result. This new approach goes straight to the final result, which makes the process much faster and cheaper. While not yet as good as some diffusion models, they will likely improve and become an alternative for scenarios where faster results are needed.
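The speedup comes from the number of network evaluations per sample. The toy sketch below makes that difference concrete; the two "networks" are hard-coded stand-ins, not real learned models:

```python
# Toy 1-D illustration of why consistency models are faster: a diffusion
# sampler calls the denoising network once per step, while a consistency
# model maps noise to the final sample in a single call.

calls = {"diffusion": 0, "consistency": 0}
TARGET = 3.0  # pretend this is the data point the model has learned

def denoise_step(x):
    """One diffusion step: move a small fraction toward the data."""
    calls["diffusion"] += 1
    return x + 0.1 * (TARGET - x)

def consistency_fn(x):
    """Consistency model: jump straight to the data in one evaluation."""
    calls["consistency"] += 1
    return TARGET

x = 40.0                  # start from pure "noise"
for _ in range(50):       # typical diffusion samplers run dozens of steps
    x = denoise_step(x)
y = consistency_fn(40.0)  # a single network evaluation

print(calls["diffusion"], calls["consistency"])  # 50 vs 1
```

Fifty evaluations versus one is roughly the cost gap the paper targets; the open question is how much sample quality the single-step mapping gives up.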

Stanford and Google let AI roleplay

In a new research paper, Google and Stanford University created a sandbox world where they let 25 AI agents role-play. The agents are based on GPT-3.5, and in the paper's evaluation their behavior was rated as more believable than that of humans role-playing the same characters. Future agents based on GPT-4 will be able to act even more realistically and intelligently. This could not only mean that we get better AI NPCs in computer games, but also that we will no longer be able to distinguish bots from real people. This is a great danger in a world where public opinion influences so much. As these agents become more human-like, the risk of deep emotional connections increases, especially if the person does not know that they are interacting with an AI.

Meta

The Segment Anything Model (SAM) was published by Meta last week, and it is open source. It can “cut out” any object in an image and find objects from simple prompts, including experimental text prompts. SAM could be used in future AR software or as part of a bigger AI system with vision capabilities. The dataset they used (SA-1B) is also open source and contains over 1 billion masks across 11 million images.

The New Wave of GPT Agents

Since the GPT-3.5 and GPT-4 APIs became available, many companies and start-ups have implemented them into their products. Now developers have started to do it the other way around: they build systems around GPT-4 to enable it to search, use APIs, execute code, and interact with itself. Examples are HuggingGPT and AutoGPT. They are based on works like Toolformer or this result. Even Microsoft itself has started to build LLM-Augmenter around GPT-4 to improve its performance.
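At their core, these projects share one pattern: a loop that feeds the model's own output back to it and dispatches "tool calls" along the way. Here is a minimal sketch of that loop; the model is a hard-coded stub, and the ACTION/FINAL protocol and tool names are made up for illustration (a real system would call the GPT-4 API and parse its replies):

```python
def fake_llm(prompt):
    """Stand-in for an LLM call: first requests a tool, then answers."""
    if "Observation:" not in prompt:
        return "ACTION search: release year of GPT-4"
    return "FINAL: GPT-4 was released in 2023."

# Registry of tools the agent may call (here, a canned "search engine").
TOOLS = {"search": lambda query: "GPT-4 was released in March 2023."}

def run_agent(task, max_steps=5):
    prompt = f"Task: {task}"
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Parse "ACTION <tool>: <argument>", run the tool, and append the
        # result to the prompt so the model sees it on the next call.
        head, arg = reply.split(":", 1)
        tool = TOOLS[head.split()[1]]
        prompt += f"\n{reply}\nObservation: {tool(arg.strip())}"
    return "gave up"

print(run_agent("When was GPT-4 released?"))  # GPT-4 was released in 2023.
```

Everything beyond this loop, in HuggingGPT, AutoGPT, and the rest, is essentially richer tool registries, better parsing, and memory between steps.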

I talked about this development in my post on how to get from GPT-4 to proto-AGI. I still think that this is the way to a general assistant even though I am not sure if GPT-4 is already capable enough or if we need another small improvement.

DeepMind follows OpenAI

Similar to OpenAI, DeepMind has started to work together with other companies to build more commercial products. In their recent blog post, they explained how they developed a new video codec and improved auto chapters for YouTube.

If this trend continues, we will see more products for other Alphabet companies developed by DeepMind.

The AI Index Report 2023

Stanford released the new AI Index Report. Some of the key takeaways are:

  • Industry takes over and leaves academia behind.
  • Scientific research is accelerating thanks to AI.
  • Both use and misuse of AI are growing rapidly.
  • Demand for AI-related skills is growing.
  • Companies that use AI are leaving behind those that do not.
  • China is the most active country in machine learning and also the most positive about AI.
  • The USA is building the most powerful AI systems.

The report sadly does not include GPT-4 and other newer results. I still highly recommend looking into the report. They did a great job capturing some key trends in a very clear and visual way, for example with a graph showing the exponential growth in the number of machine learning systems.

New Biggest Vision Transformer

Google’s new ViT-22B is the largest Vision Transformer model by far, with 22 billion parameters. It has achieved SOTA in numerous benchmarks such as depth estimation, image classification, and semantic segmentation. ViT-22B has been trained on four billion images and can be used for all kinds of computer vision tasks.

This result shows that further scaling of vision transformers can be as valuable as it was for language models. It also indicates that future multimodal models can be improved and that GPT-4 is not the limit.

Giving AI a Body

Meta announced two major advancements toward general-purpose embodied AI agents capable of performing challenging sensorimotor skills.

The first advancement is an artificial visual cortex (called VC-1) that supports a diverse range of sensorimotor skills, environments, and embodiments. VC-1 is trained on videos of people performing everyday tasks from the Ego4D dataset. VC-1 matches or outperforms state-of-the-art results on 17 different sensorimotor tasks in virtual environments.

The second advancement is a new approach called adaptive (sensorimotor) skill coordination (ASC), which achieves near-perfect performance (98 percent success) on the challenging task of robotic mobile manipulation (navigating to an object, picking it up, navigating to another location, placing the object, repeating) in physical environments.

These improvements are needed to move the field of robotics forward and to match the current pace of AI, which will need bodies at some point.

Open Letter to pause bigger AI models

A group of researchers and notable people released an open letter in which they call for a six-month pause on developing models more advanced than GPT-4. Some of the notable names are researchers from competing companies like DeepMind, Google, and Stability AI, such as Victoria Krakovna, Noam Shazeer, and Emad Mostaque, but also professors and authors like Stuart Russell or Peter Warren. The main concern is the lack of control and understanding of these systems and potential risks ranging from misinformation to human extinction.

Alles Denkbare wird einmal gedacht. Jetzt oder in der Zukunft. Was Salomo gefunden hat, kann einmal auch ein anderer finden, […]. / Everything that is conceivable will be thought of at some point. Whether now or in the future. What Solomon has found, another may also find someday […].

Dürrenmatt, Die Physiker

Although I recognize some valid concerns in the letter, I personally disagree with it. As Dürrenmatt's play “The Physicists” illustrates, technology, no matter how dangerous, cannot be hindered or halted and will always advance. Even if OpenAI were to stop developing GPT-5, other nations would continue, much as with nuclear weapons. But unlike nuclear weapons, which provide no benefits, AI possesses enormous potential for good, making it difficult to argue against its development. While there is a possibility of AI causing harm, preventing or slowing its progress would keep billions of people from being aided by its potential benefits. I believe that the risk of a negative outcome is acceptable if it allows us to solve most of our issues, especially since right now it looks like a negative outcome is guaranteed without AI, as the climate crisis and global conflicts escalate.

Cerebras releases 7 open LLMs

Cerebras, a hardware company that produces large chips designed for machine learning, released 7 open models ranging from 111 million to 13 billion parameters. All of them are trained compute-optimally according to the Chinchilla scaling laws and are fully open, unlike the LLaMA models by Meta. While this is mostly a marketing stunt to show the efficiency of their chips, it is also great news for the open-source community, who will use the models to develop a lot of cool new stuff.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Microsoft researchers have conducted an investigation of an early version of OpenAI’s GPT-4, and they have found that it exhibits more general intelligence than previous AI models. The model can solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology, and more, without needing any special prompting. Furthermore, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance and often vastly surpasses prior models. The researchers believe that GPT-4 could be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. This is in line with my own experience and shows that we are closer to AGI than we thought.

The study emphasizes the need to discover the limitations of such models and the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. The study concludes with reflections on the societal implications of the recent technological leap and future research directions.

Learning to Grow Pretrained Models for Efficient Transformer Training

A new research paper proposes a method to accelerate the training of large-scale transformers, called the Linear Growth Operator (LiGO). By utilizing the parameters of smaller, pre-trained models to initialize larger models, LiGO can save up to 50% of the computational cost of training from scratch while achieving better performance. This approach could have important implications for the field of AGI by enabling more efficient and effective training methods for large-scale models, potentially leading to more flexible and adaptable models that can learn to grow and evolve over time. If this is already being used to train GPT-5, it could mean that we get GPT-5 earlier than expected.
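The core idea, initializing a bigger network from a smaller trained one so the function is preserved, can be shown in a few lines. LiGO *learns* the mapping from small to large weights; the sketch below uses the simplest fixed version instead (Net2Net-style neuron duplication) on a tiny two-layer linear net, so it illustrates the principle rather than LiGO's actual method:

```python
def forward(x, w1, w2):
    """Two linear layers, no bias: y = w2 @ (w1 @ x)."""
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    return [sum(w * hi for w, hi in zip(row, hidden)) for row in w2]

def widen(w1, w2):
    """Duplicate every hidden unit and halve its outgoing weights,
    so the widened network computes exactly the same function."""
    new_w1 = [row[:] for row in w1 for _ in range(2)]               # copy each unit
    new_w2 = [[w / 2 for w in row for _ in range(2)] for row in w2]  # split its output
    return new_w1, new_w2

# "Pretrained" small net: 2 inputs -> 2 hidden units -> 1 output
w1 = [[1.0, 2.0], [3.0, -1.0]]
w2 = [[0.5, 0.25]]
x = [1.0, 4.0]

big_w1, big_w2 = widen(w1, w2)  # grow to 4 hidden units
print(forward(x, w1, w2), forward(x, big_w1, big_w2))  # identical outputs
```

Because the widened net starts out computing the same function as the trained small net, further training refines a good solution instead of starting from random weights, which is where the compute savings come from.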

ChatGPT’s biggest update yet

OpenAI announced that they will introduce plugins to ChatGPT. Two of them, developed by OpenAI themselves, allow the model to search the web for information and run generated Python code. Other third-party plugins like Wolfram allow the model to use external APIs to perform certain tasks. The future capabilities of a model enhanced this way are limitless. I talked about this development in my post “From GPT-4 to Proto-AGI”, where I predicted it. If the capability to run generated code is not too limited, I would call this Proto-AGI.

Google opens Bard

Google’s GPT alternative Bard is now available in the US and UK. Early testers already speak out in favor of Bing, which also launched image generation this week. Bard is based on LaMDA, an older language model that is not as capable as GPT-4.

Nvidia goes big in AI

Right now GTC 2023 is going on, and Nvidia showed off some of their newest steps in AI, including this amazing intro.

They introduced cuLitho, a new library that accelerates computational lithography for chip manufacturing. This was a complicated process that took weeks to compute and can now be done in a few hours. Speeding up chip design will lead to a speedup of the entire industry and shows how positive feedback loops power exponential growth.

They also talked about their new H100 chips for their DGX supercomputers. These chips will not only power the servers of big AI players like AWS, Azure, and OpenAI, but also Nvidia's own cloud servers, which will be available to smaller companies.

Part of this cloud service will be Nvidia AI Foundations, which will provide pre-trained models for text, image, and protein sequencing and will run the training and inference of the models. One of the first users is Adobe, which uses the service for its new AI product Firefly.

In the end, they also presented a new server CPU “Grace” and the Bluefield-3 DPU which will power future data centers.

I am most impressed by their hardware improvements and their AI cloud platform, which will both accelerate AI adoption greatly.

GPTs are GPTs: How Large Language Models Could Transform the U.S. Labor Market

A new study by OpenAI and the University of Pennsylvania investigates the potential impact of Generative Pre-trained Transformer (GPT) models on the U.S. labor market. The paper, titled “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models,” assesses occupations based on their correspondence with GPT capabilities, using both human expertise and classifications from GPT-4. The study finds that approximately 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of GPTs, while around 19% of workers may see at least 50% of their tasks impacted. The impact spans all wage levels, with higher-income jobs potentially facing greater exposure. The paper concludes that GPTs exhibit characteristics of general-purpose technologies, which could have significant economic, social, and policy implications. This comes as no surprise to anyone who has used GPT-4 or watched the recent Microsoft announcement.

I discussed this topic in more depth in my book review of “A World Without Work”. This research supports the author’s point and indicates a radical shift in the economy in the coming years. I highly recommend reading the paper, the book, or at least my book review.

FlexGen Enables High-Throughput Inference of Large Language Models on Single GPUs

FlexGen is a new generation engine that enables high-throughput inference of large language models on a single commodity GPU. It uses a linear programming optimizer to efficiently store and access tensors and compresses weights and attention cache to 4 bits. FlexGen achieves significantly higher throughput than state-of-the-art offloading systems, reaching a generation throughput of 1 token/s with an effective batch size of 144 on a single 16GB GPU. This means that running LLMs on smaller servers could become viable for more and more companies and individuals.
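The 4-bit compression works by quantizing weights group-wise: each group of values is mapped to integers 0..15 relative to its own min/max, cutting storage to 4 bits per weight at the cost of a small rounding error. The sketch below shows the idea; the group size and rounding scheme are illustrative, not FlexGen's exact implementation:

```python
def quantize_group(values):
    """Quantize one group of floats to 4-bit codes plus (min, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # 15 steps span the group's range
    codes = [round((v - lo) / scale) for v in values]  # each code fits in 4 bits
    return codes, lo, scale

def dequantize_group(codes, lo, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]

# One group of toy weights:
weights = [0.02, -0.11, 0.35, 0.08, -0.27, 0.19, 0.0, -0.05]
codes, lo, scale = quantize_group(weights)
restored = dequantize_group(codes, lo, scale)

max_err = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
print(all(0 <= c <= 15 for c in codes))   # True: 4 bits per weight
print(max_err <= scale / 2)               # True: error within half a step
```

Storing two 4-bit codes per byte plus one (min, scale) pair per group is what lets FlexGen keep a 30B-parameter model's weights and attention cache within a single commodity GPU's memory budget.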


© 2024 Maximilian Kannen
