The Future is Now


Looking Back On 2023 And Predictions for 2024

As we close the chapter on 2023, it’s time to revisit the predictions I laid out at the beginning of the year. It was a year marked by technological strides and societal challenges. Let’s evaluate how my forecasts stood against the unfolding of 2023.

Let’s start with my predictions about AI:

AI will continue to disrupt various industries such as search and creative writing and spark public debate about its impact, even more than is happening right now. It will also lead to the production of high-quality media with fewer people and resources thanks to AI’s assistance. In the field of 3D generation, I expect to see similar progress in 2023, bringing us closer to the quality of 2D generation.

I think I was mostly right. GPT-4 definitely sparked a public debate, and we saw many industries become more productive thanks to AI. 3D generation has also already reached the level that image generation was at a year ago. What I did not predict was the speed at which companies like Meta or Microsoft would iterate on and deploy LLMs in many forms.

My next prediction was about Fusion: “While I expect to see continued progress in this field, it is unlikely that we will see a commercial fusion reactor within the next two years.”

Again I was on point, but I missed other energy sources such as solar, which are more relevant. I would count that as a misplaced focus rather than a failed prediction.

I also made predictions for Hardware: “[…] we can expect to see quantum computers with over 1000 Qbits in the upcoming year. GPUs will become more important with the rise of AI. However, these advancements in hardware technology also come with the need for careful consideration and planning in terms of production and distribution.”

We did indeed reach 1,000 qubits, even though IBM was not the first company to do so. I also correctly predicted the increased demand for GPUs, but I have to admit I did not expect its scale. I was also more pessimistic about the ability of TSMC and others to meet the demand; while they drastically outperformed my expectations, I was still partly right, because demand grew far beyond what I anticipated.

My Predictions for VR: “But the year 2023 is shaping up to be a promising one for the VR hardware market, with multiple new headsets, such as the Quest 3, and maybe even an Apple Headset, set to be released. These new products will likely offer improved graphics, more intuitive controls, and a wider range of content and experiences. While it may not fully realize the vision of a “Metaverse”, VR is still likely to be a great entertainment product for many people.”

And AR: “2023 will be a critical year for AR. It will be the first time that we can build affordable Hardware in a small form factor. Chips like the Snapdragon AR2 Gen 1 implement Wifi 7 and low energy usage and will make it possible to build Smart glasses.”

While my VR predictions were all correct, my AR predictions underestimated the difficulty of producing smart glasses in a normal form factor.

I did not make concrete predictions about Brain-computer interfaces, but I honestly expected more progress. More about that in my new predictions later.

Now on to biology and medicine. I made a multi-year prediction: “If this continues we will be able to beat cancer in the next few years, which leads to the next field.” This cannot be verified yet, but I still believe in it. I also predicted that a person under 60 could live forever. Recently I looked much more deeply into aging research, and I still believe this is correct, though I would change “every person under 60 has the potential” to “there is a person under 60 that will”. I think this is an important distinction, because stopping aging requires a lot of money and dedication and will not be available to most people in the near future.

I ended the post with: “While this was a slow year in some aspects, major progress was made in most fields, and 2023 will be even faster. We are at the knee of an exponential blowup and we are not ready for what is coming. While I am still worried about how society will react and adapt, I am excited for 2023 and the rest of the decade.”

Again, I believe I was very much on point with this. Many people were blown away by the rapid developments this year. So let’s talk about the things that I did not predict or ignored last year. LK-99 is a material that was claimed to be a room-temperature superconductor. As of now, this claim appears to be false, but I realized that I did not make any prediction about superconductors in that post. I will do so later in this one.

On to the new predictions for 2024. Let’s start with AI again. LLM-based systems will become more autonomous and will reach a point where many will consider them AGI. I personally do not think that we will reach AGI this year, but most likely in 2025. There is also a 70% chance that we will find a new architecture that generalizes better than transformers. No system in 2024 will outperform humans on the new GAIA benchmark, but systems will double their performance on it. This will mostly be accomplished by improving reasoning, planning, and tool use through better fine-tuning and new training strategies.

Results of current systems on the GAIA benchmark compared to humans

I also predict that commercially viable models will stay under 1 trillion parameters in 2024. There will be a few models over this threshold, but they will not be used in consumer products without payment, similar to GPT-4 (non-turbo). State-space and RNN-style models like RWKV will also become more relevant for specific use cases, and most models will support at least image input, if not more modalities. Models like AlphaFold will push scientific discovery even faster in 2024.

Image, video, music, and 3D generative models will improve dramatically and completely change the art industries. The focus will shift toward integration and ways to use these models and away from pure text-to-output capabilities. Assistants like Alexa will integrate LMMs and improve drastically. OpenAI will release at least one model that will not be called GPT-5 and will hold GPT-5 back until later in the year.

Apple will announce its first LMM at WWDC, and by the end of the year we will be able to do most things on our PC just by talking to it. Meta will release Llama 3, which is going to be multimodal and close to GPT-4, and Google will release Gemini at the beginning of the year, which will be comparable to GPT-4 at first and will improve over the course of the year.

Open-source models will stay a few months behind closed-source models, and even further behind in areas like integration, but they will offer more customizability. Custom AI hardware like the AI Pin will not become widespread, but smartphones will adapt to AI by including more sensors and I/O options, and towards 2025 we will see smart glasses with AI integration. The sectors that will be influenced the most by AI are education and healthcare, but in the short term, the first to be affected will be artists and some office workers.

Let’s continue with hardware. Nvidia will stay the leader in AI hardware with the H200 and, later this year, the B100. Many companies, such as Microsoft, Apple, and Google, will use their own custom chips, but demand will lead to increased sales for every chip company. At the end of 2024, more than half of global FLOPS will be used for AI. VR hardware will continue to improve, and we will finally see the first useful everyday AR glasses towards the end of 2024. Quantum computers will become part of some cloud providers’ offerings as specialized hardware, just like GPUs (note: this part was written before the AWS event announcement). They will become more relevant for many industries as the number of qubits grows. We will also see more variety in chips as they become more specialized to save energy. Brain-computer interfaces will finally be used in humans for actual medical applications.

I did not make any predictions about robots last year because there weren’t many exciting developments, but that has changed. Multiple companies have started developing humanoid robots that will be ready in 2024 or 2025. I expect an initial hype around them and adoption in some areas. However, towards the end of the decade they will be replaced with special-purpose robots, and humanoid robots will be limited to areas where a human form factor is needed. In general, the number of robots will increase in all areas. Progress in planning and advanced AI allows robots to act in unknown environments and take on new tasks. They will leave controlled environments like factories and appear in shops, restaurants, streets, and many other places.

The robots: Atlas by Boston Dynamics, Digit by Agility Robotics, and Optimus by Tesla

Let’s continue with energy. The transition to renewable energy will accelerate in 2024, with a significant focus on solar. The first commercial fusion reactor will begin construction, and nuclear reactors will become even safer, mostly solving the waste problem. More people will install solar on their own houses and become mostly self-sufficient.

I already mentioned LK-99 earlier, so here are my predictions for materials science. I think that if a room-temperature superconductor is possible, an AI-based system will find it within the next two years. In fact, most new materials will be hypothesized and analyzed by AI, bringing a lot of progress to areas like batteries, solar panels, and other material-dependent fields (note: this part was written four days before DeepMind presented GNoME).

Biology and medicine are poised to make significant leaps, powered by AI systems like AlphaFold and similar technologies. Cancer and other deadly diseases will become increasingly treatable, and aging will become a target for many in the field. The public opinion that aging is natural and cannot or should not be stopped will not change this year, but maybe in 2025. Prostheses will become more practical and will be connected directly to nerves and bones. This will make them better than human limbs in some respects, but touch and precision will remain far worse. We will also see progress in artificial organs grown in animals or made entirely in a lab.

Transportation in 2024 will change only slightly. EVs will become more popular and cheaper but will not reach the level of adoption that they have in China. Self-driving cars will stay in big cities as taxi replacements and will not be generally available until 2025. Hyperloop-style tubes will not become a train replacement and will only be built for very specific connections, if they get built at all in the next few years.

Other infrastructure, such as the Internet, will continue to lag behind demand for the next few years. The main driver of the increased need for bandwidth will be high-quality video streaming, while the main need for speed will come from interactive systems like cloud-based AI assistants.

Climate change and unstable governments will lead to an increase in refugees worldwide, and social unrest will grow. We will see the first effects of AI-induced job losses. The political debate will become more heated, and some important elections, like the US election, will be fully determined by large-scale AI-based operations that use fake news, deepfakes, and online bots to sway public opinion.

I made a lot more verifiable predictions this time, and I am curious to see how many I get right. If I missed any area or technology, write it in the comments and I will add a prediction there. Also, let me know your own predictions.

Episode 11: GPT-4 Leak, Wolfram Alpha, and AGI

Words of the Future

In this episode, Florian and I talk about the GPT-4 leak, the Nvidia deal with China, alternative AI ideas, small robots, and much more.

More information on the Discord server
https://discord.gg/3YzyeGJHth
or at https://mkannen.tech/

AI helps with AI Understanding

One of the main problems with LLMs is that they are black boxes: how they produce an output is not understandable to humans. Understanding what different neurons represent and how they influence the model is important to make sure models are reliable and do not contain dangerous tendencies.

OpenAI applied GPT-4 to figure out the meanings of neurons in GPT-2. The methodology uses GPT-4 to generate explanations of neuron behavior in GPT-2, simulate the activations a neuron matching that explanation would produce, and then compare these simulated activations with the real activations to score the explanation’s accuracy. This process helps with understanding and could potentially help improve the model’s performance.
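
To make the methodology concrete, here is a minimal sketch of that explain-then-simulate loop. The prompts, the 0-10 activation scale, the ask_gpt4 placeholder, and the correlation-based score are illustrative assumptions; OpenAI's actual pipeline differs in its details.

import numpy as np

def explain_neuron(neuron_records, ask_gpt4):
    # Ask GPT-4 for a one-sentence explanation of when the neuron fires,
    # based on text snippets with high recorded activations.
    prompt = "These snippets strongly activate one neuron:\n"
    prompt += "\n".join(snippet for snippet, _ in neuron_records)
    prompt += "\nIn one sentence, what does this neuron respond to?"
    return ask_gpt4(prompt)

def score_explanation(explanation, neuron_records, ask_gpt4):
    # Let GPT-4 simulate the neuron: predict an activation for each snippet
    # given only the explanation, then compare with the real activations.
    simulated, real = [], []
    for snippet, activation in neuron_records:
        guess = ask_gpt4(
            f"A neuron fires for: {explanation}\n"
            f"On a 0-10 scale, how strongly does it fire on:\n{snippet}\n"
            "Answer with a single number."
        )
        simulated.append(float(guess))
        real.append(activation)
    # Correlation between simulated and real activations scores the explanation.
    return float(np.corrcoef(simulated, real)[0, 1])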

The tools and datasets used for this process are being open-sourced to encourage further research and development of better explanation generation techniques. This is part of the recent efforts in AI alignment before even more powerful models are trained. Read more about the process here and the paper here. You can also view the neurons of GPT-2 here. I recommend clicking through the network and admiring the artificial brain.

Study Extends BERT’s Context Length to 2 Million Tokens

Researchers have made a breakthrough in the field of artificial intelligence, successfully extending the context length of BERT, a Transformer-based natural language processing model, to two million tokens. The team achieved this feat by incorporating a recurrent memory into BERT using the Recurrent Memory Transformer (RMT) architecture.

The researchers’ method increases the model’s effective context length while maintaining high memory retrieval accuracy. This allows the model to store and process both local and global information and improves the flow of information between different segments of an input sequence.
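
The core recurrence is simple enough to sketch. Below, a generic fixed-length encoder is applied segment by segment, and a small set of memory vectors is carried from one segment to the next; the segment length, memory size, and interfaces are assumptions for illustration, not the exact RMT code.

import torch

def rmt_forward(encoder, embeddings, segment_len=512, num_memory=10, hidden=768):
    # embeddings: (total_len, hidden) token embeddings of a very long input.
    memory = torch.zeros(num_memory, hidden)          # initial memory state
    outputs = []
    for start in range(0, embeddings.size(0), segment_len):
        segment = embeddings[start:start + segment_len]
        # Prepend the memory tokens and run the ordinary fixed-length encoder.
        combined = torch.cat([memory, segment], dim=0)
        hidden_states = encoder(combined.unsqueeze(0)).squeeze(0)
        # Read the updated memory back and carry it to the next segment.
        memory = hidden_states[:num_memory]
        outputs.append(hidden_states[num_memory:])
    return torch.cat(outputs, dim=0)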

The study’s experiments demonstrated the effectiveness of the RMT-augmented BERT model, which can now tackle tasks on sequences up to seven times its originally designed input length (512 tokens). This breakthrough has the potential to significantly enhance long-term dependency handling in natural language understanding and generation tasks, as well as enable large-scale context processing for memory-intensive applications.

Google and DeepMind Team Up

Google and DeepMind just announced that they will unite Google Brain and DeepMind into Google DeepMind. This is a good step for both sides, since DeepMind needs the computing power of Google to make further progress toward AGI, and Google needs the manpower and knowledge of the DeepMind team to quickly catch up to OpenAI and Microsoft. This partnership could create a real rival for OpenAI on the way to AGI. I personally always liked that DeepMind had a different approach to AGI, and I hope they will continue to push ideas beyond language models.

Stanford and Google let AI roleplay

In a new research paper, Google and Stanford University created a sandbox world where they let 25 AI agents role-play. The agents are based on ChatGPT (GPT-3.5), and their behavior was rated as more believable than humans role-playing the same characters. Future agents based on GPT-4 will be able to act even more realistically and intelligently. This could not only mean better AI NPCs in computer games; it also means that we will not be able to distinguish bots from real people. This is a great danger in a world where public opinion influences so much. As these agents become more human-like, the risk of deep emotional connections increases, especially if the person does not know that they are interacting with an AI.

The New Wave of GPT Agents

Since the GPT-3.5 and GPT-4 APIs became available, many companies and start-ups have implemented them into their products. Now developers have started to do it the other way around: they build systems around GPT-4 to enable it to search, use APIs, execute code, and interact with itself. Examples are HuggingGPT and AutoGPT. They are based on works like Toolformer or this result. Even Microsoft itself has started to build LLM-Augmenter around GPT-4 to improve its performance.
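
These projects share a simple control loop: the model decides which tool to call, the program executes it, and the observation is appended to the context until the model declares the task done. The sketch below is a generic illustration of that pattern, not AutoGPT's or HuggingGPT's actual code; the prompt format and tool names are assumptions.

def run_agent(task, ask_llm, tools, max_steps=10):
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = ask_llm(
            history
            + "Respond either with 'TOOL <name> <input>' to use a tool "
            + f"({', '.join(tools)}) or with 'FINAL <answer>' when done."
        )
        if reply.startswith("FINAL"):
            return reply[len("FINAL"):].strip()
        if reply.startswith("TOOL"):
            _, name, arg = reply.split(" ", 2)
            observation = tools[name](arg)      # e.g. web search or code execution
            history += f"{reply}\nObservation: {observation}\n"
    return "Stopped after reaching the step limit."

# Example tool set (stubs): tools = {"search": web_search, "python": run_python}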

I talked about this development in my post on how to get from GPT-4 to proto-AGI. I still think that this is the way to a general assistant even though I am not sure if GPT-4 is already capable enough or if we need another small improvement.

Open Letter to pause bigger AI models

A group of researchers and notable people released an open letter in which they call for a six-month pause on developing models that are more advanced than GPT-4. The signatories include researchers from competing companies such as DeepMind, Google, and Stability AI, among them Victoria Krakovna, Noam Shazeer, and Emad Mostaque, as well as professors and authors like Stuart Russell and Peter Warren. The main concern is the lack of control and understanding of these systems and potential risks that range from misinformation to human extinction.

Alles Denkbare wird einmal gedacht. Jetzt oder in der Zukunft. Was Salomo gefunden hat, kann einmal auch ein anderer finden, […]. / Everything that is conceivable will be thought of at some point, whether now or in the future. What Solomon has found, another may also find someday […].

Dürrenmatt, Die Physiker

Although I recognize some valid concerns in the letter, I personally disagree with it. As Dürrenmatt’s play “The Physicists” demonstrates, technology, no matter how dangerous, cannot be hindered or halted and will always advance. Even if OpenAI were to stop developing GPT-5, others would continue, just as with nuclear weapons. But unlike nuclear weapons, which provide no benefits, AI possesses enormous potential for good, which makes it difficult to argue against its development. While there is a possibility of AI causing harm, preventing or slowing its progress would keep billions of people from being helped by its potential benefits. I believe that the risk of a negative outcome is acceptable if it allows us to solve most of our issues, especially since it currently looks as if a negative outcome is guaranteed without AI, as climate crises and global conflicts escalate.

Listen to OpenAI

Many people saw the new episode of the Lex Fridman Podcast with Sam Altman, where he talks about some social and political implications of GPT-4.

But fewer people saw the podcast with Ilya Sutskever, the Chief Scientist at OpenAI, which is way more technical and in my opinion even more exciting and enjoyable. I really recommend listening to the talk which is only 45 minutes long.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Microsoft researchers have conducted an investigation of an early version of OpenAI’s GPT-4, and they have found that it exhibits more general intelligence than previous AI models. The model can solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology, and more, without needing any special prompting. Furthermore, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance and often vastly surpasses prior models. The researchers believe that GPT-4 could be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. This is in line with my own experience and shows that we are closer to AGI than we thought.

The study emphasizes the need to discover the limitations of such models and the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. The study concludes with reflections on the societal implications of the recent technological leap and future research directions.

Learning to Grow Pretrained Models for Efficient Transformer Training

A new research paper proposes a method to accelerate the training of large-scale transformers, called the Linear Growth Operator (LiGO). By utilizing the parameters of smaller, pre-trained models to initialize larger models, LiGO can save up to 50% of the computational cost of training from scratch while achieving better performance. This approach could have important implications for the field of AGI by enabling more efficient and effective training methods for large-scale models, and potentially leading to more flexible and adaptable models that can learn to grow and evolve over time. If this is already used to train GPT-5 it could mean that we get GPT-5 earlier than expected.
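
The underlying idea can be sketched in a few lines: each large weight matrix is produced from the corresponding small pretrained matrix through learnable expansion operators, which are fitted briefly before normal training continues. This is a simplified illustration under my own assumptions; the paper factorizes and shares the growth operators much more carefully.

import torch
import torch.nn as nn

class GrowLinear(nn.Module):
    def __init__(self, small_weight, out_large, in_large):
        super().__init__()
        out_small, in_small = small_weight.shape
        self.small = nn.Parameter(small_weight.clone(), requires_grad=False)
        # Learnable linear operators that expand the small dims to the large dims.
        self.expand_out = nn.Parameter(torch.eye(out_large, out_small))
        self.expand_in = nn.Parameter(torch.eye(in_small, in_large))

    def large_weight(self):
        # W_large = E_out @ W_small @ E_in: a linear "growth" of the old weights.
        return self.expand_out @ self.small @ self.expand_in

# Usage idea: train expand_out/expand_in for a few steps so the grown model
# mimics the small one, copy large_weight() into the big model, then train it.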

ChatGPT’s biggest update yet

OpenAI announced that they will introduce plugins to ChatGPT. Two of them, developed by OpenAI itself, allow the model to search the web for information and run generated Python code. Other third-party plugins like Wolfram allow the model to use external APIs to perform certain tasks. The future capabilities of a model enhanced this way are limitless. I predicted this development in my post “From GPT-4 to Proto-AGI”. If the capability to run generated code is not too limited, I would call this proto-AGI.

From GPT-4 to Proto-AGI


Artificial General Intelligence (AGI) is the ultimate goal of many AI researchers and enthusiasts. It refers to the ability of a machine to perform any intellectual task that a human can do, such as reasoning, learning, creativity, and generalization. However, we are still far from achieving AGI with our current AI systems. One of the most advanced AI systems today is GPT-4, a large multimodal model created by OpenAI that takes text and images as input and outputs text. So how far away from AGI is GPT-4, and what do we need to do to get there?

What is GPT-4 capable of?

GPT-4 is the successor of GPT-3.5, which was already impressive in its ability to generate coherent and fluent text on various topics and domains. GPT-4 improves on GPT-3.5 by being more reliable, more creative, and able to handle much more nuanced instructions than its predecessor. For example, it can pass a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. It also generates medium-sized working programs and can reason to a certain extent. The context window of GPT-4 is 32K tokens, which allows it to produce entire programs.

Comparison between GPT-3.5 and GPT-4 on different exams. Taken from the GPT-4 paper.

GPT-4 also adds a new feature: visual input. It can accept image and text inputs together and emit text outputs that are relevant to both modalities. For instance, it can describe what is happening in an image or understand its relevance in a given context. This makes GPT-4 more versatile and useful for various applications that require multimodal understanding.

However, despite its impressive capabilities, GPT-4 is still far from being able to perform all the tasks that humans can do with language and images. It still lacks some crucial components that are necessary for achieving AGI.

What do we need to add?

One of the main limitations of GPT-4 is that it has no memory. It cannot remember what it has said or learned beyond its context window and cannot use that for future reference or inference. This means that it cannot build long-term knowledge or relationships with its users or other agents. It also means that it cannot handle complex reasoning tasks that require multiple steps or more facts than fit into its context window.

Another limitation of GPT-4 is that it has no access to tools that can help it solve problems or learn new skills. For example, it cannot use the Internet to search for information on the web; Wolfram Alpha to compute mathematical expressions; databases to store and retrieve data; or other APIs to interact with external services. This limits its ability to acquire new knowledge or perform tasks beyond outputting text.

A third limitation of GPT-4 is that it has no inner thinking. It is strictly an input-output machine that produces exactly one piece of text for every input it gets. In between inputs it does nothing and is in the same state every time. The ability to simulate possible situations is called mental simulation and is one of the key abilities of the human brain. It is a fundamental form of computation in the brain, underlying many cognitive skills such as mindreading, perception, memory, and language. The fact that all Transformer-based AI systems are incapable of this in their current form is, in my opinion, the main reason why AGI is still not in sight.

How do we do this?

To overcome these limitations and move closer towards AGI, we need to add some features and functionalities to GPT-4 that can substitute for these shortcomings.

One possible way to do this is by using chain prompts. Chain prompts are sequences of inputs and outputs that guide the model through a series of steps or actions towards a desired goal. For example, we can use chain prompts to instruct GPT-4 to search for information on the Internet. Instead of giving the model the input directly, we first ask it which parts of the input it needs more information on, get back a list of keywords selected by the model, and feed those into a search engine. In the last step, we add the retrieved information to the original input and give the user the final output. By using chain prompts, we can extend GPT-4’s capabilities and make it more powerful and transparent.
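
The flow above can be made concrete with a few lines of glue code. This is only a sketch: ask_gpt4 and search_web are placeholders for the actual API and search calls, and the prompts are illustrative.

def answer_with_search(question, ask_gpt4, search_web):
    # Step 1: ask the model which facts it is missing, as search keywords.
    keywords = ask_gpt4(
        "List search keywords (comma separated) that would help answer:\n"
        + question
    )
    # Step 2: look the keywords up with a search engine.
    snippets = [search_web(k.strip()) for k in keywords.split(",") if k.strip()]
    # Step 3: answer the original question with the retrieved context attached.
    context = "\n".join(snippets)
    return ask_gpt4(
        "Use the following search results to answer the question.\n"
        f"Results:\n{context}\n\nQuestion: {question}"
    )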

Another possible way to do this is by using Toolformer, a method proposed by Meta that integrates external tools into LLMs through special tokens that represent tool names. The model is fine-tuned on text examples of API calls. For example:
Input: What is 2 + 2?
Output: The answer is <calculator args="2+2">4</calculator>.
This way, GPT-4 can learn to use tools by observing how they are used in natural-language contexts. Toolformer can also handle complex tool compositions and nested tool calls; a minimal sketch of executing such a call follows the list below. Some tools that would drastically enhance the capabilities of GPT are:

Wolfram Alpha (Math)

A calendar (temporal awareness)

A search engine (information gathering)

A database (memory)

A command line (general control)
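
Below is a minimal sketch of how a Toolformer-style call like the one in the example above could be executed at inference time. The tag syntax and the TOOLS registry follow the illustration in this post, not Meta's actual implementation, so treat every name here as an assumption.

import re

# Toy tool registry; a real system would register search, calendar, database, etc.
TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_tool_calls(text):
    # Find tags of the form <tool args="...">...</tool>, run the named tool on
    # its arguments, and splice the result back into the text.
    pattern = r'<(\w+) args="([^"]*)">.*?</\1>'
    def call(match):
        tool, args = match.group(1), match.group(2)
        return TOOLS[tool](args)
    return re.sub(pattern, call, text)

# run_tool_calls('The answer is <calculator args="2+2">4</calculator>.')
# -> 'The answer is 4.'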

The last item is especially powerful. By giving a powerful enough model access to a computer, and combining this with other methods such as chain prompting, we could enable almost unlimited possibilities.
One special case of these techniques that I want to highlight is code execution. An LLM that can run the code it generates and receive the output could build programs to solve every task it gets, from writing simple functions that solve equations to controlling a smart home or fine-tuning itself.
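
As a rough illustration of that loop, the sketch below lets a model propose a script, runs it, and feeds the output or the error message back. The function names are placeholders, there is no sandboxing, and a real system would need strict isolation and resource limits.

import subprocess, sys

def solve_with_code(task, ask_gpt4, attempts=3):
    feedback = ""
    for _ in range(attempts):
        # The model writes a script; on later attempts it also sees the error.
        code = ask_gpt4(
            f"Write a Python script that prints the answer to: {task}\n{feedback}"
        )
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=30,
        )
        if result.returncode == 0:
            # Feed the program output back so the model can state the answer.
            return ask_gpt4(
                f"Task: {task}\nThe script printed:\n{result.stdout}\n"
                "Give the final answer."
            )
        feedback = f"The previous attempt failed with:\n{result.stderr}"
    return "No working solution found."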

We can also add memory this way by giving the model access to a database. We could use chain prompting to ask the model whether parts of the input or output should be saved for the future and combine that with a write call to the database. For every new input, we could then use embeddings to search the database and extract relevant information. Embeddings are vector representations of text that encode its meaning. Asking the model about an appointment with your doctor would produce a vector similar to the vector that represents the information about the appointment in the database. The solution is not perfect, but it would add memory to the model.
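
A minimal sketch of such an embedding memory is shown below; embed() stands in for a real embedding model (for example an embedding API call) that returns a numpy vector, and the cosine-similarity retrieval is deliberately naive.

import numpy as np

class VectorMemory:
    def __init__(self, embed):
        self.embed = embed
        self.entries = []                     # list of (text, vector) pairs

    def save(self, text):
        # Store the text together with its embedding.
        self.entries.append((text, self.embed(text)))

    def recall(self, query, top_k=3):
        # Return the stored texts whose embeddings are most similar to the query.
        q = self.embed(query)
        def similarity(vec):
            return float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
        ranked = sorted(self.entries, key=lambda e: similarity(e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

# memory.save("Doctor appointment on Friday at 10 am")
# memory.recall("When do I see my doctor?") -> returns the appointment note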

Embeddings as memory. Image from https://medium.com/@jeremyarancio/create-your-document-chatbot-with-gpt-3-and-langchain-8eeb66b98656

Where we are right now

We already see the start of these augmentations. The first was BingGPT, which augments GPT-4 with a search engine. The most recent and impressive one is Microsoft’s Copilot for Microsoft 365, which combines GPT-4 with all the Office tools and their Microsoft Graph system, which also gives it access to all your documents. Other companies will follow, even though the integration is limited since the model is not open source and OpenAI are the only ones able to fine-tune it. For most of these techniques, you can use LangChain, a new code library that implements many of the described ways to augment GPT-4.
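
As a small example of what that looks like in practice, here is the early-2023 LangChain interface for wrapping a prompt into a reusable chain. Import paths and class names have changed between LangChain versions, so treat this as illustrative rather than current.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)                   # assumes OPENAI_API_KEY is set
prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer step by step: {question}",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What is 17 * 24?"))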

What we could see by the end of the year

All these methods are not mutually exclusive and can be combined in different ways depending on the task and context. Many companies are already integrating GPT-4 into their products or are about to. And the more tools can be controlled by natural language, the easier it will be for other LLMs to use them. Before the end of the year, we will see language models talking to each other. I can see a near future where we have our own custom model that talks to BingGPT, Copilot, or other software and takes on the role of a conductor of other instances of GPT-4. But there are also risks. Giving the model too much control could lead to chains of mistakes if the model is not powerful enough, or it could lead to a complete takeover and fast takeoff if future models like GPT-5 or 6 are too powerful. This is unlikely as long as OpenAI holds tight control over the development and execution of these models, but the competition is growing and broadly available hardware and software are becoming better and better. This year will be the rise of AI, and next year could be the birth year of proto-AGI.

Update: shortly after I finished this post, this paper was released. It talks about a form of memorizing transformer, which I found to be quite relevant to this post.


FlexGen Enables High-Throughput Inference of Large Language Models on Single GPUs

FlexGen is a new generation engine that enables high-throughput inference of large language models on a single commodity GPU. It uses a linear programming optimizer to efficiently store and access tensors and compresses weights and attention cache to 4 bits. FlexGen achieves significantly higher throughput than state-of-the-art offloading systems, reaching a generation throughput of 1 token/s with an effective batch size of 144 on a single 16GB GPU. This means that running LLMs on smaller servers could become viable for more and more companies and individuals.

New Transformer Model CoLT5 Processes Long Documents Faster and More Efficiently than Previous Models

Researchers at Google have developed a new transformer model that can process long documents faster and more efficiently than previous models. Their paper, titled “CoLT5: Faster Long-Range Transformers with Conditional Computation,” describes a transformer that uses conditional computation to devote more resources to important tokens in both the feedforward and attention layers.
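
A simplified sketch of the conditional feedforward idea follows: every token passes through a cheap branch, while only the top-k tokens chosen by a learned router also pass through an expensive branch. The real CoLT5 uses soft, differentiable routing and applies the same idea to attention; the dimensions here are arbitrary assumptions.

import torch
import torch.nn as nn

class ConditionalFF(nn.Module):
    def __init__(self, dim=512, light=1024, heavy=4096, top_k=64):
        super().__init__()
        self.light = nn.Sequential(nn.Linear(dim, light), nn.ReLU(), nn.Linear(light, dim))
        self.heavy = nn.Sequential(nn.Linear(dim, heavy), nn.ReLU(), nn.Linear(heavy, dim))
        self.router = nn.Linear(dim, 1)        # scores how "important" a token is
        self.top_k = top_k

    def forward(self, x):                      # x: (seq_len, dim)
        out = self.light(x)                    # cheap path for every token
        scores = self.router(x).squeeze(-1)
        k = min(self.top_k, x.size(0))
        idx = scores.topk(k).indices           # tokens routed to the heavy path
        heavy_out = self.heavy(x[idx])
        return out.index_add(0, idx, heavy_out)  # add the heavy path only there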

CoLT5’s ability to effectively process long documents is particularly noteworthy, as previous transformer models struggled with the quadratic attention complexity and the need to apply feedforward and projection layers to every token. The researchers show that CoLT5 outperforms LongT5, the previous state-of-the-art long-input transformer model, on the SCROLLS benchmark, while also boasting much faster training and inference times.

Furthermore, the team demonstrated that CoLT5 can handle inputs up to 64k tokens in length with strong gains. These results suggest that CoLT5 has the potential to improve the efficiency and effectiveness of many natural language processing tasks that rely on long inputs.

MathPrompter: Mathematical Reasoning using Large Language Models

Microsoft published a new paper presenting MathPrompter, a method that uses zero-shot chain-of-thought prompting to generate multiple algebraic expressions or Python functions that solve the same math problem in different ways, thereby raising the confidence in the output results. This led to a score of 92.5 on the MultiArith dataset, beating the previous state-of-the-art results by far.
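
The consensus mechanism behind that score is easy to sketch: sample several independent solutions, run them, and keep the answer most of them agree on, using the agreement rate as a confidence estimate. The prompting and parsing below are heavily simplified assumptions, not the paper's exact pipeline.

from collections import Counter

def math_prompter(problem, ask_llm, n_solutions=5):
    answers = []
    for _ in range(n_solutions):
        expr = ask_llm(
            f"Write a single Python expression that evaluates to the answer of:\n{problem}"
        )
        try:
            answers.append(eval(expr, {"__builtins__": {}}))
        except Exception:
            continue                          # ignore solutions that fail to run
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_solutions        # agreement rate as confidence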

LLMs that use APIs like Toolformer or run their own generated code are a recent development that gives promising results and enables many new capabilities.

GPT-4 Next Week

At a small German information event today, four Microsoft employees talked about the potential of LLMs and mentioned that GPT-4 will be released next week. They implied that GPT-4 will be able to work with video data, which suggests a multimodal model comparable to PaLM-E. Read more here.

OpenAI addressed Alignment and AGI concerns

OpenAI released a blog post about their plans for AGI and how to minimize the negative impacts. I highly recommend reading it yourself, but the key takeaways are:

  1. The mission is to ensure that AGI benefits humanity by increasing abundance, turbocharging the global economy, and aiding in the discovery of new scientific knowledge.
  2. AGI has the potential to empower humanity with incredible new capabilities, but it also comes with serious risks of misuse, drastic accidents, and societal disruption.
  3. To prepare for AGI, a gradual transition to a world with AGI is better than a sudden one. The deployment of AGI should involve a tight feedback loop of rapid learning and careful iteration, and democratized access will lead to more and better research, decentralized power, and more benefits. Developing increasingly aligned and steerable models, empowering individuals to make their own decisions, and engaging in a global conversation about key issues are also important.
