The Future is Now

Category: Technological singularity

Why Open Source Models Are Great

The open-source AI landscape has witnessed significant growth in recent years, with numerous projects and initiatives emerging to democratize access to artificial intelligence. In this blog post, I will look at the current state of open-source AI, exploring the key players, fine-tuning techniques, hardware and API providers, and the compelling arguments in favor of open-source AI.

Model Providers

Training LLMs costs a significant amount of money and requires a lot of experience and hardware. Only a few organizations have the means to do so. The following list is not complete and just covers some of the big ones.

Meta is currently the biggest company that open-sources models. Their model family is called Llama, and the current Llama 3 models are available in two sizes: 8B and 70B. A 405B model is expected soon. The weak points of the current versions are their lack of non-English training data and their small context size, but Meta is already working on that.

Mistral is a smaller French company that received investments from Microsoft, including computing power. While not all of their models are open-source, the ones that are perform well. They open-sourced a 7B model that was a cornerstone of open-source models for quite some time, and they open-sourced two Mixture-of-Experts models (8x7B and 8x22B) that are still leading non-English open-source models, especially at their price point.

Cohere recently open-sourced a few models, including their LLMs Command-R and Command-R+. They perform especially well when used in combination with retrieval-augmented generation.

Stability AI is mostly known for open-sourcing text2image models, but they also open-sourced a few smaller LLMs that are decent for their size.

Google does not open-source their Gemini models, but they have a set of open models called Gemma, which includes some experimental LLMs that are not based on Transformers.

API Providers and Hardware

The main argument for open-source models is the ability to run them yourself on your personal machine. Current models range from 2B to over 100B parameters, so let’s see what is needed to run them.
For small models under 7B, you don’t need anything special; these models could even run on your phone. Models between 7B and 14B can be run on most PCs but can be very slow unless you have a modern GPU. Bigger models between 14B and 70B require extremely high-end PCs. Apple’s modern high-end devices are especially great since they offer the shared memory that bigger models need. Everything over 70B, including the MoE models from Mistral, is usually not usable on home devices. These models are instead available from a broad selection of API providers who host different open-source models and compete on price, speed, and latency. I selected a few that excel in one or two of these categories.

Groq is a newer hardware company that developed custom chips for LLMs, which allows them to offer incredible speeds and prices: for example, Llama 3 8B for less than 10 cents per million tokens at over 800 tokens per second. If you run the model yourself, you would get around 10-20 tokens per second depending on your hardware.

Together.ai offers nearly all common open-source models and gives you a few million tokens for free when you sign up, so you can start experimenting immediately.

Perplexity is not only a great search engine; its API is also great. It is not as cheap or fast as Groq, but it has extremely low latency, and they offer their own models with internet access. They also provide free API credits for Perplexity Pro users.

If you prefer to run models on your own, I recommend a newer Nvidia GPU with as much VRAM as you can afford.

Customization

One of the great side effects of having control over the model is the ability to adapt it to your needs. This starts with simple things like system prompts or temperature. Another technique that is often used is quantization. Quantization is the process of taking the parameters of the model, which are usually stored as floating-point numbers with 16 or 32 bits of precision, and rounding them in different ways to shrink them to somewhere between 8 and 1 bit. This slightly reduces the capabilities of the model, depending on how far you go, but makes the model easier and faster to run on weaker hardware.
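To make the rounding idea concrete, here is a minimal numpy sketch of symmetric 8-bit quantization: one scale factor per tensor and weights rounded to integers. Real quantization schemes used for local models are more elaborate, so treat this purely as an illustration.

import numpy as np

def quantize_int8(weights):
    """Symmetric 8-bit quantization: float weights become int8 values plus one scale."""
    scale = np.abs(weights).max() / 127.0           # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)   # 8-bit storage instead of 16/32-bit floats
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)        # stand-in for a layer's weights
q, scale = quantize_int8(w)
print("max rounding error:", np.abs(w - dequantize(q, scale)).max())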

Fine-tuning

For many use cases, current models are not optimal. They lack knowledge, perform worse in a required language, or simply do not perform well on a certain task. To solve these problems, you can fine-tune the models. Fine-tuning means continuing the training of the model on a small custom data set that helps the model learn the required ability. The following part is a bit more technical and can be skipped:
Three main types of open-source LLMs are available: base models, instruct models, and chat models. Base models are only trained on huge amounts of text and work more like text completion. They do not really work as chatbots and are hard to use. Instruct models are already fine-tuned by the creator on a set of text examples that teach the model to follow the instructions of a given input instead of simply continuing the text. Chat models are further fine-tuned to behave in a chatbot-like way and can hold conversations. They are also often trained to have certain limitations and can refuse to talk about certain things if they are trained to do so.

For fine-tuning, base models give the most freedom. You could even continue the training with new languages or information and do instruct training after that. There are instruct datasets available that can be used, or you can create your own. If you fine-tune existing instruct models, you usually need less data and compute, and you can still teach the model a lot and change its behavior. This is most often the best choice. Existing chat models can still be fine-tuned, but since they are already trained in a certain way, it is harder to get specific behavior, and teaching them completely new skills is hardly possible. Fine-tuning chat models is best if you just want to change the tone of the model or train it on a specific writing style.

There are different ways to fine-tune. Most often you freeze the earlier layers of the model so the learned knowledge of the model will not be changed too much and only train the later layers. While this is not totally correct, I like to imagine that later layers are more important for the style of the output while earlier layers work more like the core language-understanding part of the model. So the more fundamental the thing you want to change, the more layers you need to train. Things like a certain writing style usually only require the very end of the model, while things like improved math capabilities need most of the network.

There is another way to fine-tune models that often pops up: LoRA, which stands for Low-Rank Adaptation. It exploits the fact that the weight changes needed for fine-tuning can be approximated by matrices of much lower rank (lower dimension), so instead of updating a full weight matrix, two much smaller matrices are trained whose product is added to the original weights. Together they contain far fewer parameters than the original matrix, which makes the process faster and cheaper and allows LoRAs to be shared with less memory overhead. The LoRA matrices can then later be swapped in and out like a hat.
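As a rough illustration, here is a minimal numpy sketch of the LoRA idea in its usual formulation: the pretrained matrix W stays frozen and only the two small factors B and A are trained. The dimensions, scaling factor, and initialization below are illustrative assumptions, not the exact recipe of any particular library.

import numpy as np

d, r = 1024, 8                       # model dimension and LoRA rank (r much smaller than d)
W = np.random.randn(d, d) * 0.02     # frozen pretrained weight matrix

A = np.random.randn(r, d) * 0.01     # trainable low-rank factors: only 2*d*r parameters
B = np.zeros((d, r))                 # B starts at zero so the adapter is a no-op before training

def forward(x, alpha=16.0):
    """Original projection plus the low-rank update B @ A, scaled by alpha / r."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = np.random.randn(d)
print(forward(x).shape)                                  # (1024,)
print("LoRA parameters:", A.size + B.size, "vs full matrix:", W.size)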

Output control

If you have control over your model, you can also constrain its output. The most popular example is something like JSON mode, where at every step, instead of sampling freely from the model’s logits, an external program checks which tokens are valid under the JSON grammar and only allows one of those to be selected. This can be used to guarantee that the output follows a given structure and can also be used for things like tool use or other additional functions.
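Here is a toy sketch of that idea in Python. The "grammar check" is just a hard-coded list of allowed token ids, standing in for whatever grammar engine would compute it, and the tiny vocabulary is made up for the example.

import numpy as np

def constrained_sample(logits, valid_token_ids):
    """Mask every token the grammar forbids, then sample only among the valid ones."""
    masked = np.full_like(logits, -np.inf)
    masked[valid_token_ids] = logits[valid_token_ids]
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

vocab = ['{', '}', '"', 'a', ':']                 # made-up mini vocabulary
logits = np.array([0.1, 0.5, 2.0, 3.0, 0.2])      # the model "prefers" token 'a'
allowed = [1, 2]                                  # but after '{' the grammar only allows '}' or '"'
print(vocab[constrained_sample(logits, allowed)])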

Local tools

There is a range of tools to run models locally, from chat interfaces that mimic the experience of ChatGPT to local API servers that can be used by companies or developers. Here are some examples:

GPT4All is a local chat interface that not only allows you to download models but can also give the models access to your local documents and is very easy to use.

Ollama is a local LLM server that makes it easy to install additional models and supports a wide range of operating systems and hardware.
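As a small example of what using such a local server looks like, the sketch below sends a request to an Ollama server on its default port, assuming a model has already been pulled (for example with "ollama pull llama3"); check the Ollama documentation for the exact current API.

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",          # Ollama's local generate endpoint
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])                      # the generated text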

LM Studio offers a user interface to chat with models and also includes functionality to fine-tune them with LoRA.

Conclusion

So as you can see, there are many reasons why open-source models can be superior, even though the biggest and smartest models currently available are slightly better than the best open-source models. They are way cheaper, even if you compare price per performance, and they allow for much more custom control. They can be trained to your liking and needs, and they offer privacy and control over your data and its use. If you run them locally, they often have lower latency, and even if you use API providers, you will get better prices and super-fast inference. Open-source models used to be around a year behind the top models, but recently they have started to catch up. They will probably never lead the field in terms of capabilities, but they will always be the cheaper option. GPT-3.5 is the best example of a model that was beaten by open source a long time ago. Models like Llama 3 are not only cheaper, they are also way faster and offer all the advantages of open models.


Looking Back On 2023 And Predictions for 2024

As we close the chapter on 2023, it’s time to revisit the predictions I laid out at the beginning of the year. It was a year marked by technological strides and societal challenges. Let’s evaluate how my forecasts stood against the unfolding of 2023.

Let’s start with my predictions about AI:

“AI will continue to disrupt various industries such as search and creative writing and spark public debate about its impact, even more than is happening right now. It will also lead to the production of high-quality media with fewer people and resources thanks to AI’s assistance. In the field of 3D generation, I expect to see similar progress in 2023, bringing us closer to the quality of 2D generation.”

I think I was mostly right. GPT-4 definitely sparked a public debate and we see many industries that became more productive thanks to AI. 3D generation is also already at the level that image generation had at the beginning of the year. What I did not predict was the speed at which companies like Meta or Microsoft would iterate and deploy LLMs in many forms.

My next prediction was about fusion: “While I expect to see continued progress in this field, it is unlikely that we will see a commercial fusion reactor within the next two years.”

Again I was on point, but I missed talking about other energy sources like solar, which are more relevant. I would count that as misplaced focus rather than a failed prediction.

I also made predictions for hardware: “[…] we can expect to see quantum computers with over 1000 qubits in the upcoming year. GPUs will become more important with the rise of AI. However, these advancements in hardware technology also come with the need for careful consideration and planning in terms of production and distribution.”

We indeed achieved 1000 qubits, even though IBM was not the first company to do so. I also correctly predicted the increased demand for GPUs, but I have to admit I did not expect that scale. I was also more pessimistic about the ability of TSMC and others to meet the demand, and while they drastically outperformed my expectations, I was still kind of right because the demand is also way bigger than I anticipated.

My predictions for VR: “But the year 2023 is shaping up to be a promising one for the VR hardware market, with multiple new headsets, such as the Quest 3, and maybe even an Apple Headset, set to be released. These new products will likely offer improved graphics, more intuitive controls, and a wider range of content and experiences. While it may not fully realize the vision of a “Metaverse”, VR is still likely to be a great entertainment product for many people.”

And AR: “2023 will be a critical year for AR. It will be the first time that we can build affordable hardware in a small form factor. Chips like the Snapdragon AR2 Gen 1 implement Wifi 7 and low energy usage and will make it possible to build smart glasses.”

While my VR predictions were all correct, my AR predictions underestimated the difficulty of producing smart glasses in a normal form factor.

I did not make concrete predictions about brain-computer interfaces, but I honestly expected more progress. More about that in my new predictions later.

Now on to biology and medicine. I made a multiple-year prediction: “If this continues we will be able to beat cancer in the next few years, which leads to the next field.” This cannot be verified yet, but I still believe in it, and I also predicted that a person under 60 could live forever. Recently I looked a lot more into aging research, and I still believe this is correct, even though I would change it from “every person under 60 has the potential” to “there is a person under 60 that will”. I think this is an important distinction because stopping aging requires a lot of money and dedication and will not be available for most people in the near future.

I ended the post with: “While this was a slow year in some aspects, major progress was made in most fields, and 2023 will be even faster. We are at the knee of an exponential blowup and we are not ready for what is coming. While I am still worried about how society will react and adapt, I am excited for 2023 and the rest of the decade.”

Again, I believe that I was very much on point with this. Many people were blown away by the rapid developments this year. So let’s talk about the stuff that I did not predict or ignored last year. LK-99 is a material that was supposed to be a room-temperature superconductor. As of now, this claim is most likely false, but I realized that I did not make a prediction about superconductors in that blog post. I will do so later in this one.

On to the new predictions for 2024. Let’s start with AI again. LLM-based systems will become more autonomous and will reach a point where many will consider them AGI. I personally do not think that we will reach AGI this year, but most likely in 2025. There is also a 70% chance that we will find a new architecture that generalizes better than Transformers. No system in 2024 will outperform humans on the new GAIA benchmark, but they are going to double their performance on it. This will mostly be accomplished by improving reasoning, planning, and tool use with improved fine-tuning and new training strategies.

Results of current systems on the GAIA benchmark compared to humans

I also predict that commercially viable models will stay under 1 trillion parameters in 2024. There will be a few models over this threshold, but they will not be used in consumer products without paying for them, similar to GPT-4 (non-turbo). State space models like RWKV will also become more relevant for specific use cases, and most models will at least support image input, if not more modalities. Models like AlphaFold will push scientific discovery even faster in 2024.

Image/video/music/3D generative models will improve dramatically and completely change the art industries. The focus is going to be more on integration and ways to use them and less on pure text2output capabilities. Assistants like Alexa will integrate LMMs and improve drastically. OpenAI will release at least one model that will not be called GPT-5 and will hold GPT-5 back until later in the year.

Apple will announce its first LMM at WWDC, and at the end of the year we will be able to do most stuff by just talking to our PC. Meta will release Llama 3, which is going to be multimodal and close to GPT-4, and Google will release Gemini at the beginning of the year, which will be comparable to GPT-4 at launch and will improve over the course of the year.

Open-source models will stay a few months behind closed-source models, and even further behind in areas like integration, but they offer more customizability. Custom AI hardware like the AI Pin will not become widespread, but smartphones will adapt to AI by including more sensors and I/O options, and towards 2025 we will see smart glasses with AI integration. The sectors that will be influenced the most by AI are education and healthcare, but in the short term, those most affected will be artists and some office workers.

Let’s continue with hardware. Nvidia will stay the leader in AI hardware with the H200 and, later this year, the B100. Many companies, like Microsoft, Apple, and Google, will use their own custom chips, but the demand will lead to increased sales for every chip company. At the end of 2024, more than half of global FLOPs will be used for AI. VR hardware will continue to improve, and we will finally see the first useful everyday AR glasses towards the end of 2024. Quantum computers will become part of some of the cloud providers and will be offered as specialized hardware just like GPUs (Note: this part was written before the AWS event announcement). They will become more relevant for many industries as the number of qubits grows. We will also see more variety in chips as they become more specialized to save energy. Brain-computer interfaces will finally be used in humans for actual medical applications.

I did not make any predictions about robots last year because there weren’t many exciting developments, but that changed. Multiple companies started developing humanoid robots that will be ready in 2024 or 2025. I expect an initial hype around them and adoption in some areas. However, towards the end of the decade, they will be replaced with special-purpose robots, and humanoid robots will be limited to areas where a human form factor is needed. In general, the number of robots will increase in all areas. Progress in planning and advanced AI allows robots to act in unknown environments and do new tasks. They will leave controlled environments like factories and will appear in shops, restaurants, streets, and many other places.

The robots: Atlas by Boston Dynamics, Digit by Agility Robotics, and Optimus by Tesla

Let’s continue with energy. The transition to renewable energy will accelerate in 2024, with a significant focus on solar. The first commercial fusion reactor will begin construction, and nuclear reactors will become even safer, mostly solving the waste problem. More people will install solar for their own houses and become mostly self-sufficient.

I mentioned LK-99 earlier, so here are my predictions for material science. I think that if a room-temperature superconductor is possible, an AI-based system will find it in the next two years. In fact, most new materials will be hypothesized and analyzed by AI, which will bring a lot of progress for areas like batteries, solar panels, and other material-dependent fields (Note: this part was written four days before DeepMind presented GNoME).

Biology and medicine are poised to make significant leaps, powered by AI systems like AlphaFold and similar technologies. Cancer and other deadly diseases will become increasingly treatable, and aging will become a target for many in the field. The public opinion that aging is natural and cannot or should not be stopped will not change this year, but maybe in 2025. Prostheses will become more practical and will be connected directly to nerves and bones. This will make them better than human body parts in some areas, but touch and precision will continue to be way worse. We will also see progress in artificial organs grown in animals or made completely in a lab.

Transportation in 2024 will change slightly. EVs will become more popular and cheaper but will not reach the level of adoption that they have in China. Self-driving cars will stay in big cities as taxi replacements and will not be generally available until 2025. Hypertubes will not become a train replacement and will only be built for very specific connections, if they get built at all in the next few years.

Other infrastructure, like the Internet, will continue to lag behind demand for the next few years. The main driver of the increased need for bandwidth will be high-quality video streaming, while the main need for speed will come from interactive systems like cloud-based AI assistants.

Climate change and unstable governments will lead to an increase in refugees worldwide, and social unrest will increase. We will see the first effects of AI-induced job losses. The political debate will become more heated, and some important elections, like the US election, will be fully determined by large-scale AI-based operations that use fake news, deepfakes, and online bots to control public opinion.

I made a lot more verifiable predictions this time, and I am curious to see how many I got right. If I missed any area or technology, write it in the comments and I will add a prediction there. Also, let me know your predictions.

From GPT-4 to Proto-AGI


Artificial General Intelligence (AGI) is the ultimate goal of many AI researchers and enthusiasts. It refers to the ability of a machine to perform any intellectual task that a human can do, such as reasoning, learning, creativity, and generalization. However, we are still far from achieving AGI with our current AI systems. One of the most advanced AI systems today is GPT-4, a large multimodal model created by OpenAI that takes text and images as input and outputs text. So how far away from AGI is GPT-4, and what do we need to do to get there?

What is GPT-4 capable of?

GPT-4 is the successor of GPT-3.5, which was already impressive in its ability to generate coherent and fluent text on various topics and domains. GPT-4 improves on GPT-3.5 by being more reliable, more creative, and able to handle much more nuanced instructions than its predecessor. For example, it can pass a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. It also generates medium-sized working programs and can reason to a certain extent. The context window of GPT-4 is 32K tokens, which allows it to produce entire programs.

Comparison between GPT-3.5 and GPT-4 on different exams. Taken from the GPT-4 paper

GPT-4 also adds a new feature: visual input. It can accept image and text inputs together and emit text outputs that are relevant to both modalities. For instance, it can describe what is happening in an image or understand its relevance in a given context. This makes GPT-4 more versatile and useful for various applications that require multimodal understanding.

However, despite its impressive capabilities, GPT-4 is still far from being able to perform all the tasks that humans can do with language and images. It still lacks some crucial components that are necessary for achieving AGI.

What do we need to add?

One of the main limitations of GPT-4 is that it has no memory. It cannot remember what it has said or learned outside of its context window, and it cannot use that information for future reference or inference. This means that it cannot build long-term knowledge or relationships with its users or other agents. It also means that it cannot handle complex reasoning tasks that require multiple steps or facts that exceed its context window.

Another limitation of GPT-4 is that it has no access to tools that can help it solve problems or learn new skills. For example, it cannot use the Internet to search for information on the web; Wolfram Alpha to compute mathematical expressions; databases to store and retrieve data; or other APIs to interact with external services. This limits its ability to acquire new knowledge or perform tasks beyond outputting text.

A third limitation of GPT-4 is that it has no inner thinking. It is strictly an input-output machine that produces exactly one piece of text for every input it gets. In between inputs it does nothing and is in the same state every time. The ability to simulate possible situations is called mental simulation and is one of the key abilities of the human brain. It is a fundamental form of computation in the brain, underlying many cognitive skills such as mindreading, perception, memory, and language. The fact that all Transformer-based AI systems are not capable of this in their current form is, in my opinion, the main reason why AGI is still not in sight.

How do we do this?

To overcome these limitations and move closer towards AGI, we need to add some features and functionalities to GPT-4 that can substitute for these shortcomings.

One possible way to do this is by using chain prompts. Chain prompts are sequences of inputs and outputs that guide the model through a series of steps or actions towards a desired goal. For example, we can use chain prompts to instruct GPT-4 to search for information on the Internet: instead of giving the model the input directly, we ask it which parts of the input it needs more information on, get back a list of keywords selected by the model, and feed those into a search engine. In the last step, we add the information we got to the original input and give the user the final output. By using chain prompts, we can extend GPT-4’s capabilities and make it more powerful and transparent.
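A rough sketch of such a chain in Python could look like the following. The llm and web_search functions are hypothetical stand-ins for whatever model API and search backend you use, and the prompts are only illustrative.

def answer_with_search(user_input, llm, web_search):
    # Step 1: ask the model which parts of the input need external information
    keywords = llm("List search keywords needed to answer this, comma-separated:\n" + user_input)

    # Step 2: gather information from a search engine for each keyword
    snippets = [web_search(kw.strip()) for kw in keywords.split(",") if kw.strip()]

    # Step 3: feed the original question plus the retrieved context back into the model
    context = "\n".join(snippets)
    return llm("Context:\n" + context + "\n\nQuestion: " + user_input + "\nAnswer:")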

Another possible way to do this is by using Toolformer. Toolformer is an approach proposed by Meta that allows us to integrate external tools into LLMs by using special tokens that represent tool calls. The model is fine-tuned on text examples of API calls. For example, we can use Toolformer to write:
Input: What is 2 + 2?
Output: The answer is <calculator args="2+2">4</calculator>.
This way, GPT-4 can learn to use tools by observing how they are used in natural language contexts. Toolformer can also handle complex tool compositions and nested tool calls. Some tools that would drastically enhance the capabilities of GPT are:

Wolfram Alpha (Math)

A calendar (temporal awareness)

A search engine (information gathering)

A database (memory)

A command line (general control)

The last item is especially powerful. By giving a powerful enough model access to a computer and combining this with other methods such as chain prompting, we could enable unlimited possibilities.
One special case of these techniques that I want to highlight is code execution. An LLM that can run generated code itself and receive the output could build programs to solve every task it gets. This ranges from writing simple functions to solve equations to controlling a smart home or fine-tuning itself.
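As a toy illustration of the tool-token idea, here is how an external program could intercept a calculator call in the model's output, run it, and splice the result back into the text. The tag format follows the example above; the character whitelist is just a simple safety assumption for the sketch.

import re

def run_tool_calls(model_output):
    """Replace <calculator args="..."> spans with the evaluated expression."""
    def evaluate(match):
        expression = match.group(1)
        if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
            return match.group(0)          # leave anything suspicious untouched
        return str(eval(expression))       # fine for a toy calculator, not for untrusted input
    return re.sub(r'<calculator args="([^"]*)">.*?</calculator>', evaluate, model_output)

print(run_tool_calls('The answer is <calculator args="2+2">4</calculator>.'))
# prints: The answer is 4.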

We can also add memory this way by giving the model access to a database. We could use chain prompting to ask the model whether parts of the input or output should be saved for the future and combine that with a write call to the database. We could then use embeddings to search the database for every input and extract relevant information. Embeddings are vector representations of text that encode the meaning of the text. Asking the model about an appointment with your doctor would be represented by a vector that is similar to the vector that represents the information about the appointment in the database. The solution is not perfect but would add memory to the model.

Embeddings as memory. Image from https://medium.com/@jeremyarancio/create-your-document-chatbot-with-gpt-3-and-langchain-8eeb66b98656
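A minimal sketch of such a memory could look like this. The toy character-count embedding only exists to make the example runnable; a real system would use a proper embedding model and a vector database.

import numpy as np

class VectorMemory:
    """Store (text, embedding) pairs and retrieve the most similar entries."""
    def __init__(self, embed):
        self.embed = embed                # any function mapping text -> 1-D numpy vector
        self.entries = []

    def save(self, text):
        self.entries.append((text, self.embed(text)))

    def recall(self, query, top_k=3):
        q = self.embed(query)
        def cosine(v):
            return float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
        ranked = sorted(self.entries, key=lambda e: cosine(e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

def toy_embed(text):
    v = np.zeros(128)                     # toy embedding: character counts
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    return v

memory = VectorMemory(toy_embed)
memory.save("Doctor appointment on Friday at 10am")
memory.save("The wifi password is hunter2")
print(memory.recall("When do I see my doctor?", top_k=1))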

Where we are right now

We already see the start of these augmentations. The first one was BingGPT, which augments GPT-4 with a search engine. The most recent and impressive one is Microsoft’s Copilot for Microsoft 365, which combines GPT-4 with all the Office tools and their Microsoft Graph system, which also gives it access to all your documents. Other companies will follow, even though the integration is limited since the model is not open source and OpenAI is the only one able to fine-tune it. For most of these techniques, you can use LangChain, a new code library that contains many of the described ways to improve GPT-4.

What we could see by the end of the year

All these methods are not mutually exclusive and can be combined in different ways depending on the task and context. Many companies are already integrating, or are going to integrate, GPT-4 into their products. And the more tools can be controlled by natural language, the easier it will be for other LLMs to use them. Before the end of the year, we will see language models talking to each other. I can see a near future where we have our own custom model that talks to BingGPT, Copilot, or other software and takes on the role of a conductor of other instances of GPT-4. But there are also risks. Giving the model too much control could lead to chains of mistakes if the model is not powerful enough, or it could lead to a complete takeover and fast takeoff if future models like GPT-5 or 6 are too powerful. This is unlikely as long as OpenAI holds tight control over the development and execution of these models, but the competition is growing and broadly available hardware and software are becoming better and better. This year will be the rise of AI, and next year could be the birth year of proto-AGI.

Update: shortly after I finished this post, this paper was released. It talks about a form of memorizing transformer, which I found to be quite relevant to this post.


Large Language Models: An Overview

Large Language Models (LLMs) are machine learning-based tools that are able to predict the next word in a given sequence of words. In this post, I want to clarify what they can and cannot do, how they work, what their limitations will be in the future, and how they came to be.

History

With the recent surge in public awareness surrounding Large Language Models (LLMs), a discourse has arisen concerning the potential benefits and risks associated with this technology. Yet, for those well-versed in the field of machine learning, this development represents the next step in a long-standing evolutionary process. The first language models were developed over 50 years ago and used statistical approaches that were barely able to form correct sentences.

With the rise of deep learning architectures like recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks, they became more powerful, but they also started to grow in size and in the amount of data they needed.

The emergence of GPUs, and later on specialized processing chips called TPUs, facilitated the construction of larger models, with companies such as IBM and Google spearheading the creation of translation and other language-related applications. 

The biggest breakthrough came in 2017 when the paper “Attention Is All You Need” by Google introduced the Transformer. The Transformer model used self-attention to find connections between words independent of their position in the input and was therefore able to learn more complex dependencies. It was also more efficient to train, which meant it could be trained on larger data sets. OpenAI used the Transformer to build GPT-2, the most powerful language model of its time, which developed some surprising capabilities and led to the idea that scaling these models up would unlock even more impressive capabilities. Consequently, many research teams applied the Transformer to diverse problems, training numerous models of increasing size, such as BERT, XLNet, ERNIE, and Codex, with GPT-3 being the most notable. However, most of these models were proprietary and unavailable to the public. This changed with recent releases like DALL-E for image generation and GitHub Copilot. Around this time it became clear that scaling language models up was becoming less effective and too expensive for most companies. This was confirmed by DeepMind in 2022 in their research paper “Training Compute-Optimal Large Language Models”, which showed that most LLMs are vastly undertrained and too big for their training data set.

OpenAI and others started to use other means to improve their models, such as reinforcement learning. That led to InstructGPT, which was fine-tuned to perform the tasks described in a prompt. They used the same technique to fine-tune their model on dialog data, which led to the famous ChatGPT.

How they work

The core of most modern machine learning architectures are neural networks. As the name suggests, they are inspired by their biological counterpart. 

Simple neural network with 2 input nodes, 5 hidden nodes, and 1 output node

At a high level, a neural network consists of three main components: an input layer, one or more hidden layers, and an output layer. The input layer receives data, which is then processed through the hidden layers. Finally, the output layer produces a prediction or classification based on the input data.

The basic building block of a neural network is the neuron, which takes inputs, applies a mathematical function (the activation function) to them, and produces an output. The output of each neuron n_i is multiplied by a weight w_ij and summed into neuron n_j of the next layer, until the output layer is reached. This process can be implemented as a simple matrix-vector multiplication with the input as the vector I and the weights as the matrix W: W x I = O, where O is the output vector. Applying the activation function f(O) gives the input for the next layer, and this repeats until the final output.
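As a small sketch, here is the forward pass of the 2-5-1 network from the figure above in numpy, using ReLU as the activation function and leaving out bias terms to keep it short.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

W1 = np.random.randn(5, 2)          # weights between the 2 input nodes and the 5 hidden nodes
W2 = np.random.randn(1, 5)          # weights between the 5 hidden nodes and the 1 output node

def forward(inputs):
    hidden = relu(W1 @ inputs)      # W x I = O, then apply the activation function
    return W2 @ hidden              # output layer

print(forward(np.array([0.5, -1.0])))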

During training, the network is presented with a set of labeled examples, known as the training set. The network uses these examples to learn patterns in the data and adjust its internal weights to improve its predictions. The process of adjusting the weights is known as backpropagation.

Backpropagation works by calculating the error between the network’s output and the correct output for each example in the training set. The error is then propagated backwards through the network, adjusting the weights of each neuron in the opposite direction of the error gradient. This process is repeated for many iterations until the network’s predictions are accurate enough.
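Continuing the snippet above, a single training step for the same 2-5-1 network could look like this, assuming a squared-error loss and plain gradient descent: the error is propagated backwards and each weight is adjusted in the opposite direction of its gradient.

def train_step(x, target, lr=0.01):
    global W1, W2
    # Forward pass, keeping intermediate values for the backward pass
    pre_hidden = W1 @ x
    hidden = relu(pre_hidden)
    output = W2 @ hidden

    error = output - target                             # gradient of the squared-error loss
    grad_W2 = np.outer(error, hidden)                   # gradient for the output weights
    hidden_error = (W2.T @ error) * (pre_hidden > 0)    # propagate the error through ReLU
    grad_W1 = np.outer(hidden_error, x)                 # gradient for the hidden weights

    W2 -= lr * grad_W2                                  # step against the error gradient
    W1 -= lr * grad_W1
    return float(0.5 * np.sum(error ** 2))

for _ in range(200):
    loss = train_step(np.array([0.5, -1.0]), np.array([1.0]))
print("loss after training:", loss)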

Since 2017, most LLMs have been based on Transformers, which also contain simple feed-forward networks but at their core have a self-attention mechanism that allows the model to detect dependencies between different words in the input.

Classic Transformer block

The self-attention mechanism in the Transformer model works by using three vectors for each element of the input sequence: a query vector, a key vector, and a value vector. These vectors are used to compute an attention score for every element in the sequence. We get the output for the j-th element by calculating the dot product of its query vector q_j with the key vector k_i of every element, multiplying each result with the corresponding value vector v_i, and summing up all the results.

Based on a graphic by Peter Bloem

Before multiplying the attention scores with the value vectors, you first apply a softmax function to them. This ensures that they add up to one and that the value vectors are weighted proportionally to their relevance to the query element. This weighted sum is then used as input to the next layer of the Transformer model. I skipped or simplified other parts of the algorithm to make it easier to understand. For a more in-depth explanation of Transformers, I recommend this blog or the creator of GPT himself.
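Putting these pieces together, here is a minimal numpy version of single-head self-attention. It includes the usual scaling by the square root of the key dimension, which I did not mention above, and uses random weights purely for illustration.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X has shape (sequence_length, d_model); returns one output vector per element."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # query, key, and value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # dot product of every query with every key
    weights = softmax(scores)                    # each row now sums to one
    return weights @ V                           # weighted sum of the value vectors

d_model, seq_len = 16, 4
X = np.random.randn(seq_len, d_model)
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (4, 16)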

The self-attention mechanism in the Transformer model allows the model to capture long-range dependencies and relationships between distant elements in the input sequence. By selectively attending to different parts of the sequence at each processing step, the model is able to focus on the most relevant information for the task at hand. This makes the Transformer architecture highly effective for natural language processing tasks, where capturing long-range dependencies is crucial for generating coherent and meaningful output. 

What they can and cannot do

As explained earlier, LLMs are text prediction systems. They are not able to “think”, “feel”, or “experience” anything, but they learn complex concepts in order to predict text accurately. For example, the sequence “2 + 2 =” can only be continued correctly if there is an internal representation of basic math inside the Transformer. This is also the reason why LLMs often produce plausible-looking output that sounds sensible but is wrong. Since the model is multiple orders of magnitude smaller than the training data, and even smaller compared to the space of all possible inputs, it is not possible to represent all the needed information. This means that LLMs are great at producing high-quality text about a simple topic, but they are not great at complex problems that require a huge amount of information and reasoning, like mathematical proofs. This can be improved by providing the needed information in the input sequence, which increases the probability of correct outputs. A great example is BingGPT, which uses search queries to get additional information about the input. You can also train LLMs to do this themselves by fine-tuning them on API calls.

What will they be able to do and what are the limits

The Chinchilla scaling law shows that LLMs are able to benefit from even larger amounts of training data. If we can collect the needed amount of high-quality text data and processing power, LLMs will be able to learn even more complex language-related tasks and will become more capable and reliable. They will never be flawless on their own, and they have the core problem that you can never fully understand how an output was produced, as neural networks are black boxes for an observer. They will, however, become more general as they learn to use pictures, audio, and other sensory data as input, at which point they are barely still language models. The Transformer architecture, however, will always be a token prediction tool and will never develop “consciousness” or any kind of internal thought, as it is still just a series of matrix calculations on a fixed input. I suspect that we need at least some internal activity, and the ability to learn during deployment, for AGI. But even without that, these models will become part of most professions, hidden inside other applications like Discord, Slack, or PowerPoint.

Bias and other problems

LLMs are trained on large text corpora which are filled with certain views, opinions, and mistakes. The resulting output is therefore flawed as well. Current solutions include blocking certain words in the input and/or output, fine-tuning with human feedback, or providing detailed instructions and restrictions in every prompt. None of them are flawless: blocking words is not precise enough, and added instructions can be circumvented by simply overwriting them with prompt injections. Fine-tuning with human feedback is the best solution, but it comes with its own problem: the people who rate the outputs bake their own bias into the fine-tuned model. This becomes a huge problem once these models are used in education, communication, and other areas. The views of the group of people who control the training process are then projected onto everybody in the most subtle and efficient way imaginable. As OpenAI stated in their recent post, the obvious solution will be to fine-tune your own model, which will reduce outside influence but also increases the risk of shutting out other views and could create digital echo chambers where people put their radical beliefs into models and get positive feedback.

Another problem is that most people are not aware of how these systems work, and terms like “artificial intelligence” suggest some form of being inside the machine. They start to anthropomorphize them and unconsciously accept the AI as another person. This is because our brains are trained to see language as something only an intelligent being can produce. It starts with adding things like “thanks” to your prompt and then quickly moves to romantic feelings or some other kind of emotional connection. This will become increasingly problematic the better and more fine-tuned the models become. Adding text-to-speech and natural language understanding will only amplify this feeling.

Scaling

I see many people asking for an open-source version of ChatGPT and wishing to have such a system on their computers. Compared to generative models like Stable Diffusion, LLMs are way bigger and more expensive to run, which means they are not viable for consumer hardware. Training takes millions of dollars in computing power, and the models can only run on large servers. However, there are signs that this could change in the future. The Chinchilla scaling law implies that we can move a larger part of the computation into the training process by using smaller models with more data. An early example would be the new LLaMA models by Meta, which are able to run on consumer hardware and are comparable to the original GPT-3. This still requires millions for training, but that can be crowdfunded or distributed. While these language models will never be able to compete with the state-of-the-art models made by large companies, they will become viable in the next 1-2 years and will lead to personalized, fine-tuned models that take on the role of an assistant. Two excellent examples of open-source projects that try to build such models are “Open-Assistant” and “RWKV”.

Taken from the paper “Compute Trends Across Three Eras of Machine Learning”

The current growth in computing will not be sustainable much longer, as it is not only driven by Moore’s law but also by an increase in investments in training, which will soon hit a point where the return does not justify the costs. At that point, we will have to wait for the hardware to catch up again.

What are the main use cases?

When ChatGPT came out, many used it like Google to get answers to their questions. This is actually one of the weak points of LLMs, since they can only know what was inside their training data. They tend to get facts wrong and produce believable misinformation. This can be fixed by including search results, as Bing is doing.

The better use cases are creative writing and other text-based tasks like summarizing, explaining, or translating. The biggest change will therefore happen in jobs like customer support, journalism, and teaching. The education system in particular can benefit greatly from this. In many countries, Germany for example, teachers are in short supply. Classes are getting bigger and lessons less effective. Tools like ChatGPT are already helping many students, and when more specialized programs use LLMs to provide a better experience, they will soon outperform traditional schools. Sadly, many schools try to ban ChatGPT instead of embracing it, which is not only counterproductive but also impossible, since there are no tools that can accurately detect AI-written text. But text-based tasks are not the limit. Recent papers like Toolformer show that LLMs will soon be able to control and use other hardware and software. This will lead to numerous new abilities and will enable them to take over a variety of new tasks. A personal assistant like the one Apple promised us years ago when they released Siri will soon be a reality.

Looking Back On 2022 And Predictions For 2023

2022 was an eventful year with lots of ups and downs. While the global economy is struggling, and problems like climate change and social instability continue to grow, there have also been some significant technological and scientific breakthroughs.

The most prominent developments probably happened in deep learning, with the appearance of generative models that are able to generate human-level music, art, dialog, and code. In this context, I want to talk about two specific papers that shaped the field this year and will most likely shape the next: the paper “Denoising Diffusion Probabilistic Models”, which is the basis for DALL-E 2, Stable Diffusion, and many other generative models, and the Chinchilla paper from DeepMind, which demonstrated the importance of high-quality training data over model size. This will likely shape the design and cost of future models, including the anticipated release of OpenAI’s GPT-4 in 2023, which is expected to outperform humans in many text-based tasks. The improvements are driven not only by Moore’s law and architectural advances but also by the ever-increasing amount of money spent to train and develop these systems. This is expected, as the potential is more and more recognized and the value these systems provide keeps increasing.

Note that this is a logarithmic chart; the growth is nearly double exponential.

But not just GPT-4. AI will continue to disrupt various industries such as search and creative writing and spark public debate about its impact, even more than is happening right now. It will also lead to the production of high-quality media with fewer people and resources thanks to AI’s assistance. In the field of 3D generation, I expect to see similar progress in 2023, bringing us closer to the quality of 2D generation.

Fusion, the process of combining atomic nuclei to release a large amount of energy, has made significant strides in recent years. This is largely due to the incorporation of machine learning and advancements in various fields such as materials science and engineering. Recently, the U.S. Department of Energy announced that they were able to achieve a net energy gain from a fusion reaction, which is a major milestone in the pursuit of unlimited clean energy. While I expect to see continued progress in this field, it is unlikely that we will see a commercial fusion reactor within the next two years. However, the upcoming start of the ITER project, an international collaboration to build a fusion reactor, may rekindle interest and drive further developments in this promising area.

The James Webb Space Telescope (JWST) is an important milestone in the field of astronomy because it is designed to be the most powerful and advanced space telescope ever built. It started to operate this year. It is a collaboration between NASA, the European Space Agency (ESA), and the Canadian Space Agency (CSA). One of the main goals of the JWST is to study the early universe and the formation and evolution of galaxies. It will be able to observe some of the most distant objects in the universe, including the first stars and galaxies that formed after the Big Bang. In addition to studying the early universe, the JWST will also be able to observe exoplanets (planets outside of our solar system) and potentially search for signs of life on these planets. It will have the ability to study the atmospheres of exoplanets and look for biomarkers, such as oxygen and methane, which could indicate the presence of life. The JWST is also expected to make important contributions to our understanding of planetary science, by studying the atmospheres and surfaces of planets in our own solar system and beyond.

The James Webb Space Telescope (JWST)

The hardware industry has faced challenges this year due to manufacturing bottlenecks. Despite the continuation of Moore’s law and the development of new alternatives to silicon, chips have been difficult to obtain. The industry is restructuring to better handle future demand. Specialized hardware, such as AI accelerators and quantum computers, is developing rapidly. According to IBM’s roadmap, we can expect quantum computers with over 1,000 qubits in the upcoming year. GPUs will become more important with the rise of AI. However, these advancements also demand careful planning in terms of production and distribution: a stable and efficient supply chain will be crucial to meet the increasing demand for specialized hardware.

Virtual reality (VR) technology has gone through a difficult period in recent years due to the overhyping of its potential. While some expected VR to revolutionize the way we interact with and experience the world, it has yet to reach the level of ubiquity and practicality promised by Meta. But 2023 is shaping up to be a promising year for the VR hardware market, with multiple new headsets, such as the Quest 3 and maybe even an Apple headset, set to be released. These new products will likely offer improved graphics, more intuitive controls, and a wider range of content and experiences. While it may not fully realize the vision of a “Metaverse”, VR is still likely to be a great entertainment product for many people.

2023 will also be a critical year for AR. For the first time, it will be possible to build affordable hardware in a small form factor. Chips like the Snapdragon AR2 Gen 1 support Wi-Fi 7 at low power consumption and will make it possible to build smart glasses. Depending on the availability and price of these chips and other components, I expect glasses from many different companies with even more capabilities than the Oppo Air Glass 2.

One of the most exciting developments in computer interfaces is the emergence of brain-computer interfaces (BCIs), which allow direct communication between the brain and a computer and make it possible to control devices with thought alone. While companies like Neuralink claim they will begin human trials next year, less invasive BCIs present a much lower barrier to entry and are being actively developed by startups such as Synchron, which has received significant funding. AI will also help the field by decoding brain signals. It is likely that we will see at least one viral video showcasing the capabilities of these devices, similar to the video of a monkey playing Pong through a BCI that went viral last year. The potential applications for BCIs are vast, ranging from medical and therapeutic uses to gaming and everyday tasks. As these technologies continue to evolve, it is exciting to consider the possibilities for the future of human-computer interaction.

Researchers from biotech and other fields were able to develop an mRNA vaccine for COVID-19 in less than a year. The same technology is now being used to develop a universal flu vaccine and a malaria vaccine. The combination of biology and AI has also yielded promising results in the development of treatments for various illnesses. For example, a team led by Chris Jones of the Institute of Cancer Research used AI tools to identify a new drug combination against diffuse intrinsic pontine glioma, a type of incurable childhood brain cancer; the proposed combination extended survival in mice by 14% and has been tested in a small group of children. Additionally, Dr. Luis A. Diaz Jr. of Memorial Sloan Kettering Cancer Center published a paper in the New England Journal of Medicine describing a treatment that resulted in complete remission in all 18 rectal cancer patients who took the drug. Overall, progress in the field is accelerating thanks to advances in AI, such as AlphaFold 2, which help to find and develop treatments for various diseases. If this continues, we will be able to beat cancer in the next few years, which leads to the next field.

I predict that every person under 60 has the potential to live forever, as I mentioned in my post about longevity escape velocity. The field of aging research has made significant progress in recent years and is more confident than ever in its understanding of the aging process and life itself. For example, researchers at the Weizmann Institute of Science in Israel were able to create fully synthetic mouse embryos in a bioreactor using stem cells cultured in a Petri dish, without the use of an egg or sperm. These embryos developed normally, starting to elongate on day three and developing a beating heart by day eight. This marked a major advancement in the study of how stem cells form different organs and how mutations can cause developmental diseases. This is a promising step toward the end goal: Achieving complete control over all biological processes in the body.

While this was a slow year in some aspects, major progress was made in most fields, and 2023 will be even faster. We are at the knee of an exponential blowup and we are not ready for what is coming. While I am still worried about how society will react and adapt, I am excited for 2023 and the rest of the decade.

The Future of Personal AI: Opportunities and Challenges

Personal AI, or artificial intelligence designed to assist individuals in their daily lives, is becoming increasingly common and advanced. From virtual assistants like Siri and Alexa, to smart home devices like thermostats and security cameras, AI is changing the way we interact with the world around us.

As technology continues to evolve, it is important to consider the opportunities and challenges that personal AI presents, and how it will shape our future. One of the biggest opportunities of personal AI is the ability to automate and streamline tasks, freeing up time and mental energy for more important or enjoyable activities.  For example, a personal AI assistant can help manage your schedule, remind you of important appointments, and even make recommendations for things like restaurants or events based on your preferences and interests.  This can make it easier to stay organized and efficient and can allow you to focus on the things that matter most to you. Another opportunity of personal AI is the ability to customize and personalize your experience.  With advanced machine learning algorithms, personal AI can learn your habits and preferences over time and can tailor its recommendations and responses accordingly.  This can make your interactions with personal AI more natural and intuitive and can help you get the most out of the technology.

However, personal AI also presents some challenges that need to be considered. One of the biggest challenges is the potential for data privacy concerns. As personal AI collects more and more data about you and your habits, there is a risk that this data could be misused or accessed by unauthorized parties.

This could result in a violation of your privacy and could even put your personal information at risk. As personal AI becomes more prevalent, it will be important to address these concerns and develop robust privacy protections to ensure that individuals’ data is safe and secure. Another challenge of personal AI is the potential for bias and discrimination.  AI algorithms are only as good as the data they are trained on, and if the data is biased, the AI will be biased as well. This could result in unfair or unequal treatment of certain individuals or groups and could even perpetuate existing biases and stereotypes.

To address this challenge, it will be important to carefully curate and balance the data used to train personal AI algorithms, and to regularly evaluate and test the algorithms for potential bias. Overall, the future of personal AI holds great potential for improving our daily lives and making our interactions with technology more natural and intuitive. However, it is important to carefully consider the opportunities and challenges of personal AI and to address any potential risks or concerns to ensure that the technology is used responsibly and ethically.

Up until now, the entire article was written by ChatGPT without any nitpicking or corrections.

ChatGPT is an aligned and fine-tuned version of GPT-3.5 from OpenAI and has been free to use on their website for the last two weeks. It is so popular that it reached over a million users within the first few days, and since then OpenAI can barely keep the servers running. This is not surprising: it is free, easy to use, and there are countless use cases. It is a writer, programmer, teacher, and translator. It knows more than any human ever could. It can even play text-based RPGs with you or do your homework. It is also remarkable how useful it is even though, unlike Siri, it has no access to the internet and cannot perform actions.

For many, ChatGPT seems like a sudden advancement, but the underlying research has been going on for a long time. The development of transformer-based models such as ChatGPT started with the paper “Attention Is All You Need”, published in 2017 by researchers at Google. It introduced the transformer architecture, which relies on self-attention mechanisms to process sequential data.

An example architecture for a transformer model. If you want to learn more I recommend https://peterbloem.nl/blog/transformers

This allows transformer models to efficiently handle long-range dependencies and to process input sequences of varying length, making them well suited for tasks such as language modeling and machine translation. The success of the transformer architecture in these and other natural language processing tasks has led to its widespread adoption and has helped drive the development of increasingly powerful language models such as ChatGPT. Other transformer-based models, like Whisper for transcription or GPT-3, the predecessor of ChatGPT, were also impressive, but they did not reach the general public in the same way and were mostly discussed and used within the industry.
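To give a feel for what “self-attention” means in practice, here is a minimal NumPy sketch of single-head scaled dot-product attention. It is deliberately stripped down (no masking, no multi-head structure, no positional encodings), so it is an illustration of the core idea rather than a faithful reimplementation of any specific model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:   (seq_len, d_model) input token embeddings
    w_*: (d_model, d_k) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project into queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # pairwise similarity between tokens
    weights = softmax(scores, axis=-1)         # each token attends to every token
    return weights @ v                         # weighted mix of the value vectors

# Tiny usage example with random data.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)
```

Each output row is a weighted mix of all value vectors, which is how every token can directly “look at” every other token regardless of how far apart they are in the sequence.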

I predicted this sudden rise in public interest in my singularity post in July 2022. As AI continues to advance, it is likely to have a significant impact on the public. One potential impact is the automation of many tasks that are currently performed by humans, leading to job displacement in some industries. This could have serious economic consequences and may require new approaches to education and job training to help people stay employable in a rapidly changing job market.

AI also has the potential to improve our quality of life in various ways. For example, AI-powered personal assistants and smart home technology could make our daily lives more efficient and convenient, and AI-powered medical technologies could make healthcare more accurate and accessible. However, the development and deployment of AI also raise important ethical concerns. As AI becomes more powerful, it will be important to consider carefully how it is used and to ensure that it is deployed responsibly and ethically. For example, AI could be used to discriminate against certain groups of people or to perpetuate biases that already exist in society, often because the training data is already biased. It is important for researchers, policymakers, and the public to consider these risks and take steps to mitigate them. Overall, the impact of AI on the public is likely to be significant and will require careful consideration and planning to ensure that its benefits are maximized and its drawbacks are minimized.

I expect a chaotic transition phase where many people will suffer because necessary discussions about universal income and AI did not take place early enough. People who use these tools to maximize their productivity will outperform already disadvantaged people with worse access to these tools and the political system is not prepared to solve these problems. In this world that will be more divided than ever, AI is both the savior and destroyer of our society.

AI Art Generation: A Prime Example of Exponential Growth

I have wanted to make this post for a while, as I am deeply invested in the development of AI image models, but things kept happening so fast.

It all started in January 2021 when OpenAI presented DALL-E, an AI model that was able to generate images based on a text prompt. It did not get a lot of attention from the general public at the time because the pictures weren’t that impressive. One year later, in April 2022, they followed up with DALL-E 2, a big step forward in resolution, quality, and coherence. But since nobody outside OpenAI was able to use it, the public did not talk about it much. Just one month later, Google presented its own model, Imagen, which was another step forward and was even able to generate consistent text in images. It was stunning for people interested in the field, but it was just research. Three months later, DALL-E 2 opened its beta. A lot of news sites started to write articles about it, since they were now able to experience it for themselves. But before it could become a bigger thing, Stability AI released the open-source model Stable Diffusion to the general public. Instead of a few thousand people in the DALL-E beta, everybody was able to generate images now. That was just over a month ago. Since then, many people have taken Stable Diffusion and built GUIs for it, trained their own models for specific use cases, and contributed in every way possible. AI was even used to win an art contest.

The image that won the contest

People all around the globe were stunned by the technology. While many debated the pros and cons and enjoyed making art, many started to wonder what would come next. After all, Stable Diffusion and DALL-E 2 had some weak points: the resolution was still limited, and faces, hands, and text were still a problem. Stability AI released Stable Diffusion 1.5 in the same month as an improvement for faces and hands. Many people thought that we might solve image generation later next year and that audio generation would be next. Maybe we would be able to generate videos in some form within the next decade.

One week. It took one week until Meta released Make-A-Video on the 29th of September. The videos were just a few seconds long, low resolution, and low quality. But everybody who followed the development of image generation could see that it would follow the same path and become better over the next few months. Two hours. Two hours later Phenaki was presented, which was able to generate minute-long videos based on longer descriptions of entire scenes. Just yesterday, Google presented Imagen Video, which can generate higher-resolution videos. Stability AI also announced that they will release an open-source text-to-video model, which will most likely have the same impact as Stable Diffusion did. The next model has probably already been released by the time you read this. It is hard to keep up these days.

I want to address some concerns regarding AI image generation, since I have seen a lot of fear and hate directed at the people who develop this technology, the people who use it, and the technology itself. It is not true that the models just throw together what artists did in the past. While it is true that art was used to train these models, that does not mean they simply copy. They work by looking at many images of the same subject to abstract what the subject is about and to remember the core idea. This is why the model is only about 4 GB in size. Many people argue that it copies watermarks and signatures. This does not happen because the AI copies images, but because it thinks the watermark is part of the requested subject. If every dog you ever saw in your life had a red collar, you would draw a dog with a red collar, not because you are copying another dog picture, but because you think the collar is part of the dog. The model is far too small to store the training images themselves. I have seen too many people spread this false information to discredit AI art.
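A rough back-of-the-envelope calculation makes the size argument concrete. The training-set size below is my assumption (roughly the scale of the public LAION datasets); the exact number does not change the conclusion much.

```python
# Back-of-the-envelope: how much could a ~4 GB model possibly "store" per image?
# The training-set size is an assumption (roughly LAION scale); the exact number
# does not change the conclusion by much.
model_size_bytes = 4 * 1024**3        # ~4 GB of weights
training_images = 2_000_000_000       # assumed ~2 billion training images

bytes_per_image = model_size_bytes / training_images
print(f"~{bytes_per_image:.1f} bytes of model capacity per training image")
# ~2.1 bytes per image -- far too little to memorize pictures; the model can
# only keep statistical abstractions of what it saw.
```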

The next argument I see a lot is that AI art is soulless, requires no effort, and is therefore worthless. I am not an artist myself, but I consider myself an art enjoyer. It does not matter to me how much time it took to make something as long as I enjoy it. Saying something is better or worse because of the way it was made sounds strange to me. Many people simply use these models to generate pictures, but there is also a group of already talented digital artists who use them to speed up their creative process. They use them in many creative ways, using inpainting and combining them with other digital tools to produce even greater art. Calling all of these artists fakes and dismissing their art as not “real” is something that upsets me.

The last argument is copyright. I will ignore the copyright implications for the output, since my last point made that quite clear. The more difficult discussion is about the training input. While I think that companies should be allowed to use all available data to train their models, I can see that some people think differently. Right now it is allowed, but I expect that some countries will adopt laws to address this technology. For anybody interested in AI art, I recommend lexica.art if you want to see some examples, and if you want to generate your own, https://beta.dreamstudio.ai/dream is a good starting point. I used it myself to generate the last few images for this blog.

Text-to-image and text-to-video are fields that have developed incredibly fast in the last few months. We will see such developments in more and more areas as we approach the singularity. There are some fields that I ignored in this post that are making similar leaps in the same direction, for example audio generation and 2D-to-3D. Machine learning research as a whole is growing exponentially.

Number of ML-related papers per month

The next big thing will be language models. I missed the chance to talk about Google’s “sentient” AI when it was big in the news, but I am sure with the release of GPT-4 in the next few months, the topic will become even more present in public discussions.

Singularity: My Predictions

I was going to write about the Metaverse next, but the recent acceleration of technological progress convinced me to write about the singularity immediately, before it is too late. The technological singularity is the event or process in which machine intelligence surpasses human intelligence and the speed of progress becomes so fast that no human can keep up. This might be a slow process, and some argue we are already in it, or it might be a sudden event, where people live their normal lives and from one day to the next the earth gets transformed into a giant CPU by a swarm of self-replicating nanomachines. I cannot predict what it will be like, and nobody can predict what will happen after, but I will try to predict the events on the way. My predictions are obviously subjective and will most likely not be precise; they should act as a wake-up call, though, to show how fast it might happen. All my predictions neglect the considerable probability that humanity will destroy itself or be destroyed by climate change, solar storms, viruses, war, or something else. Most people without a deeper understanding of Moore’s law look back on the last 10 or 100 years and think we will just continue to develop at the same pace. Some people who work in fields like machine learning or biology look at their current progress and base their predictions on that. Very few people can grasp exponential growth, but I have tried to keep it in mind when making my predictions, based on everything I know and believe and every source I can find.

Human progress curve

Hardware

Fusion reactor (2023-2026): Fusion is one of the core technologies that we need to fight climate change and solve the energy crisis. With fusion reactors like ITER and advances in artificial intelligence, we are on a good path toward solving fusion. Breakthroughs like this one are the reason why I am so confident that we will see a net energy gain from a fusion reactor in the next few years. I hope commercial use will be possible shortly after. Fusion is a perfect example of a technology where people thought it would take much longer because they only looked at the engineering side and ignored progress in areas like math and computing.

Quantum computing (now-2025): Quantum computers are already available and will be an essential part of the supercomputing landscape in the coming years. They will not be used in every household; instead, we will use them through the cloud for big problems like machine learning or traffic control. The double exponential growth in quantum computing will blow up their abilities in the next three years. I think quantum computers are one of the most overlooked technologies because they are so useless right now. But it is one of the fastest-developing technologies at the moment, and when they are ready they will unlock a lot of things at the same time.

Room-temperature superconductors (2025-never): If and only if a room-temperature superconductor exists, we will find it in the next three years. Material science will have the support of quantum computing and AI to search through every possible material. This would be the single most important discovery of all time, since it would not only solve all energy problems but also allow for cheap transport like the hyperloop and many other applications. Examples like multilayered graphene show that there is still room for discovery, but we have to wait and see if this dream is achievable.

AR glasses and contact lenses (2023-2025): In the next few years people will spend most of their time looking at or through a display. Both smart glasses and smart lenses are right around the corner and will change the way we interact with the internet forever. This is the technology with the most impact on our everyday lives. The biggest obstacle for AR technology will be the bandwidth of our wireless infrastructure. Since the computation for these devices will happen in the cloud or in our “smartphones”, we need to send a lot of high-resolution video streams to a lot of people. Current Wi-Fi and cellular technology will not be enough, and we will have to wait for Wi-Fi 7 and 6G to achieve mass adoption.
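To see why bandwidth is the bottleneck, here is a rough, hedged estimate. Every parameter is an assumption chosen for illustration, not a measurement of any real headset.

```python
# Very rough estimate of the wireless bandwidth one cloud-rendered AR user could need.
# All parameters are illustrative assumptions, not measurements.
width, height = 3840, 2160     # assumed per-eye resolution
fps = 90                       # assumed frame rate
bits_per_pixel = 24            # uncompressed RGB
compression_ratio = 100        # assumed aggressive video compression

raw_bps = width * height * bits_per_pixel * fps * 2      # two eyes
compressed_bps = raw_bps / compression_ratio
print(f"raw:        {raw_bps / 1e9:.1f} Gbit/s")
print(f"compressed: {compressed_bps / 1e6:.0f} Mbit/s per user")
```

Hundreds of megabits per second per user, multiplied by many users per access point, is exactly the regime that Wi-Fi 7 and 6G are designed for.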

VR (now-2025): Virtual reality is already part of modern gaming and will become part of the workspace in the coming years. The hardware will be there in the next two years and will be affordable and good enough for all use cases by the end of 2025. I will talk about VR more when I write about the Metaverse.

Brain-computer interfaces (now-2030): BCIs are already being tested for medical applications. With companies like Neuralink, we will most likely see BCIs used for non-medical applications within the next five years. I do not believe they will be popular unless they are needed for a medical condition, since the risk of putting a chip in your head is too high for most people. The only way I can imagine BCIs becoming mainstream in the next ten years is through advances in nanorobotics. With small nanorobots in our bloodstream, we could not only monitor our bodies but also use them as reading devices inside our brains. The risks would not be as high and the barrier to entry would be lower. I wrote more about that topic in my post about human-machine merging.

Robotics (now-2026): I think most physical tasks are already manageable by machines, but most of the time humans are still cheaper. With progress in robotics, machines will replace more and more physical jobs, in developed and developing countries alike. The global economy and our society will have to change drastically. One of the biggest challenges will be to ensure that everyone profits from a world with an abundance of labor, so that we do not end up with an unemployed underclass.

Space travel (2025-2030): I am not a fan of space travel, at least not now. It wastes money, time, and brainpower to get us to the Moon or Mars just so we can say we were there. The truth is that Mars and the Moon are extremely inhospitable, and survival there is impossible for extended periods because of radiation, gravity, temperature, lack of resources, and so on. While humanity will most likely spread out someday, if we survive that long, the idea should be to terraform Mars over a century with technology that will not be available for the next 15 years, and to let machines do it for us. Sending humans to Mars now is too early and just a waste. Sending machines, on the other hand, can be quite useful. Space is full of resources and energy that we can harvest. We have also reached a point where looking out for potential threats to humanity is worthwhile, since we are now able to prevent some of them.

Software

The main reason why I couldn’t wait any longer with this post is the progress in AI. While breakthroughs in machine learning used to be a yearly event (GPT-1 to GPT-3, for example), they started to appear monthly beginning with AlphaFold, and nowadays they appear weekly, with models like DALL-E 2, Gato, and Imagen, among other impressive results. Even compared to other exponential metrics like humanity’s energy consumption, the growth in machine intelligence is sudden. While the first computer is not even 100 years old, we have already reached the point where the top supercomputers rival the human brain, driven by the positive feedback loop of hardware and software improvements. If the exponential growth continues like this, machines will surpass the combined intelligence of humanity around 2045. Newer studies suggest that quantum computers improve at double exponential speed, which would mean we reach this point even faster.

AI explosion
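For transparency, here is one way an estimate like the 2045 figure above can be reproduced as a back-of-the-envelope calculation. Every constant is an assumption on my part: estimates of the brain’s “compute” span several orders of magnitude, and there is no guarantee that past doubling times continue.

```python
# Hedged back-of-the-envelope version of the "surpass all of humanity" estimate.
# Every constant is an assumption; brain "compute" estimates vary by orders of
# magnitude, and past doubling times need not continue.
import math

brain_ops = 1e16                # assumed operations per second of one human brain
humanity_ops = brain_ops * 8e9  # ~8 billion people
top_system_2022 = 1e18          # roughly exascale (Frontier-class) in 2022
doubling_time_years = 1.0       # assumed effective doubling time of leading systems

years_needed = math.log2(humanity_ops / top_system_2022) * doubling_time_years
print(f"~{years_needed:.0f} years after 2022 -> around {2022 + years_needed:.0f}")
# With these numbers: ~26 years, i.e. the late 2040s. Changing the brain estimate
# by a factor of 10 shifts the date by only a few years, which is why dates in
# the 2040s keep coming up in such estimates.
```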

Let’s take a look at some of the recent achievements. When DALL-E came out in January 2021, people started to dream of an AI that could produce videos from prompts the way DALL-E did with pictures, and they thought it could happen within the next five years. Just one year later we have CogVideo, which produces short videos. People think we will continue at the pace of the last few years, but that is not how exponential growth works. Models like Gato, which can perform 600 different tasks, are already impressive, but Gato is more of a proof of concept and is relatively small. DeepMind announced that they are training a bigger version, while other companies are already working on the next step. It won’t be long until such results appear daily, and if the hardware can keep up, we will likely see the singularity within the next 5-10 years. It is impossible to say what will happen after that. It depends on factors such as: will the models develop consciousness or not? Will they help humanity or kill us? I think we are already at a point where machines outperform a single human in every single task, depending on the metric. In the coming year or two, this will become increasingly obvious to the public when models like GPT-4 or Gato 2 are released. Maybe we find the missing idea for consciousness, or maybe it will just appear when models become bigger and more capable, but in the end it does not matter. They will outperform us and help to speed up progress in every single area to a point where no human can ever follow. This brings me to the final and most important prediction: when will we achieve AGI (artificial general intelligence) and ASI (artificial superintelligence)? I predict that we will have some form of AGI around 2025. ASI will greatly depend on the limits humans apply to a potential AGI. If we keep it disconnected from the internet and limit its input and output, we can delay an ASI for a few more years, but if we give an AGI access to the internet, its own code, and enough hardware, it could be a matter of minutes.

Conclusion

Our governments were left behind when the internet emerged, and they never caught up. In the last five years we left behind most of the general population, and in the coming five years not even the experts are going to keep up. We are going to experience the most eventful decade in human history, and there is little we can do about it. I find the reactions of people who learn about the singularity quite interesting. Some lose all hope and motivation and become scared of the future, while others cheer up and look forward to the moment the machines take over. Many ask how they should prepare, and it is hard to answer since nobody knows what will happen. I think it is clear that money will be irrelevant after the singularity, but I would never recommend that anyone waste all their money in the next five years. Quite the opposite: having money could be highly important in the years before the singularity, for things like human-machine merging. Other than that, there is not much an individual can do besides hoping for a good outcome.

Longevity escape velocity. Die slow, die slower, don’t die at all.

Dying is overrated, so let’s talk about longevity escape velocity. LEV is the moment when life expectancy grows by more than one year per year. From then on, your anticipated remaining lifespan stays the same (or grows) at every point in time, and you have theoretically achieved immortality; the short sketch below makes this concrete. I will give a brief introduction to why I believe we will achieve this, when we will achieve it, and what the consequences are.
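```python
# Minimal illustration of longevity escape velocity (all numbers are made up).
# If life expectancy grows by >= 1 year per calendar year, the remaining
# expected lifespan of a living person never shrinks.
age, year = 40, 2040
life_expectancy = 85.0
gain_per_year = 1.2          # assumed yearly gain in life expectancy

for _ in range(10):
    remaining = life_expectancy - age
    print(f"{year}: age {age}, expected remaining years: {remaining:.1f}")
    age += 1
    year += 1
    life_expectancy += gain_per_year
```

As long as the yearly gain stays at or above one year, the “expected remaining years” column never shrinks, which is all that LEV claims.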
As with nearly everything related to the technological singularity, we must look at an exponential graph first.

Life expectancy per year.

As you can see, human life expectancy has exploded thanks to modern medicine and a safer environment. This graph is generous, but you get the idea; even if you take other data, you see the same exponential growth everywhere. Most of it is due to medical breakthroughs like antibiotics, vaccines, and other core drugs. These drugs help against many diseases and viruses, but they do not stop aging itself. So why am I so confident that we will get there? I don’t want to take the success away from the biologists, but computer science plays a bigger part than ever before. Most symptoms of aging, like cancer, dying cells, and neurological diseases, are related to the smallest building blocks of the human body: proteins. Proteins are complex molecular structures with complicated geometry, and they have many different tasks in a cell. Finding out what they do and how they work was nearly impossible for the longest time, until DeepMind presented AlphaFold 2, an AI system that predicts a protein’s 3D structure from its amino acid sequence. AlphaFold 2 came out a year ago, and we are slowly getting the first results from it. It also helps to unravel the mystery of aging itself: DeepMind is planning to use AlphaFold 2 to build an entire simulated cell, which could not only reveal the last secrets of aging but also help to find a way to stop it. And DeepMind is not the only company; there are many more.

This leads to my next point, and another reason why I think we will achieve immortality: money. We have the technology, we are getting the knowledge, and we also have the funding. The amount of money invested in this kind of research is stunning and unsurprising at the same time. Every rich person who does not know what to do with their money uses it to try to live longer. We have Jeff Bezos, a Google co-founder, and Peter Thiel behind Altos Labs with approximately 6 billion dollars, the Saudis with a billion, and Mark Zuckerberg with 3 billion in his foundation. These big names are just the tip of the iceberg; I would guess the overall amount is around 60-70 billion dollars, just to cure aging. The money spent in related areas like cancer or Alzheimer’s treatment is several times higher. And it works: there are already impressive results and trials that treat or stop symptoms of aging or even reverse the process. A famous example from last year was the result of David Sinclair at Harvard University, who reversed the age of mice by turning cells back toward their original state as stem cells.

These mice are the same age.

These kinds of experiments are exciting but need rigorous testing and fine-tuning before they can be applied to humans. The eventual way to stop aging will most likely not be a single pill, but several drugs and procedures combined. Another rapidly advancing technology is organ printing. Instead of using the heart or liver of another person, it will be possible to print your own using your own stem cells. This will enable organ transplantation not only as a last resort but as a legitimate way to renew parts of your body. And then there is one last strategy, used by the desperate to extend their lives when they are already dying: cryonics, freezing a body shortly before or after death to preserve the brain and/or body until all the needed technology is available. I think this is mostly a scam, since there is no known way to freeze a body in a way that preserves it for more than a few weeks. I would probably still do it if I had that much money left when I die: you have nothing to lose, and it is probably the best thing you could try at that point. But the chances are low that someone will think about you in 100 years and spend all the time and effort to resurrect you, assuming the company responsible for your freezing did not fuck up. So I would prefer to stay alive until there are better ways to extend my life.

The core question is who will live long enough to live forever. I think that every person under 60 with a healthy body could be within reach of this goal. There are obviously a lot of conditions. First, we assume that this person has access to all the required technology and knowledge and has enough power and money to use them without being stopped by a government or any other entity that wants to prevent immortality. Unsurprisingly, the people who are spending a lot of money on this come to mind. I will not discuss the ethical questions of immortality in this post, but I want to briefly address the problem of overpopulation and the imbalance in the world if some people can live forever and some cannot. If we look at immortality in isolation it looks really scary, but in the context of the technological singularity it is not that much of a problem. The number of births is dropping dramatically in every country with sufficient education, and more convenient and accessible ways to prevent unwanted births will reduce the number of newborns even more. According to the World Population Prospects report, the global population is growing at its slowest rate since 1950, and we will peak at around 10 billion people in a few decades. And just because people can live forever does not mean they will. Some simply do not want to, some will die in accidents, and some will get killed. Most people will probably not even have access to this technology, because they lack the money or whatever form of power is used in the future. I would guess that the average human lifetime will be around 200 years even if immortality is already possible. The limited access to immortality sounds unfair, and it is. I wish I could say that we will solve all conflicts and find a way to give every human access to whatever they want, but I am not that optimistic. I believe that AGI and other breakthroughs will solve many problems and the overall quality of life will improve, but at our core we are still apes throwing shit at each other and fighting over every piece of food even when there is an unlimited supply. I hope I am wrong, but as with all things that grow at exponential speed, our governments are not well prepared. When the first life-extension drugs hit the market in a few years, I bet the price will be high and the rules will be unclear and different in every part of the world. If some people live significantly longer than others, wealth and power will shift toward longer-living people, since they are able to invest time and money into larger projects that pay off after decades. The only hope is to replace humans in leading positions with AI systems to prevent individuals from gathering power over decades.

The last and ultimate form of immortality is digital immortality. I add this at the end since it is not part of the LEV discussion, but I want to mention it. If you read my post about human-machine merging, you will know that parts of our brain will likely be digital in a few years. At that point, it could become possible to completely transfer your consciousness into the machine and live without a biological body. This idea is inspired by the Ship of Theseus paradox. The alternative would be to make a digital copy of your brain, which brings a lot of practical and ethical problems. Slowly merging with the machine circumvents these problems and gives us a way to become immortal in the best possible way. As digital entities, we would have the possibility to live as many lives as we want, in whatever form we want, and in whatever world we want. Most people would likely lose all interest in the real world, which we could only experience through an artificial body, while our digital world would allow us to experience our surroundings in whatever way we want. It is close to what people imagine when they talk about heaven, and I would not be surprised if biological people tried to talk to their digital ancestors the way they pray today, except this time they would get an answer. It is hard to predict exactly what this will look like, since it is probably already behind the event horizon of the technological singularity, and at this point everything I say is just a guess, but I like to dream about a future where there is no limit to what we can be and what we can experience.

Human-Machine Merging

I want to take a look at the relationship between the human body and technology. I am a huge fan of the symbiotic relationship between us and machines, but whenever I talk to my mother or someone who is not into technology, they are alienated by the idea of letting electronics into their body. This is understandable, but I want to believe that there is something great we can achieve if we leave our fear behind and solve the problems and risks that come with the fusion of humans and machines. There are many more steps to take before we become cyborgs as we know them from movies or games, but we are taking them, and we are already beyond the point of no return. Maybe you have heard people calling each other out for behaving like zombies when using their smartphones, and the smartphone is the most prominent indicator of this kind of change, but there is more. I will go from the obvious things to some that you maybe never thought about.


Let’s start with “wearables”: small devices like watches, trackers, headphones, and glasses. I think we can agree that they are all part of our path to merging with machines, even if some people argue that they just use them as tools and could live without them. The truth is that our brain is so adaptive that once we use a tool often enough and it is always available, the brain simply accepts it as part of our body. This may sound strange, but experiments and studies show that it happens quickly. One experiment I want to bring up as an example was done by the University of Pittsburgh, which used a brain-computer interface to connect a monkey to a robotic arm.

A monkey with a robotic arm


It didn’t take long for the monkey to use the robotic arm like its own. It performed tasks like eating intuitively with the extra arm and became visibly confused after the brain-computer interface was removed again. Of course, a BCI is a much more extreme example than a smartwatch, but don’t make the mistake of thinking there is a huge difference between a brain-computer interface and normal tools just because the connection is more direct. If you use your thumb only to type messages and for nothing else, then the part of your brain that controls your thumb is reprogrammed just like the brain of the monkey. I will talk about brain-computer interfaces later, when we come to the next steps in human-machine merging, but first let us go back to the more subtle things that influence us and merge with us: drugs.

OK, this may sound strange, but drugs are not as far away from machines as you might think, and the road ahead of us is quite clear. Modern medicine and medical tools like CRISPR are becoming more and more complex. From simple molecules that stop pain or help our stomach, we have moved to complex molecular structures that perform actions like cutting DNA and killing viruses. The idea of nanomachines that perform sophisticated actions in our body is not science fiction but already in development. 2022 made one thing clear: people are afraid of medicine they don’t understand. So the question is, will people accept machines in their drugs? I think yes. Not because I have a lot of trust in humanity, but because I have a lot of trust in the greed and envy of people. If you combine the greed of the pharma industry with people’s desire to stay healthy and some good marketing, you get a big market for the next generation of medical devices and drugs.

Now that we have looked at the far future, let me explain what I imagine will happen in the next 10 years. I will differentiate between people who grew up with a smartphone (born 2001 and later) and those who are older (born 1960-2000). I am sorry, but for everyone older than that the following topics are not that interesting (I will probably write another blog post about longevity escape velocity, but that’s another topic).
The next step after the smartphone is AR glasses. I am not talking about full VR (the Metaverse is another topic I will write about at some point), but light AR glasses that are indistinguishable from normal glasses. They will be an important step, since they will merge our virtual life with the real world. Many people in the younger group already spend more time awake on the internet than in the real world. This may sound extreme, but spending a combined 6-8 hours a day on a smartphone and PC is not that rare nowadays. AR glasses will increase these numbers, and every person wearing them will be online whenever they are awake. This means our brain will interact even more often and more directly with the machine. The technology will be available in the next few years. As usual, the younger group will adopt it first and the older generation will take one or two years longer. For people who want to dive completely into the virtual world, contact lenses will be available just one or two years after the glasses. These are not just wild guesses of mine; the technology already exists, even for the contact lenses, and just has to become cheaper to produce. These non-invasive technologies will be adopted much faster than real brain-computer interfaces like Neuralink, where the barrier to entry is much higher and the risks are far greater.
In the end, though, the ultimate step will be such a device. It is hard to imagine how productive a human with a computer-enhanced mind can be. I would guess that the overhead of bringing your thoughts and ideas into a useful form through a keyboard, a mouse, and some arbitrarily designed computer interfaces is around 80%. Human productivity will skyrocket even with just a few million brain-computer interfaces in use. There are a lot of risks, though. While the first brain-computer interfaces will be unidirectional and only take inputs, at some point we will use the other direction too, to increase our intake of information and to keep up with our own thoughts. This step is incredibly dangerous, and it is not hard to see how much could go wrong when a device can manipulate our thoughts. Some might say these devices should be banned, but as Friedrich Dürrenmatt wrote in his play “The Physicists”:

Everything that is thinkable will be thought.

What Solomon found can also be found by someone else.

Friedrich Dürrenmatt

We can’t stop these machines from being built, and we also can’t stop anyone from using them, just as we failed with drugs, nuclear weapons, and many other things.

Body-Machine Merging

While mind-enhancing machines are the most powerful and life-changing devices, enhancing our bodies will also be possible. If you imagine a cyborg, you think of metal limbs and built-in weapons, but we already have prostheses that surpass human legs in some respects. You may remember the debate about whether Pistorius and other disabled athletes had an advantage at the Olympics. But you have probably never heard of someone cutting off their legs to get artificial ones. I don’t believe it will become a trend to exchange limbs if it is not medically necessary, even if artificial limbs become better than human ones in every aspect. Far more popular, and already in use, are small implants with a variety of abilities: from small NFC chips in our hands that replace keys to small devices under the skin that measure blood sugar levels and inject insulin automatically.

NFC chip in hand

Another implant already used by some enthusiasts is a magnet in the tip of a finger. It allows you to feel electromagnetic fields, like wires in the wall or the microwave in the room next door. It sounds like a gimmick at first glance, but if you asked a group of aliens without ears whether they would want to attach something to their heads to feel pressure changes in the air, they would probably think the same. I would argue that the number of senses we have is a fairly random outcome of evolution, and increasing this number is a great way to enhance our view of the world. Every time we use technology to extend our senses, we will have an experience similar to that of a person who sees colors for the first time: we can live without it, but it is just nicer to have. Enhancing our senses is the second-best thing we can do after enhancing our minds with machines. The last and least popular step will be extending the motor functions of our body. We could add extra arms or control a small robot with our minds, but at that point we are back at brain-computer interfaces. I will end the post here to keep it at a readable length. I hope the topic sparked your interest, and I promise I will go into greater detail the next time I write about the merging of flesh and silicon.

© 2024 Maximilian Kannen
