Author: Maximilian Kannen

6 Core Concepts To Understand The Future

02/06/2025 / Maximilian Kannen / 1 Comment

When we dream about the future, our minds often latch onto a single, captivating element. Science fiction narratives and even expert forecasts frequently fall into this trap of isolated focus, leading to predictions that swing wildly between utopian fantasy and dystopian despair. Consider Ray Kurzweil’s relentless optimism regarding technological singularity, often sidelining the messy realities of political upheaval and inherent human biases. Conversely, while insightful about human nature, George Orwell’s visions of totalitarian control couldn’t fully account for the accelerating pace of technological evolution. This inherent bias is why crafting accurate future narratives is so challenging. How many stories truly grappled with the societal implications of widespread social media addiction, a force more relevant than any spaceship? Even within specific domains, the blind spots are evident. While the potential for reversing aging has been a scientific undercurrent for decades, its societal impact remains largely unexplored in mainstream future projections, often overshadowed by the ideas of AI or space colonization. Every vision of tomorrow, therefore, risks becoming a time capsule of present-day obsessions, currently dominated by the narrative of artificial intelligence. With this post I aim to present 6 concepts that are fundamental and need to be considered for every realistic prediction of the future for a time horizon of 1 to 100 years.

1. Climate Collapse: Systemic Instability

The climate crisis is not an isolated environmental issue; it represents a fundamental destabilization of interconnected global systems. Our world is collapsing. Climate change acts as a threat multiplier, exacerbating existing vulnerabilities. Resource scarcity driven by environmental shifts fuels geopolitical friction and mass migration. These pressures, interacting with weakened social cohesion, depress birth rates, leading to demographic imbalances that further strain economic models. This is not speculative forecasting; it’s the observable trajectory of a complex system under duress, demanding a recalibration of our future assumptions.

2. Energy: The Infrastructure of Progress

Energy availability and its accessibility define the boundaries of societal progress. All progress can be measured in watts. Our energy production is growing faster and faster. The switch to renewable energy adds a second parameter: storage. Our ability to store energy is becoming critical as intermittent energy sources like wind and solar become cheaper and cheaper. The price of every product will go down if energy becomes abundant. The availability of cheap power is a necessity for other areas like AI and robotics.

3. AI: Cognitive Automation and Beyond Human Capability

Artificial intelligence is rapidly evolving from a tool for automation to something capable of independent cognitive function, with implications for the nature of work and human relevance. This will lead to a lot of people losing their jobs. At first, this will only affect simple jobs, but over time, more and more complex jobs will be taken over by AI. There is no reason to believe that there are any jobs that cannot be automated in a long enough time frame. Therefore, any analysis of the future not only has to take into account the idea of the total automation of all services and production, but also what happens when AI continues to improve and surpasses us in all areas. How this plays out is not only dependent on the nature of the technology itself, but mostly on the geopolitical setting in which it will happen.

4. Longevity Escape Velocity: The Erosion of Biological Limits

Emerging research in biogerontology suggests the potential for a paradigm shift in human lifespan, where the rate of life extension outpaces the rate of aging. There are more and more approaches to extend human life. It is quite possible that we will reach a point where we can extend our lives faster than we age. This is called longevity escape velocity. It is not clear when we will reach that point, but it will fundamentally change our society. And current progress, even without additional speedup from AI, suggests that this could happen in the next 10-20 years. Our mortality defines us. Without it, we have to find a new definition for life and what it means to live.

5. Social and Population Decline: The Demographic Reversal

Contrary to prevailing narratives of overpopulation, a significant demographic shift is underway in many parts of the world: a sustained decline in birth rates with far-reaching economic and social consequences. The world population is still growing, but the rate of growth is slowing down. In many countries, the population is already shrinking. This is a problem because our economic system is based on growth. If the population shrinks, there are fewer people who can work and consume. This leads to economic problems. In addition, a shrinking population leads to an aging society. Old people need more care, and there are fewer young people who can provide this care. This demographic reversal requires a fundamental rethinking of societal structures and economic policies designed for an era of continuous growth.

6. Economic System: The Rise of Digital Neo-Feudalism

The current and future economic structure determines the global order. The dominant capitalist model is undergoing a subtle but significant transformation, characterized by the increasing concentration of power within large digital monopolies. The network effects and data control inherent in dominant digital platforms are creating a form of neo-feudalism, where access and opportunity are increasingly mediated by these digital lords. This concentration of economic and informational power has significant implications for competition, innovation, and the distribution of wealth, potentially leading to new forms of societal stratification. This becomes a central problem when technologies like AI and LEV are integrated into this structure.

Conclusion

So, there is still a lot missing from this picture. This post just scratches the surface by laying out these six concepts. We have not properly looked into how they all connect and interact with each other. For example, how more energy changes AI, or what happens when longevity meets a shrinking population under climate stress. And I also did not show how to use this to make really solid predictions for the next decade, let alone the next century.

My main goal here was to get these core ideas down. The plan is to explore those connections and what they actually mean for us in future posts. That’s the crucial next step, because figuring out that interplay is where the real challenge, and the most important insights, are.

Ape looking at ASI after reaching the top

Looking Back On 2024 And Predictions for 2025

28/12/2024 / Maximilian Kannen / 1 Comment

As we near the end of 2024, it’s time to revisit the predictions I laid out at the beginning of this year. The year was even more interesting than the last one and there is a lot to talk about. Let’s evaluate how my forecasts stood against the reality of 2024. (If you just want to see my predictions, jump to “My Predictions” below.)

AI was an even bigger topic this year than last year, and I am happy to say that most of my predictions came true. Image/video/music/3D generative models indeed got a lot better and made a mark this year, and I was also right that there would be no popular LLMs with over 1 trillion parameters this year. However, I was wrong about the speed at which LLMs would be integrated into assistants like Alexa or Siri. While there are plans to do so and some functionality is already there, it is not where I imagined it to be at the end of 2024. The main reasons for this delay are cost, scale, and reliability. I am especially proud of this part of my predictions:

State space models like RWKV will also become more relevant for specific use cases and most models will at least support image input if not more modalities. RL Models like Alphafold will push scientific discovery even faster in 2024.

SSMs, multimodality, and RL models all played a big role this year in the areas I expected. We had GPT-4o (“omni”) as a truly multimodal model, Jamba and Mamba as new SSMs, and multiple Alpha models from AlphaGeometry to AlphaProteo over the year. Open-source models also had a great year, with Llama 3, Mistral, and DeepSeek.

Open-source models will stay a few months behind closed-source models, and even further in areas like integration, but offer more customizability. Custom AI hardware like the AI Pin will not become widespread, but smartphones will adapt to AI by including more sensors and I/O options

I think it is fair to say that open source will always be a bit behind the top models like Claude 3.5 or GPT-4o. AI gadgets also had a big year, but in my opinion not a single one of them took off. New smartphones are also more centered around AI than ever before. I also had a few misses like my prediction that talking to your PC would be one of the main interfaces at the end of the year. I think I would be closer here if Microsoft shipped the version of Copilot that they showed a while ago, but that was delayed which means I am wrong here.

I made one more precise prediction about the GAIA benchmark.

No system in 2024 will outperform Humans on the new GAIA benchmark, but they are going to double their performance on it. This will mostly be accomplished by improving reasoning, planning, and tool use with improved fine-tuning and new training strategies.

We are on the mark right now with exactly double the scores from last year using the predicted improvements.

The next topic is AI chips and hardware, where I was mostly right about demand and Nvidia’s lead. I also predicted the rise of custom chips like those from Groq or SambaNova. My prediction that half of the global compute would be used for AI at the end of the year was a bit of an overestimation. Depending on how broad your definition of AI is, I could argue here, but I will just take the loss and move on to the next prediction.

VR Hardware will continue to improve, and we will finally see the first useful everyday AR glasses towards the end of 2024. Quantum computers will become part of some of the cloud providers and will be offered as specialized hardware just like GPUs (Note: This part was written before the AWS Event announcement). They will become more relevant for many industries as the number of Qbits grows. We will also see more variety in chips as they become more specialized to save energy. Brain-computer interfaces will finally be used in humans for actual medical applications.

As you can see in the note, part of the prediction was already true before I released the last blog post and the rest became true after a while. At least if we exclude the prediction of everyday AR glasses. I have a history of overestimating the speed of smart glasses and I am a bit disappointed that we still have to wait for them. We saw the Orion glasses by Meta a few months ago which already are very close to my idea of true AR glasses, but they are sadly not available yet. Brain-computer interfaces were used in actual patients and we saw a lot of new chips specially designed for AI in some form.

I made some general predictions about humanoid robots that were not really specific to 2024 and my opinion on that did not change:

I expect an initial hype around them and adoption in some areas. However, towards the end of the decade they will be replaced with special-purpose robots and humanoid robots will be limited to areas where a human form factor is needed

The overall number of robots did increase, and we sadly saw them most prominently as weapons in Ukraine, where both ground robots and small drones were used.

My energy predictions included a commercial fusion reactor which exists now. I also said that nuclear energy and solar would get better and more popular. Nuclear energy had a comeback story this year thanks to the energy demands in AI. Solar is growing faster every year and is surpassing expectations, especially in China.

I made the claim that room-temperature superconductors would be found this year or next if they exist, and while we have no conclusive proof that we found them yet, I still believe that we will, with the help of all the new material science AIs, find them very soon.

Biology and medicine are poised to make significant leaps, powered by AI systems like Alphafold and similar technologies. Cancer and other deadly diseases will become increasingly treatable and aging will become a target for many in the field. The public opinion that aging is natural and cannot/should not be stopped will not change this year but maybe in 2025. Prostheses will become more practical and will be connected directly to nerves and bones. This will make them in some areas better than human parts, but touch and precision will continue to be way worse. We will also see progress in artificial organs grown in animals or completely made in a lab.

These predictions are a bit mixed. DeepMind won the Nobel Prize for AlphaFold, and we got not only AlphaFold 3 and AlphaProteo, but many more AI models related to biology and medicine. We got cancer vaccines, and prostheses also got better. Growing organs is making progress, but is not as far along as I hoped.

Transportation in 2024 will change slightly. EVs will become more popular and cheaper but will not reach the level of adaptation that they have in China. Self-driving cars will stay in big cities as taxi replacements and will not be generally available until 2025. […]

Other infrastructures like the Internet will continue to stay behind the demand for the next few years. The main driver of the increased need for bandwidth will be high-quality video streaming while the main need for speed will arise from interactive systems like cloud-based AI assistants.

EVs did become more popular but are severely limited by import duties which keep prices high outside of China. Internet infrastructure is needed desperately, not just because of AI inference as I predicted, but mostly because of AI training in the US which needs to connect data centers. I did not anticipate the possibility of training across data centers.

Climate change and unstable governments will lead to an increase in refugees worldwide and social unrest will increase. We will see the first effects of AI-induced Job losses. The political debate will become more heated and some important elections like the US election will be fully determined by large-scale AI-based operations that use Fake news, Deepfakes, and online bots to control the public opinion.

My final prediction is sadly the one that is the closest to reality. Wars and economic problems lead to refugees around the world. Economies that are not profiting from AI are struggling and radical parties are gaining support around the world. The US election is filled with misinformation campaigns, both domestic and from outside. We can see the effect of a wave of AI-generated lies that is too large to fight, and manipulating voters is easier than ever before. (At the time of this writing the result is not out).

All in all the year was as fast as expected. Some areas were a bit slower and some a bit faster. AI did not scale as fast but made progress in other directions. Outside of technology, this was not a good year. Wars continued and escalated, soft fascism is on the rise globally and we broke the 1.5-degree threshold. This year was a test for international order and we failed. So let’s hope for the next year.

My Predictions

As always, we start with AI. Since this will be a bit more comprehensive, I will make a list of the things that I think will be most important for AI in 2025.

Test time compute (reasoning models): After the announcement of o1, it took less than two months for other labs to develop similar reasoning models. The advantages of these systems are clear: models become better at planning and reasoning, costs shift from pretraining to inference, new levers for scaling appear, and compute can be used selectively for complex tasks. DeepSeek was among the first to release an alternative, and many others will follow. Reasoning will not stay limited to generating chains of thought in advance. More complex forms of test time compute will emerge, like Coconut or different forms of search, and they will keep getting more sophisticated and more capable. The true potential of these models will shine in 2025 when they become open source and are combined with custom ASICs like Cerebras for fast and cheap inference. While o1 is expensive and takes up to 2 min to think, open source alternatives could run at up to 2000 tok/s and at lower cost, enabling reasoning in seconds. This is also the reason why the demand for custom inference chips is growing. More companies will develop their own chips for this purpose. OpenAI is working on this but will not have their own chips in 2025. The growing demand for reasoning models will drive the demand for inference computing. I really want to emphasize how crucial it is to get inference faster and cheaper. The demand could easily surpass a trillion tokens per hour.

test time vs train time compute graph from OpenAI

The trend of cheaper and cheaper APIs will slow down in 2025 and most improvements in efficiency will go towards the growing demand for tokens. Companies will rise and fall with the ability to serve fast and cheap reasoning.
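To put the speed numbers above into perspective, here is a minimal Python sketch of the arithmetic. The 2000 tok/s figure and the trillion tokens per hour come from the text; the length of the reasoning trace and the slower speeds are hypothetical values picked only for illustration, as are the function names.

```python
def thinking_time_s(reasoning_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate a chain of thought at a given speed."""
    return reasoning_tokens / tokens_per_second


def parallel_streams(tokens_per_hour: float, tokens_per_second: float) -> float:
    """Number of streams at `tokens_per_second` needed to serve an hourly demand."""
    return tokens_per_hour / 3600 / tokens_per_second


CHAIN_TOKENS = 20_000  # hypothetical length of one reasoning trace

for speed in (50, 200, 2000):  # tokens per second; only 2000 is from the text
    print(f"{speed:>5} tok/s -> {thinking_time_s(CHAIN_TOKENS, speed):6.0f} s")

# A trillion tokens per hour, served by streams running at 2000 tok/s each
print(f"{parallel_streams(1e12, 2000):,.0f} parallel streams")
```

Under these assumptions, the 20,000-token trace takes 10 seconds at 2000 tok/s, and a trillion tokens per hour corresponds to roughly 139,000 such streams running in parallel.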

Pretraining and scaling: As I already said last year, scaling will not happen as fast as all these companies claim. Two years after GPT-4 we will see trillion parameter models again, but not much bigger. Nothing bigger than 5 trillion will appear in 2025 and the demand will be really low because of the pricing and speed. The relevant sizes will be <7B for edge devices, 20-70B for cheap and fast models that can also be self-hosted, and 70B-700B for the top models. Everything above that is not suited for mass usage. Reasoning models will stay between 7B and 300B. Everything below is not really powerful enough to make full use of test-time compute and everything above is too slow and expensive. Data for pretraining will not change significantly but fine-tuning and post-training will change a lot with synthetic data. The number of training tokens could exceed 20 trillion tokens for a single model next year. Models will learn from other models, reasoning models will get distilled into smaller models, and so on. B100 and B200 will dominate the training hardware and lead to faster training speeds of bigger models, enabling labs to experiment more and iterate faster on existing models. Here is a list of models to expect: The next GPT generation from OpenAI, Claude 4 family, Llama 4 family, Gemini 2 (Probably released before I finish this blog post)(Update: It was), and reasoning models from all relevant labs. Models will get more general and form foundation models that combine modalities. We saw this trend in 2024 already with models like Fugatto that are not trained for a single task but form a foundation for all audio tasks. Foundation models for video will exist in 2025 and models that combine multiple modalities will appear more often. We will see some small changes in the architectures, like different tokenization, loss functions, or different integration of modalities, and changes to better work with reasoning and test time compute. There is a small chance that we will see some more radical changes.

Agents: Agents were already a popular term in 2024 but were lacking in every aspect. They need reasoning models, which did not exist for most of 2024, and they need a lot of scaffolding around the models, which takes time. In 2025 we have the necessary capabilities in the models and hardware that is fast enough. A lot of work in 2024 will pay off in 2025 and lead to some impressive systems that will be able to easily perform long-horizon tasks like playing Minecraft, writing a book, doing research, and working as assistants. At the end of 2025, they will appear like something that I would call weak AGI: a system that is not able to do everything a human can but is as general as a human, with capabilities that a human would not have and a big overlap. AI APIs like Anthropic’s Model Context Protocol will become more important as they speed up the interaction between agents and tools without them relying on GUIs.

AI overall will improve dramatically in 2025. Not a single lab will get a clear lead and Google, OpenAI, Anthropic, and others will shine in slightly different areas without a clear winner. OpenAI will probably be the first to “weak AGI” but at the highest cost, and others will follow in early 2026 for less money. Open Source will stay behind by around half a year in most areas unless certain teams like Meta AI or Qwen are not allowed or do not want to continue open sourcing. Some benchmark predictions: ARC-AGI will be beaten (85%) (Update: o3 was announced after I wrote this and already beat it), FrontierMath will get close to 50% by the end of 2025, and most other existing benchmarks will be saturated, including the GAIA benchmark that I talked about last year. AI systems will develop into three product fields. One is for consumers, focusing on tool use, multimodality, and memory. The second is for commercial use, which will focus on using cheap and fast LLMs for automation and different workflows. The last one is AI for research. This field will make the most use of reasoning models and will use the most expensive models and simulations to advance science in 2025. If the answer has scientific value, it is worth spending thousands of dollars on a single question.

RL and science: Outside of generative AI, we have a growing number of specialized AI systems that solve core problems in science. DeepMind is at the forefront of this trend and will continue releasing models like AlphaProteo that revolutionize research. As I mentioned last year, I think if a superconductor exists, we will find it this year. Material science is one of the fields that is being taken over by AI, and the same goes for parts of physics, math, and biology. 2025 can become a breakthrough year in science. End-to-end vaccine creation could become possible in early 2026. Reasoning models will start to help develop breakthroughs in math and computer science like faster algorithms, new proofs, and new hypotheses.

Hardware: There are multiple trends that will develop. AI servers for hobbyists like Tinybox will become quite popular as open-source models get better and the need for privacy increases. Training will mostly happen on Nvidia AI chips like B100 and the next-gen. Moving the training to other platforms is too complex and expensive, which is why Nvidia will stay in the lead here. For inference, the hardware will become more diverse. Custom ASICs like Cerebras will grow, and most big companies like Meta, OpenAI, Amazon, etc. will try to develop their own chips for their models. The biggest demand is, as I already said, fast and cheap inference for reasoning. Google is unique here as they already have TPUs (Trillium) for training and inference, which will give them a huge advantage in 2025. I do not think that there will be a v7 TPU for most of 2025, but some announcements about it will come out towards the end of the year. This next generation will likely have a special version optimized for inference (even more than v6e). There are Microsoft AI PCs, which will stay a gimmick as laptops will have to rely on AI servers anyway for any significant task. Nvidia will try to develop their own inference-optimised chip, or at least increase the memory of their current B200 cards to support faster inference. A software trend that could help here is the transition to lower precision training and inference, which gives massive speedups and reduces memory demand.

Robots: 2025 will be the year of humanoid robots. Some will start mass production and they will appear more and more in different industries. They will make huge leaps in autonomy, going from needing manual programming per task to being able to be controlled by voice and demonstration. The software side will improve rapidly using AI-generated simulations to train in. World models will serve as training grounds for RL. Some companies will start to market towards consumers, but the demand will be very low because of the price and some general dislike for machines. Instead, specialized robots will be very popular. They will change industries like agriculture or public service, for example by collecting fruits from fields, trash bins in the city, or patients from their rooms. A lot of this will start in 2025, in most cases as single tests, but more broadly towards the end of the decade. Robots will quickly be part of every part of life and work.

Self-driving cars are slowly getting better. Waymo will expand its areas and continue to grow. Tesla will not have full self-driving ready in 2025, at least not for the existing cars, but there is a small chance that Elon will use his influence on the government to get permission to sell it as such, leading to a big increase in Tesla-related accidents.

Energy: Solar will continue to surpass expectations and grow the fastest. Fission reactors will return as the need for reliable power for AI increases. Fusion will have its first commercial reactor built but no energy will enter the grid from a fusion reactor in 2025. CO₂ emissions will peak in 2025 again and will start to go down at the end of the decade. The global energy consumption will grow by over 3% in 2025, fueled by AI and electric cars.

Quantum computing: Quantum computers will continue to rapidly grow the number of stable qubits and start being used for material science. They will, however, not reach the point where they start breaking encryption in 2025. More applications for quantum computers will be developed now that they are available. I do not expect quantum computers to become relevant for machine learning as there are no known ways to gain speedups over current hardware given the current models and training methods.

VR/AR: 2025 will be a big year for AR. Multiple AR glasses will come out and mark the beginning of the next big product category. VR glasses will have a slower year with just incremental improvements such as a new Quest, a new VR device from Valve, and the upcoming Samsung headset. The bigger focus will be on the software side, both for VR and AR. Glasses are the perfect form factor for AI assistants. Android XR will use Gemini starting in 2025, Meta will use Llama 4 in the next generation of their glasses, and Apple will stay behind in terms of AI assistants in glasses.

Longevity: 2024 already showed a lot of progress on the research side of longevity, and its popularity also started growing. Movements like “Don’t Die” gained fans and some news outlets started talking about the possibility. This will speed up in 2025. The idea that it is possible to stop aging using modern medical knowledge will become more mainstream. This will lead to more public discussion about the feasibility and consequences. I can see longevity becoming a key component of the class struggle in the coming decade. The difference in life expectancy between rich and poor will grow rapidly over the next 10 years. It is hard to make short-term predictions about this topic since the results of most human experiments will only appear after a few years. I believe that the first person to reach 200 is already alive and probably already older than 50, and I made a very similar prediction 2 years ago.

Geopolitics: I have a very hard time predicting global politics this year. Trump and other extremists are unpredictable and add to an already uncertain development fueled by the exponential development of technology. What I can say is that wars will continue and become more and more dominated by technology. We already saw the first drones and robots fighting in Ukraine and this technology will develop rapidly. By the end of this decade, autonomous drones will be the main form of warfare for a developed country. Global influence will shift, with China growing as a global superpower. Silent warfare in the form of cyberattacks and attacks on infrastructure will increase worldwide. The consequences of climate change will devastate the global south and create anger. A bigger escalation is very likely in the next 4 years. Some more precise predictions for 2025: There will be more attacks on undersea cables or satellites, and Russia will gain more parts of Ukraine with the war slowing down. The situation in the Middle East around Israel will escalate more. The US will enter at least one conflict zone, with a small chance of sending infantry. There will be at least one high-profile political assassination (a president or similar) and the number of democratic countries will go down.

It is getting harder and harder to make predictions as we speed up as a species. The interplay between technological advancement and geopolitical instability creates a precarious balance, where the tools of progress can simultaneously become instruments of disruption. I fear that we have reached a point where humanity cannot keep up with its own development and we run into a state of escalation and die a heat death as a society. We need to strengthen our intellectual foundation and create better education systems to prevent a future where we are just apes with access to nukes. We have to give up control to AI at some point when the complexity of the world becomes too much for our brains, or become the Übermensch described by Nietzsche and develop ourselves to the point that we can keep up.

I did not write a lot this year and I hope to change that next year. I finished my degree this year and hope to start many new projects in 2025. You are welcome to write your predictions in the comments.

Why Open Source Models Are Great

01/05/2024 / Maximilian Kannen / 0 Comments

The open-source AI landscape has witnessed significant growth and development in recent years, with numerous projects and initiatives emerging to democratize access to artificial intelligence. In this blog post, I will go into the current state of open-source AI, exploring the key players, fine-tuning techniques, hardware and API providers, and the compelling arguments in favor of open-source AI.

Model Providers

Training LLMs costs a significant amount of money and requires a lot of experience and hardware. Only a few organizations have the means to do so. The following list is not complete and just covers some of the big ones.

Meta is currently the biggest company that open-sources models. Their model family is called Llama and the current Llama 3 models are available in two sizes: 8B and 70B. A 405B model is expected soon. The weak points of the current versions are their lack of non-English training data and their small context size. Meta is already working on that.

Mistral is a smaller French company that got investments from Microsoft, including computing power. While not all their models are open-source, the ones that are perform well. They open-sourced a 7B model that was a cornerstone of open source models for quite some time, and they open-sourced two Mixture-of-Experts models (8x7B and 8x22B) that are still leading non-English open source models, especially at their price point.

Cohere recently open-sourced a few models including their LLMs Command-R and Command-R+. They perform especially well when used in combination with retrieval-augmented generation.

Stability AI is mostly known for open-sourcing text2image models, but they also open-sourced a few smaller LLMs that are decent for their size.

Google does not open source their Gemini models, but they have a set of open models called Gemma, which include some experimental LLMs that are not based on Transformers.

API Providers and Hardware

The main argument for open-source models is the ability to run them yourself on your personal machine. Current models range from 2B to over 100B parameters, so let’s see what is needed to run them.

For small models under 7B, you don’t need anything special. These models could even run on your phone. Models between 7B and 14B can be run on most PCs but can be very slow unless you have a modern GPU. Bigger models between 14B and 70B require extremely high-end PCs. Apple’s modern high-end devices are especially well suited since they offer the large shared memory that bigger models need. Everything over 70B, including the MoE models from Mistral, is usually not usable on home devices. These models are instead available from a broad selection of API providers who host different open-source models and compete on price, speed, and latency. I selected a few that excel in one or two of these categories.

Groq is a newer hardware company that developed custom chips for LLMs. That allows them to offer incredible speeds and prices. For example, they serve Llama 3 8B for less than 10 cents per million tokens at over 800 tokens per second. If you run the model yourself, you would get around 10-20 tokens per second depending on your hardware.

Together.ai offers nearly all common open-source models and gives you a few million free tokens at the start so you can begin experimenting immediately.

Perplexity is not only a great search engine, but its API is also great. It is not as cheap or fast as Groq, but it has extremely low latency and offers its own models with internet access. They also provide free API credits for Perplexity Pro users.

If you prefer to run them on your own, I recommend a newer Nvidia GPU with as much VRAM as you can afford.
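To get a feeling for why the size classes above map to such different hardware, here is a rough back-of-the-envelope sketch. It only counts the memory needed to hold the weights and assumes a fixed number of bits per parameter; the function name is made up for this example, and real memory use is higher because of activations, the context cache, and framework overhead.

```python
def estimated_weight_memory_gb(params_billions: float, bits_per_param: int = 16) -> float:
    """Rough memory needed just to hold the weights, in gigabytes (10^9 bytes).

    Assumption: every parameter is stored with the same number of bits.
    """
    bytes_per_param = bits_per_param / 8
    # (params_billions * 1e9 parameters) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


for size in (2, 7, 14, 70):
    at_16_bit = estimated_weight_memory_gb(size, 16)
    at_4_bit = estimated_weight_memory_gb(size, 4)
    print(f"{size:>3}B parameters: ~{at_16_bit:.1f} GB at 16 bit, ~{at_4_bit:.1f} GB at 4 bit")
```

Under these assumptions a 70B model at 16 bit needs far more memory than any consumer GPU offers, which is why storing parameters with fewer bits (see the quantization part below) matters so much for local use.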

Customization

One of the great side effects of having control over the model is the ability to change it to your needs. This starts with simple things like system prompts or temperature. Another thing that is often used is quantization. Quantization describes the process of taking the parameters of the model, which are usually saved as floating point numbers with 16 or 32 bits of precision, and rounding them in different ways to shrink them to somewhere between 8 bits and 1 bit. This process reduces the capabilities of the model slightly, depending on how aggressively you quantize, but makes the model easier and faster to run on weaker hardware.
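As a minimal sketch of the idea, the following example rounds a handful of made-up weights to signed integers with one shared scale factor (often called absmax quantization). This is only one simple scheme chosen for illustration; real tools use more elaborate variants, for example separate scales per block of weights.

```python
def quantize_absmax(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Map floats to signed integers of the given bit width using one shared scale.

    Assumes bits >= 2 so that at least the values -1, 0, and 1 are available.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.98, -0.07, 0.31]  # made-up example values
for bits in (8, 4, 2):
    q, scale = quantize_absmax(weights, bits)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: ints={q}, worst rounding error={worst:.4f}")
```

The fewer bits you keep, the larger the rounding error becomes, which is the quality loss described above.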

Fine-tuning

For many use cases, current models are not optimal. They lack knowledge, perform worse in a required language, or simply do not perform well in a certain task. To solve these problems you can fine-tune the models. Fine-tuning means continuing the training of the model on a small custom data set that helps the model learn the required ability. The following part will be a bit more technical and can be skipped:
Three main types of open-source LLMs are available: base models, instruct models, and chat models. Base models are only trained on huge amounts of text and work more like text completion. They do not really work as chatbots and are hard to use. Instruct models are already fine-tuned by the creator on a set of text examples that teach the model to follow the instructions of a given input instead of simply continuing the text. Chat models are further fine-tuned to behave in a chatbot-like way and can hold conversations. They are also often trained to have certain limitations and can refuse to talk about certain things if they are trained to do so.

For fine-tuning, base models give the most freedom. You could even continue the training with new languages or information and do instruct training after that. There are already instruct datasets available that can be used, or you can create your own. If you fine-tune existing instruct models, you usually need less data and compute, and you can still teach the model a lot and change its behavior. This is most often the best choice. Existing chat models can still be fine-tuned, but since they are already trained in a certain way, it is harder to get specific behavior, and teaching them completely new skills is hardly possible. Fine-tuning chat models is best if you just want to change the tone of the model or train it on a specific writing style.

There are different ways to fine-tune. Most often you freeze the earlier layers of the model so the learned knowledge will not be changed too much and only train the later layers. While this is not totally correct, I like to imagine that later layers are more important for the style of the output while earlier layers work more like the core language understanding part of the model. So the more fundamental the thing you want to change, the more layers you need to train. Things like a certain writing style usually only require the very end of the model, while things like improved math capabilities need most of the network.

There is another way to fine-tune models that often pops up: LoRA. LoRA stands for Low-Rank Adaptation. It builds on the observation that the change fine-tuning makes to a layer’s weight matrix has a low rank, so the change can be written as the product of two much smaller matrices that together contain far fewer parameters than the original matrix. The fine-tuning then happens only on these two new matrices while the original weights stay frozen, which makes the process faster and cheaper and allows LoRAs to be shared with little memory overhead. The LoRA matrices can later be swapped in and out like a hat.
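To make the LoRA idea a bit more concrete, here is a small sketch in plain Python. It is a toy illustration under my own assumptions and not how any particular library implements it: the names `lora_a` and `lora_b`, the tiny 8x8 layer, and the rank of 2 are made up for the example. The frozen weight matrix stays untouched, and the adapter is the product of two small matrices that is added on top.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, scale=1.0):
    """Frozen weight plus the low-rank update: W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


d_out, d_in, rank = 8, 8, 2
random.seed(0)
# Frozen pretrained weights (random stand-ins here).
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter: A starts small and random, B starts at zero.
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"full matrix: {full_params} parameters, LoRA adapter: {lora_params} parameters")

# With B initialised to zero the adapter changes nothing until it is trained.
assert lora_effective_weight(w, lora_b, lora_a) == w
```

The saving looks small for this toy layer, but it grows quickly with size: for a 4096x4096 matrix and rank 8, the full matrix has 16,777,216 parameters while the adapter has only 65,536. Swapping adapters "like a hat" then simply means replacing the two small matrices while keeping the frozen weights.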

Output control

If you have control over your model, you can also steer its output directly. The most popular example is something like JSON mode, where at every step, instead of sampling freely from the model’s token probabilities, an external program checks which tokens are valid under the JSON grammar and restricts the choice to those. This can be used to guarantee that the output follows a given structure and can also be used for things like tool use or other additional functions.
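The following toy sketch shows the mechanism. Everything in it is a stand-in chosen for illustration: the tiny vocabulary, the fake model that prefers chatty tokens, and a "grammar" that is just a fixed list of allowed tokens per position. Real implementations track a proper grammar state and usually sample instead of always taking the highest score, but the masking step is the same idea.

```python
import json
import math

VOCAB = ['{', '}', '"answer"', ':', '"yes"', '"no"', 'maybe', 'Sure!']

# Toy grammar for objects like {"answer": "yes"}:
# for each output position, the set of tokens that keeps the JSON valid.
ALLOWED_BY_POSITION = [
    {'{'},
    {'"answer"'},
    {':'},
    {'"yes"', '"no"'},
    {'}'},
]


def fake_model_logits(prefix):
    """Stand-in for a real model: it prefers tokens that would break the JSON."""
    preferences = {'Sure!': 3.0, 'maybe': 2.5, '"no"': 2.0, '"yes"': 1.0}
    return [preferences.get(token, 0.0) for token in VOCAB]


def constrained_decode():
    output = []
    for allowed in ALLOWED_BY_POSITION:
        logits = fake_model_logits(output)
        # Mask every token the grammar forbids at this position.
        masked = [score if token in allowed else -math.inf
                  for token, score in zip(VOCAB, logits)]
        # Greedy choice among the remaining valid tokens.
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        output.append(VOCAB[best])
    return ''.join(output)


text = constrained_decode()
print(text)
print(json.loads(text))  # parses without error because only valid tokens were allowed
```

Even though the fake model would rather start with "Sure!", the mask forces every step onto a token that keeps the output parseable.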

Local tools

There is a range of tools to run models locally, from chat interfaces that mimic the experience of ChatGPT to local API servers that can be used by companies or developers. Here are some examples:

GPT4All is a local chat interface that not only allows you to download models but can also give the models access to your local documents and is very easy to use.

Ollama is a local LLM server which makes it easy to install additional models and supports a wide range of operating systems and hardware.

LM Studio offers a user interface to chat with models and also includes functionality to fine-tune them with LoRA.

Conclusion

So as you can see, there are many reasons why open-source models can be superior, even though the biggest and smartest models currently available are slightly better than the best open-source models. Open-source models are way cheaper, even if you compare price per performance, and they allow for much more custom control. They can be trained to your liking and needs, and they offer privacy and control over your data and use. If you run them locally they often have lower latency, and even if you use API providers you will get better prices and super-fast inference. Open-source models used to be around a year behind some of the top models, but in recent times they started to catch up. They will probably never lead the field in terms of capabilities, but they will always be the cheaper option. GPT-3.5 is the best example of a model that got beaten by open source a long time ago. Models like Llama 3 are not only cheaper, but they are also way faster and offer all the advantages of open models.

The False Promise of Full Dive VR

18/04/2024 / Maximilian Kannen / 0 Comments

While some science fiction technologies are already here or very close (C-3PO, longevity, AGI, etc.), some are very far away or not possible at all. I will try to explain why full dive VR (FDVR) is one of these and why it is so hard. Let’s start with the definition: FDVR, as portrayed in popular science fiction works like “The Matrix” or “3 Body Problem”, is the idea of directly sending signals to and receiving signals from the brain to simulate a realistic virtual reality.

Let’s look at the simplest problem first. If we want to get signals from the brain and use them in virtual reality, we have to intercept them and cut off the connection to the real body. There is currently no known way to do this without damaging the body permanently, and even if we could, a body without signals is not really functioning. Lying in your own piss would be your smallest problem. You wouldn’t even be able to breathe. But even if we managed to do this, we would only have solved the easiest part. Now that we have the output, we also need input.

Simulating feelings from every nerve ending of your body requires the system to simulate the entire virtual reality, which would have to be accurate on a molecular level (smell, taste), rendered quite far into the distance, and backed by a full simulation of realistic physics (eyes, touch). And not only would the world need to be simulated to such a degree, but your body would also need to be a one-to-one simulation. Every nerve ending and visual receptor needs to be simulated to achieve a realistic experience that would fool your brain. This leads to the next problem: to get all these signals back into the brain, we would need to understand the encoding and position of every input signal to the brain. This requires such an in-depth understanding of the brain that FDVR would be the most boring thing we could do with it at this point.

If we had the computing power to simulate all this and the deep understanding of the brain required to do it, it would be easier to simply make a digital copy of your brain and run it as a simulation. And if there is enough computing power to simulate a world to this degree, then we are most likely already living in one. The computing power to do this even for a few people could exceed what is possible with the energy in our solar system, following the principle of computational irreducibility.

So in conclusion, if a future ASI had access to such a deep understanding of biology and so much energy and computing power, using it for FDVR would be a total waste. But who knows, maybe we get a fast takeoff soon, ASI finds some way, and we achieve FDVR with some compromises in a few decades. I personally would bet that FDVR will never play a huge part in our society.

Gemini is here

06/12/2023 / Maximilian Kannen / 0 Comments

Google DeepMind just released their new Gemini models. They come in three sizes: Nano will be used on devices like the Pixel phones, Pro will be used in their products such as Bard, and Ultra is going to be released at the beginning of next year. The models are multimodal and can take audio, video, text, images, and code as input.

It outperforms current state-of-the-art models not only in text-based tasks but also in other modalities.

Test the Pro version now in Bard and read more about the model here and here.

Looking Back On 2023 And Predictions for 2024

02/12/2023 / Maximilian Kannen / 4 Comments

As we close the chapter on 2023, it’s time to revisit the predictions I laid out at the beginning of the year. It was a year marked by technological strides and societal challenges. Let’s evaluate how my forecasts stood against the unfolding of 2023.

Let’s start with my predictions about AI:

“AI will continue to disrupt various industries such as search and creative writing and spark public debate about its impact, even more than is happening right now. It will also lead to the production of high-quality media with fewer people and resources thanks to AI’s assistance. In the field of 3D generation, I expect to see similar progress in 2023, bringing us closer to the quality of 2D generation.”

I think I was mostly right. GPT-4 definitely sparked a public debate and we see many industries that became more productive thanks to AI. 3D generation is also already at the level that image generation had at the beginning of the year. What I did not predict was the speed at which companies like Meta or Microsoft would iterate and deploy LLMs in many forms.

My next prediction was about fusion: “While I expect to see continued progress in this field, it is unlikely that we will see a commercial fusion reactor within the next two years.”

Again I was on point but I missed talking about other energy sources like solar which are more relevant. I would count that as a bad focus and not a failed prediction.

I also made predictions for hardware: “[…] we can expect to see quantum computers with over 1000 Qbits in the upcoming year. GPUs will become more important with the rise of AI. However, these advancements in hardware technology also come with the need for careful consideration and planning in terms of production and distribution.”

We indeed achieved 1000 qubits, even though IBM was not the first company to do so. I also correctly predicted the increased demand for GPUs, but I have to admit I did not expect that scale. I was also more pessimistic about the ability of TSMC and others to meet the demand, and while they drastically outperformed my expectations, I was still kind of right because the demand is also way bigger than I anticipated.

My Predictions for VR: “But the year 2023 is shaping up to be a promising one for the VR hardware market, with multiple new headsets, such as the Quest 3, and maybe even an Apple Headset, set to be released. These new products will likely offer improved graphics, more intuitive controls, and a wider range of content and experiences. While it may not fully realize the vision of a “Metaverse”, VR is still likely to be a great entertainment product for many people”

And AR: “2023 will be a critical year for AR. It will be the first time that we can build affordable Hardware in a small form factor. Chips like the Snapdragon AR2 Gen 1 implement Wifi 7 and low energy usage and will make it possible to build Smart glasses.”

While my VR predictions were all correct, my AR predictions underestimated the difficulty of producing smart glasses in a normal form factor.

I did not make concrete predictions about Brain-computer interfaces, but I honestly expected more progress. More about that in my new predictions later.

Now on to biology and medicine. I made a multiple-year prediction: “If this continues we will be able to beat cancer in the next few years, which leads to the next field.” This cannot be verified yet, but I still believe in it. I also predicted that a person under 60 could live forever. Recently I looked a lot more into aging research and I still believe that this is correct, even though I would change from “every person under 60 has the potential” to “there is a person under 60 that will”. I think this is an important distinction because stopping aging requires a lot of money and dedication and will not be available for most in the near future.

I ended the post with: “While this was a slow year in some aspects, major progress was made in most fields, and 2023 will be even faster. We are at the knee of an exponential blowup and we are not ready for what is coming. While I am still worried about how society will react and adapt, I am excited for 2023 and the rest of the decade.”

Again I believe that I was very much on point with this. Many people were blown away by the rapid developments this year. So let’s talk about the stuff that I did not predict or ignored last year. LK-99 is a material that was supposed to be a room-temperature superconductor. As of now, this claim was most likely false, but I realized that I did not make a prediction about superconductors in that blog post. I will do so later in this one.

On to the new predictions for 2024. Let’s start with AI again. LLM-based systems will become more autonomous and will reach a point where many will consider them AGI. I personally do not think that we will reach AGI this year, but most likely in 2025. There is also a 70% chance that we will find a new architecture that generalizes better than transformers. No system in 2024 will outperform humans on the new GAIA benchmark, but systems are going to double their performance on it. This will mostly be accomplished by improving reasoning, planning, and tool use with improved fine-tuning and new training strategies.

Results of current Systems on the GAIA benchmark compared to humans

I also predict that commercially viable models will stay under 1 trillion parameters in 2024. There will be a few models over this threshold, but they will not be used in consumer products without a paid tier, similar to GPT-4 (non-turbo). Alternative architectures like RWKV and state space models will also become more relevant for specific use cases, and most models will at least support image input if not more modalities. AI models like AlphaFold will push scientific discovery even faster in 2024.

Image/video/music/3D generative models will improve dramatically and completely change the art industries. The focus is going to be more on integration and ways to use them and less on pure text2output capabilities. Assistants like Alexa will integrate LMMs and improve drastically. OpenAI will release at least one model that will not be called GPT-5 and wait with GPT-5 until later in the year.

Apple will announce its first LMM at WWDC, and at the end of the year we will be able to do most stuff by just talking to our PC. Meta will release Llama-3, which is going to be multimodal and close to GPT-4, and Google will release Gemini at the beginning of the year, which will be comparable to GPT-4 at the beginning and will improve over the course of the year.

Open-source models will stay a few months behind closed-source models, and even further in areas like integration, but offer more customizability. Custom AI hardware like the AI Pin will not become widespread, but smartphones will adapt to AI by including more sensors and I/O options, and towards 2025 we will see smart glasses with AI integration. The sectors that will be influenced the most by AI are education and healthcare, but in the short term, the first to be affected will be artists and some office workers.

Let’s continue with hardware. Nvidia will stay the leader in AI hardware with the H200 and, later this year, the B100. Many companies like Microsoft, Apple, and Google will use their own custom chips, but the demand will lead to increased sales for every chip company. At the end of 2024, more than half of the global FLOPS will be used for AI. VR hardware will continue to improve, and we will finally see the first useful everyday AR glasses towards the end of 2024. Quantum computers will become part of some of the cloud providers and will be offered as specialized hardware just like GPUs (Note: This part was written before the AWS Event announcement). They will become more relevant for many industries as the number of qubits grows. We will also see more variety in chips as they become more specialized to save energy. Brain-computer interfaces will finally be used in humans for actual medical applications.

I did not make any predictions about robots last year, because there weren’t many exciting developments, but that changed. Multiple companies started developing humanoid robots that will be ready in 2024 or 2025. I expect an initial hype around them and adoption in some areas. However, towards the end of the decade they will be replaced with special-purpose robots, and humanoid robots will be limited to areas where a human form factor is needed. In general, the number of robots will increase in all areas. Progress in planning and advanced AI allows robots to act in unknown environments and do new tasks. They will leave controlled environments like factories and will appear in shops, restaurants, streets, and many other places.

The robots: Atlas by Boston Dynamics, Digit by Agility Robotics, and Tesla Optimus by Tesla

Let’s continue with energy. The transition to renewable energy will accelerate in 2024, with a significant focus on solar. The first commercial fusion reactor will begin construction, and nuclear reactors will become even safer, mostly solving the waste problem. More people will build solar for their own houses and become mostly self-sufficient.

I mentioned LK-99 earlier already, so here are my predictions for material science. I think that if a room-temperature superconductor is possible, an AI-based system will find it in the next two years. In fact, most new materials will be hypothesized and analyzed by AI and will bring a lot of progress for areas like batteries, solar panels, and other material-dependent fields (Note: this part was written four days before DeepMind presented GNoME).

Biology and medicine are poised to make significant leaps, powered by AI systems like Alphafold and similar technologies. Cancer and other deadly diseases will become increasingly treatable and aging will become a target for many in the field. The public opinion that aging is natural and cannot/should not be stopped will not change this year but maybe in 2025. Prostheses will become more practical and will be connected directly to nerves and bones. This will make them in some areas better than human parts, but touch and precision will continue to be way worse. We will also see progress in artificial organs grown in animals or completely made in a lab.

Transportation in 2024 will change slightly. EVs will become more popular and cheaper but will not reach the level of adoption that they have in China. Self-driving cars will stay in big cities as taxi replacements and will not be generally available until 2025. Hypertubes will not become a train replacement and will only be built for very specific connections, if they get built at all in the next few years.

Other infrastructure like the internet will continue to lag behind demand for the next few years. The main driver of the increased need for bandwidth will be high-quality video streaming, while the main need for speed will arise from interactive systems like cloud-based AI assistants.

Climate change and unstable governments will lead to an increase in refugees worldwide, and social unrest will increase. We will see the first effects of AI-induced job losses. The political debate will become more heated, and some important elections like the US election will be fully determined by large-scale AI-based operations that use fake news, deepfakes, and online bots to control public opinion.

I made a lot more verifiable predictions this time and I hope to see how many I got correct. If I missed any area or technology, write it in the comments and I will add a prediction there. Also, let me know your predictions.

Humane presents the AI Pin

09/11/2023 / Maximilian Kannen / 0 Comments

Humane presented the AI Pin today. It is a small device with a camera, microphone, sensors, and laser projector. It is designed to replace the smartphone and costs $699 plus a monthly subscription of $24. This includes the unlimited use of multiple frontier LLMs, internet, and multiple other services like music. It can see what you see, translate, manage your calendar, send messages, and answer your questions.

I personally think that the biggest problem is that most people are addicted to social media and YouTube, which means the AI Pin cannot replace a phone, and it is too expensive to buy in addition to one. Phones can also already do many of the same things and are not much more expensive. I can imagine something similar in the future in combination with AR glasses. More information: https://hu.ma.ne/

Google found a way to improve math skills in LLMs

31/08/2023 / Maximilian Kannen / 0 Comments

LLMs are powerful tools, but they often struggle with tasks that require logical and algorithmic reasoning, such as arithmetic. A team of researchers from Google has developed a new technique to teach LLMs how to perform arithmetic operations by using in-context learning and algorithmic prompting. Algorithmic prompting means that the model is given detailed explanations of each step of the algorithm, such as addition or multiplication. The researchers showed that this technique can improve the performance of LLMs on arithmetic problems that are much harder than those seen in the examples. They also demonstrated that LLMs can use algorithmic reasoning to solve complex word problems by interacting with other models that have different skills. This work suggests that LLMs can learn algorithmic reasoning as a skill and apply it to various tasks.
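To illustrate what such a prompt could look like, here is a small sketch that writes out digit-by-digit addition with explicit carries and places the worked examples in front of a new question. The wording and layout of the steps are my own assumptions for illustration and not the exact format used in the paper; the sketch only handles non-negative integers.

```python
def explain_addition(a: int, b: int) -> str:
    """Write out digit-by-digit addition with explicit carries as prompt text."""
    digits_a = [int(d) for d in str(a)][::-1]  # least significant digit first
    digits_b = [int(d) for d in str(b)][::-1]
    lines = [f"Problem: {a} + {b}"]
    carry = 0
    result_digits = []
    for i in range(max(len(digits_a), len(digits_b))):
        da = digits_a[i] if i < len(digits_a) else 0
        db = digits_b[i] if i < len(digits_b) else 0
        total = da + db + carry
        lines.append(
            f"Step {i + 1}: {da} + {db} + carry {carry} = {total}, "
            f"write {total % 10}, carry {total // 10}"
        )
        result_digits.append(total % 10)
        carry = total // 10
    if carry:
        lines.append(f"Final carry {carry} becomes the leading digit")
        result_digits.append(carry)
    answer = int(''.join(str(d) for d in reversed(result_digits)))
    lines.append(f"Answer: {answer}")
    return '\n'.join(lines)


def build_prompt(examples, question):
    """Put fully worked examples in front of a new question (in-context learning)."""
    worked = '\n\n'.join(explain_addition(a, b) for a, b in examples)
    return f"{worked}\n\nProblem: {question[0]} + {question[1]}\n"


print(build_prompt([(58, 67), (9, 4)], (4821, 7359)))
```

The model then sees the complete procedure for short numbers and is asked to continue in the same style for a longer one.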

Results from the paper comparing their approach vs. other prompting techniques.

Llama 2: New State-of-the-Art Open Source LLM

25/07/2023 / Maximilian Kannen / 0 Comments

Meta recently released their new Llama models. The new models come in sizes from 7 to 70 billion parameters and are released as base models and chat models, which are fine-tuned with two separate reward models for safety and helpfulness. While the models are only a small improvement over the old Llama models, the most important change is the license which now allows commercial use.

Microsoft published the next Version of Kosmos

28/06/2023 / Maximilian Kannen / 0 Comments

Researchers at Microsoft have unveiled Kosmos-2, the successor of Kosmos-1, a Multimodal Large Language Model (MLLM) that integrates the capability of perceiving object descriptions and grounding text in the visual world. By representing referring expressions as links in Markdown format, Kosmos-2 achieves the vital task of grounding text to visual elements, enabling multimodal grounding, referring expression comprehension and generation, perception-language tasks, and language understanding and generation. This milestone in the development of artificial general intelligence lays the foundation for Embodiment AI and the convergence of language, multimodal perception, action, and world modeling, bringing us closer to bridging the gap between humans and machines and revolutionizing various domains where AI interacts with the real world. With just 1.6B parameters, the model is quite small and will be openly available on GitHub.

RoboCat handles every Robot

21/06/2023 / Maximilian Kannen / 0 Comments

DeepMind published a new blog post where they present their newest AI, which is based on their previous work Gato. RoboCat is a self-improving AI agent for robotics that learns to perform a variety of tasks across different arms and then self-generates new training data to improve its technique. It is the first agent to solve and adapt to multiple tasks and do so across different, real robots. RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws from a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.

Voicebox: A new Voice Model

17/06/2023 / Maximilian Kannen / 0 Comments

Voicebox is a new generative AI for speech that can generalize to speech-generation tasks it was not specifically trained to accomplish with state-of-the-art performance. It can create outputs in a vast variety of styles, from scratch or from a sample, and it can modify any part of a given sample. It can also perform tasks such as:

In-context text-to-speech synthesis: Using a short audio segment, it can match the segment’s style and generate speech from text in that style.
Cross-lingual style transfer: Given a sample of speech and a passage of text in one of six languages, it can produce a reading of the text in that language.
Speech denoising and editing: It can resynthesize or replace corrupted segments within audio recordings.
Diverse speech sampling: It can generate speech that is more representative of how people talk in the real world.

Voicebox uses a new approach called Flow Matching, which learns from raw audio and transcription without requiring specific training for each task. It also uses a highly effective classifier to distinguish between authentic speech and audio generated with Voicebox. Voicebox outperforms the current state-of-the-art English model VALL-E on zero-shot text-to-speech and cross-lingual style transfer and achieves new state-of-the-art results on word error rate and audio similarity. Voicebox is not publicly available because of the potential risks of misuse, but the researchers have shared audio samples and a research paper detailing the approach and results. They hope to see a similar impact for speech as for other generative AI domains in the future.

Synthetic Human Embryos

16/06/2023 / Maximilian Kannen / 0 Comments

After earlier experiments on mice, it is now possible to create human embryos out of stem cells. This allows us to make human life without sperm or eggs. Since the experiments are limited by ethical concerns they stopped the growth of the embryo at an early stage. This research could lead to a better understanding of early development and could allow us someday to design our successor species.

Meta Released a Music Model

13/06/2023 / Maximilian Kannen / 0 Comments

This week Meta open-sourced a music generation model similar to Google’s MusicLM. The model is named MusicGen. Such models can generate all kinds of music based on text prompts, similar to image models.

New OpenAI Update

13/06/2023 / Maximilian Kannen / 0 Comments

OpenAI announced a set of changes to their model APIs. The biggest announcement is the addition of function calls for both GPT-3.5 and 4. This allows developers to enable plugins and other external tools for the models.

They also released new versions of GPT-3.5 and GPT-4 that are better at following directions, and a version of GPT-3.5 with a 16K context window.

In addition, they made the embedding model 75% cheaper. This model is used to create vector databases and allows models to dynamically load relevant data, like a memory. GPT-3.5 also became cheaper, now costing only $0.0015 per 1K input tokens.
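As a quick sanity check of what that price means in practice, here is a tiny calculation. It only covers input tokens at the price quoted above; output tokens are billed separately and are not included, and the function name is made up for this example.

```python
PRICE_PER_1K_INPUT_TOKENS_USD = 0.0015  # GPT-3.5 input price quoted above


def input_cost_usd(num_tokens: int, price_per_1k: float = PRICE_PER_1K_INPUT_TOKENS_USD) -> float:
    """Cost of sending num_tokens input tokens; output tokens are not included."""
    return num_tokens / 1000 * price_per_1k


for tokens in (1_000, 100_000, 1_000_000):
    print(f"{tokens:>9,} input tokens: ${input_cost_usd(tokens):.4f}")
```

So a million input tokens cost roughly one and a half dollars at this price.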

DeepMind Makes Everything Faster

07/06/2023 / Maximilian Kannen / 0 Comments

After DeepMind developed AlphaTensor last year and found a new algorithm for matrix multiplication, they did it again. This time they developed AlphaDev, which found a new algorithm for sorting. This does not sound as exciting as a new language model, but sorting algorithms run billions of times every hour. Optimizing central algorithms like sorting and searching is one of the oldest parts of computer science, and they have been optimized for over a hundred years at this point. We did not find a better solution in the last 10 years, and some believed that we had reached the limit of what is possible. AlphaDev’s new solution was implemented in the standard C++ library and is already in use. The impact of these small improvements becomes enormous because they are used so much, and the amount of energy that is saved adds up quickly. They also found a new hash algorithm which is used a similar amount. If AlphaDev continues to find improvements for core algorithms, all software in the world will run faster and more efficiently. Breakthroughs like this have to be considered in the discussion around the climate impact of AI training. The energy saved by these improvements offsets the energy used for training by orders of magnitude.

Apple’s VR Headset is Here

06/06/2023 / Maximilian Kannen / 0 Comments

Apple finally announced their upcoming VR headset, which will focus on productivity and cinematic entertainment. The 4K displays are powered by their M2 chip. The headset requires an external energy source and, at $3,500, is very expensive. It focuses on mixed reality experiences similar to the new Meta Quest 3, but unlike the Quest, it will not be released until next year. It is probably a good starting point for Apple to build their new product platform, but if you are not in desperate need of a high-resolution headset it is perhaps not a good choice for you.

Meta Quest 3

01/06/2023 / Maximilian Kannen / 1 Comment

Meta announced their new Meta Quest 3 headset. It is the successor to the Quest 2, the most popular VR headset of all time. The price went up a bit, and the processing power and form factor improved, as did the visuals. Passthrough in particular is better and is now in color. Eye tracking is not included. Together with Apple’s upcoming entrance into the VR space, this will give the XR world a new push forward.

Copilots for everyone

24/05/2023 / Maximilian Kannen / 0 Comments

Microsoft Build is currently underway, with Microsoft showcasing a range of new and upcoming products, including various Copilots such as Copilot for Bing, GitHub, and Edge. In their pipeline, they also have plans to launch a Copilot specifically designed for Windows.

These Copilots are all built using Microsoft’s new Azure AI Studio Platform, which is now open to developers, allowing them to create their own Copilots.

Furthermore, Microsoft announced their support for an open plugin system, similar to the one utilized by ChatGPT, making plugins accessible to all Copilots. If this solution becomes the industry standard for AI systems, it has the potential to establish Microsoft as a dominant player in the AI market. The first day of Microsoft Build concluded with an exceptional presentation by Andrej Karpathy, delving into the history and inner workings of GPT models. If you’re interested in gaining insights into how these models operate and learn, I highly recommend watching his talk titled “State of GPT.”

Intel Presents New Hardware

23/05/2023 / Maximilian Kannen / 0 Comments

Intel just announced a new supercomputer named Aurora. It is expected to offer more than 2 exaflops of peak double-precision compute performance and is based on their new GPU series which outperforms even the new H100 cards from NVIDIA.

They are going to use Aurora to train their own LLMs up to a trillion parameters. This would likely be the first 1T model.

I am excited to see even bigger models and more diverse hardware and software options in the field.