The Future is Now


Large Language Models: An Overview

Large Language Models (LLMs) are machine learning-based tools that are able to predict the next word in a given sequence of words. In this post, I want to clarify what they can and cannot do, how they work, what their limitations will be in the future, and how they came to be.

History

With the recent surge in public awareness surrounding Large Language Models (LLMs), a discourse has arisen concerning the potential benefits and risks associated with this technology. Yet, for those well-versed in the field of machine learning, this development represents the next step in a long-standing evolutionary process. The first language models were developed over 50 years ago and used statistical approaches that were barely able to form correct sentences.

With the rise of deep learning architectures like recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks, language models became more powerful, but they also grew in size and in the amount of training data they needed.

The emergence of GPUs, and later on specialized processing chips called TPUs, facilitated the construction of larger models, with companies such as IBM and Google spearheading the creation of translation and other language-related applications. 

The biggest breakthrough came in 2017, when the Google paper “Attention Is All You Need” introduced the Transformer. The Transformer uses self-attention to find connections between words independent of their position in the input and is therefore able to learn more complex dependencies. It is also more efficient to train, which means it can be trained on larger data sets. OpenAI used the Transformer to build GPT-2, the most powerful language model of its time. GPT-2 developed some surprising capabilities, which led to the idea that scaling these models up would unlock even more impressive ones. Consequently, many research teams applied the Transformer to diverse problems, training numerous models of increasing size, such as BERT, XLNet, ERNIE, and Codex, with GPT-3 being the most notable. However, most of these models were proprietary and unavailable to the public. This has changed with recent releases like DALL-E for image generation and GitHub Copilot. Around this time, it became clear that scaling language models up was becoming less effective and too expensive for most companies. This was confirmed by DeepMind in 2022 in their research paper “Training Compute-Optimal Large Language Models”, which showed that most LLMs are vastly undertrained and too big for their training data set.

OpenAI and others started to use other means to improve their models, such as reinforcement learning from human feedback. That led to InstructGPT, which was fine-tuned to follow the instructions given in the prompt. They used the same technique to fine-tune their model on dialog data, which led to the famous ChatGPT.

How they work

Neural networks are at the core of most modern machine learning architectures. As the name suggests, they are inspired by their biological counterpart.

Simple neural network with 2 input nodes, 5 hidden nodes, and 1 output node

At a high level, a neural network consists of three main components: an input layer, one or more hidden layers, and an output layer. The input layer receives data, which is then processed through the hidden layers. Finally, the output layer produces a prediction or classification based on the input data.

The basic building block of a neural network is a neuron, which sums its weighted inputs, applies a mathematical function (the activation function) to the sum, and produces an output. The output of each neuron n_i is multiplied by the weight w_ij, and these products are added together in the neuron n_j of the next layer; this repeats layer by layer until the output layer is reached. For a whole layer, this can be implemented as a simple matrix-vector multiplication with the input as the vector I and the weights as the matrix W: W × I = O. Applying the activation function to the result, f(O), gives the input vector I of the next layer, and so on until the final output.
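The layer-by-layer computation can be sketched in a few lines of Python. This is only a minimal illustration: the weight values are arbitrary, the sigmoid is one possible choice of activation function f, and the names matvec and forward are made up for this example. The layer sizes follow the figure above (2 inputs, 5 hidden nodes, 1 output).

```python
import math

def matvec(W, x):
    """Multiply the weight matrix W (a list of rows) with the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    """One common activation function; chosen here only as an example."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, inputs):
    """Propagate an input vector through a list of weight matrices."""
    I = inputs
    for W in layers:
        O = matvec(W, I)               # W x I = O
        I = [sigmoid(o) for o in O]    # f(O) becomes the input of the next layer
    return I

# Arbitrary example weights: 2 inputs -> 5 hidden nodes -> 1 output node.
W_hidden = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 0.1], [0.3, -0.3]]
W_output = [[0.2, -0.1, 0.4, 0.3, -0.2]]

output = forward([W_hidden, W_output], [1.0, 0.5])
print(output)
```

Real networks usually also add a bias term per neuron and often use a different function on the output layer; both are left out here to stay close to the description above.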

During training, the network is presented with a set of labeled examples, known as the training set. The network uses these examples to learn patterns in the data and adjust its internal weights to improve its predictions. The method used to work out how each weight should be adjusted is known as backpropagation.

Backpropagation works by calculating the error between the network’s output and the correct output for each example in the training set. The error is then propagated backwards through the network, adjusting the weights of each neuron in the opposite direction of the error gradient. This process is repeated for many iterations until the network’s predictions are accurate enough.
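To make the idea concrete, here is a toy sketch with only two weights in a chain (input, one hidden value, one output) and no activation function. The training data, learning rate, and function name are assumptions made for this example; real networks apply the same chain rule to millions of weights.

```python
def train_chain(samples, w1=0.5, w2=0.5, lr=0.01, epochs=200):
    """Fit out = w2 * (w1 * x) to the samples with plain gradient descent."""
    for _ in range(epochs):
        for x, target in samples:
            # Forward pass
            hidden = w1 * x
            out = w2 * hidden
            error = out - target

            # Backward pass: propagate the error from the output to the input
            grad_out = 2 * error           # derivative of error**2 w.r.t. out
            grad_w2 = grad_out * hidden    # how w2 influenced the output
            grad_hidden = grad_out * w2    # error passed back to the hidden value
            grad_w1 = grad_hidden * x      # how w1 influenced the hidden value

            # Move each weight against its gradient
            w1 -= lr * grad_w1
            w2 -= lr * grad_w2
    return w1, w2

# Toy training set for the relation target = 2 * x
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w1, w2 = train_chain(samples)
print(w1 * w2)  # the product of the two weights approaches 2
```

The line computing grad_hidden is the "backwards" part: the error at the output is turned into an error signal for the earlier layer, which is then used to update the earlier weight.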

Since 2017, most LLMs have been based on Transformers. These also contain simple feed-forward networks, but at their core they have a self-attention mechanism that allows the Transformer to detect dependencies between different words in the input.

Classic Transformer block

The self-attention mechanism in the Transformer model works by using three vectors for each element of the input sequence: the query vector, the key vector, and the value vector. These vectors are used to compute an attention score between every pair of elements in the sequence. To get the output for the jth element, we calculate the dot product of its query vector q_j with the key vector k_i of every element i, which gives one attention score per element, and multiply each score with the corresponding value vector v_i. We then sum up all the results to get the output.

Based on a graphic by Peter Bloem

Before you multiply the attention score with the value vectors, you would first apply a softmax function to the attention scores. This will ensure that they add up to one and that the resulting weighted value vectors are weighted proportionally to their relevance to the query element. This weighted sum is then used as input to the next layer of the Transformer model. I skipped or simplified other parts of the algorithm as well to make it easier to understand. For a more in-depth explanation of Transformers, I recommend this blog or the creator of GPT himself.
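The simplified mechanism described above (dot products, softmax, weighted sum of value vectors) can be sketched as follows. It assumes the query, key, and value vectors are already given; the learned projections that produce them, the scaling of the scores, and multiple attention heads are left out, and the function names are made up for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that add up to one."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Return one output vector per query, as described in the text."""
    outputs = []
    for q in queries:
        scores = [dot(q, k) for k in keys]   # one score per input element
        weights = softmax(scores)            # normalize the scores
        size = len(values[0])
        # Weighted sum of all value vectors
        out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(size)]
        outputs.append(out)
    return outputs

# Three input elements with two-dimensional vectors (arbitrary numbers)
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

for out in self_attention(queries, keys, values):
    print(out)
```

Because every query is compared with every key, the position of an element in the sequence plays no role in the score, which is why the mechanism can connect words that are far apart.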

The self-attention mechanism in the Transformer model allows the model to capture long-range dependencies and relationships between distant elements in the input sequence. By selectively attending to different parts of the sequence at each processing step, the model is able to focus on the most relevant information for the task at hand. This makes the Transformer architecture highly effective for natural language processing tasks, where capturing long-range dependencies is crucial for generating coherent and meaningful output. 

What they can and cannot do

As explained earlier, LLMs are text prediction systems. They are not able to “think”, “feel”, or “experience” anything, but they are able to learn complex ideas in order to predict text accurately. For example, the sequence “2 + 2 =” can only be continued if there is an internal representation of basic math inside the Transformer. This is also the reason why LLMs often produce plausible-looking output that makes sense but is wrong. Since the model is multiple orders of magnitude smaller than the training data, and even smaller compared to all possible inputs, it cannot represent all the needed data. This means that LLMs are great for producing high-quality text about a simple topic, but they are not great at understanding complex problems that require a huge amount of available information and reasoning, like mathematical proofs. This can be improved by providing the needed information in the input sequence, which increases the probability of correct outputs. A great example would be BingGPT, which uses search queries to get additional information about the input. You can also train LLMs to do this themselves by fine-tuning them on API calls.

What will they be able to do and what are the limits

The Chinchilla scaling law shows that LLMs are able to adapt to even larger amounts of training data. If we can collect the needed amount of high-quality text data and processing power, LLMs will be able to learn even more complex language-related tasks and will become more capable and reliable. They will never be flawless on their own, and they have the core problem that you can never fully understand how the output was produced, as neural networks are black boxes for an observer. They will, however, become more general as they learn to use pictures, audio, and other sensory data as input, at which point they are barely still language models. The Transformer architecture, however, will always be a token prediction tool and will never develop “consciousness” or any kind of internal thought, as it is still just a series of matrix calculations on a fixed input. I suspect that we need at least some internal activity and the ability to learn during deployment for AGI. But even without that, they will become part of most professions, hidden inside other applications like Discord, Slack, or PowerPoint.

Bias and other problems

LLMs are trained on large text corpora which are filled with certain views, opinions, and mistakes. The resulting output is therefore flawed. The current solutions include blocking certain words from the input and/or output, fine-tuning with human feedback, and providing detailed instructions and restrictions in every prompt. None of them is flawless. Blocking words is not precise enough. Added instructions can be circumvented by simply overwriting them with prompt injections. Fine-tuning with human feedback is the best solution, but it comes with its own problem: the people who rate the outputs introduce their own bias into the fine-tuned model. This becomes a huge problem if you start using these models in education, communication, and other use cases. The views of the group of people who are controlling the training process are now projected onto everybody in the most subtle and efficient way imaginable. As OpenAI stated in their recent post, the obvious solution will be to fine-tune your own model, which will lead to less outside influence but also increases the risk of shutting out other views. It could create digital echo chambers where people put their radical beliefs into models and get positive feedback.

Another problem is that most people are not aware of how these systems work and terms like “artificial intelligence” suggest some form of being inside the machine. They start to anthropomorphize them and accept the AI unconsciously as another person. This is because our brains are trained to look at language as something only an intelligent being can produce. This starts by adding things like “thanks” to your prompt and then moves quickly to romantic feelings or some other kind of emotional connection. This will become increasingly problematic the better and more fine-tuned the models become. Adding text-to-speech and natural language understanding will also amplify this feeling.

Scaling

I see many people asking for an open-source version of ChatGPT and wishing to have such a system on their computers. Compared to generative models like Stable Diffusion, LLMs are way bigger and more expensive to run. This means that they are not viable for consumer hardware. They take millions of dollars in computing power to train and are only able to run on large servers. However, there are signs that this could change in the future. The Chinchilla scaling law implies that we can move a larger part of the computation into the training process by using smaller models with more data. An early example would be the new LLaMA models by Meta, which are able to run on consumer hardware and are comparable to the original GPT-3. This still requires millions in training, but this can be crowdfunded or distributed. While these language models will never be able to compete with the state-of-the-art models made by large companies, they will become viable in the next 1-2 years and will lead to personalized fine-tuned models that take on the role of an assistant. Two excellent examples of open-source projects that try to build such models are “Open-Assistant” and “RWKV”.
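To get a feeling for the trade-off between model size and training data, here is a rough back-of-the-envelope sketch. Both constants are commonly cited approximations and not exact values from the Chinchilla paper or from this post: about 20 training tokens per parameter for a compute-optimal model, and about 6 floating-point operations per parameter per token for training. The model sizes are hypothetical round numbers.

```python
TOKENS_PER_PARAMETER = 20   # rough compute-optimal ratio (approximation)
FLOPS_PER_PARAM_TOKEN = 6   # rough training cost estimate (approximation)

def compute_optimal_tokens(parameters):
    """Approximate number of training tokens for a compute-optimal model."""
    return parameters * TOKENS_PER_PARAMETER

def training_flops(parameters, tokens):
    """Approximate total training compute in floating-point operations."""
    return FLOPS_PER_PARAM_TOKEN * parameters * tokens

# Hypothetical model sizes, chosen only for illustration
for parameters in (7_000_000_000, 70_000_000_000):
    tokens = compute_optimal_tokens(parameters)
    flops = training_flops(parameters, tokens)
    print(f"{parameters:.1e} parameters: {tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under these assumptions, a model with ten times fewer parameters needs ten times less data and a hundred times less training compute to be compute-optimal. Training a small model on more data than this ratio suggests costs extra compute once, but the model stays cheap to run afterwards, which is the shift from inference cost to training cost described above.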

Taken from the paper “Compute Trends Across Three Eras of Machine Learning”

The current growth in computing will not be sustainable much longer, as it is driven not only by Moore’s law but also by an increase in investments in training, which will soon hit a point where the return does not justify the costs. At this point, we will have to wait for the hardware to catch up again.

What are the main use cases?

When ChatGPT came out, many used it like Google to get answers to their questions. This is actually one of the weak points of LLMs since they can only know what was inside their training data. They tend to get facts wrong and produce believable misinformation. This can be fixed by including search results like Bing is doing.

The better use case is creative writing and other text-based tasks like summarising, explaining, or translating. The biggest change will therefore happen in jobs like customer support, journalism, and teaching. The education system in particular can benefit greatly from this. In many countries, Germany for example, teachers are in short supply. Classes are getting bigger and lessons are less effective. Tools like ChatGPT are already helping many students, and when more specialized programs use LLMs to provide a better experience, they will soon outperform traditional schools. Sadly, many schools try to ban ChatGPT instead of including it, which is not only counterproductive but also not possible, since there are no tools that can accurately detect AI-written text. But text-based tasks are not the limit. Recent papers like Toolformer show that LLMs will soon be able to control and use other hardware and software. This will lead to numerous new abilities and will enable them to take over a variety of new tasks. A personal assistant like the one Apple promised us years ago when they released Siri will soon be a reality.

Meta compares Brain to LLMs

Meta published an article where they compared the behavior of the brain to large language models. They showed the important differences and similarities underlying the process of text predictions. The research group tested 304 participants with functional magnetic resonance imaging to show how the brain predicts a hierarchy of representations that spans multiple timescales. They also showed that the activations of modern language models linearly map onto the brain responses to speech.

Organoid Intelligence: creating biological computers out of the human brain

A team of researchers published an article on their research on biocomputing. It goes in-depth about the potential of such systems and how to build them. The core idea is to grow brain tissue out of stem cells to use the high energy efficiency and ability to perform complex tasks with organoid-computer interfaces. Instead of copying the human brain with AI, we use it directly as a computing device. Since it is much more likely to develop conscious systems this way, the ethical side of this research is critical. The article also explores the ways this research can help understand our own brain and cognitive diseases. Research like this pushes our understanding of consciousness and intelligence.

Microsoft lets you talk to robots

Microsoft showed how to use ChatGPT to control robots with your voice. APIs and prompts can be designed to enable ChatGPT to run the robot. By combining the spoken task with API information, it is possible to let ChatGPT generate the code and API calls to execute the task with a given robot. While this is a powerful use case of LLMs, it is not a secure way to handle a robot, since the safety of the generated code cannot be guaranteed.

Microsoft published KOSMOS-1, a multimodal large language model

Microsoft released the paper “Language Is Not All You Need: Aligning Perception with Language Models”, where they introduce their multimodal large language model KOSMOS-1. KOSMOS-1 is still a language model at its core, but it can also use other training data, like images. It shows impressive results in a number of tasks, such as image transcription. It is, therefore, a much more general model than a simple language model, and I think this is a step in the right direction, since I believe that language alone is not enough for AGI.

OpenAI addressed Alignment and AGI concerns

OpenAI released a blog post about their plans for AGI and how to minimize the negative impacts. I highly recommend reading it yourself, but the key takeaways are:

  1. The mission is to ensure that AGI benefits humanity by increasing abundance, turbocharging the global economy, and aiding in the discovery of new scientific knowledge.
  2. AGI has the potential to empower humanity with incredible new capabilities, but it also comes with serious risks of misuse, drastic accidents, and societal disruption.
  3. To prepare for AGI, a gradual transition to a world with AGI is better than a sudden one. The deployment of AGI should involve a tight feedback loop of rapid learning and careful iteration, and democratized access will lead to more and better research, decentralized power, and more benefits. Developing increasingly aligned and steerable models, empowering individuals to make their own decisions, and engaging in a global conversation about key issues are also important.

New LLMs by Meta

Meta released 4 new Large Language Models, ranging from 6.7B to 65.2B parameters. By following the Chinchilla scaling law and using only publicly available data, they reached state-of-the-art performance with their biggest model, which is still significantly smaller than comparable models like GPT-3.5 or PaLM. Their smallest model is small enough to run on consumer hardware and is still comparable to GPT-3.

New Paper by Google uses Generative AI to train Robots

Google just published the paper “Scaling Robot Learning with Semantically Imagined Experience”, showing how to use image-generation models like Imagen to generate training data for their robot system. This gives the robot a more diverse data set and therefore makes it more robust and able to solve unseen tasks. We saw similar approaches using simulations for cars, but this is the first time that generative models were used.

Also from Google, we got a new paper where they present their advancements in quantum error correction. By scaling to larger numbers of qubits and combining them into logical qubits, they can reduce the quantum error rate significantly. This opens up a clear path to better quantum computers by just scaling them up.

Leaked Info reveals GPT-4 context window

OpenAI has privately announced a new developer product called Foundry, which enables customers to run OpenAI model inference at scale with dedicated capacity. It also reveals that DV (Davinci; likely GPT-4) will have up to 32k max context length in the public version. This is a huge improvement over the roughly 4k window of GPT-3.5, which did not allow summaries of longer texts. (The Google Doc that contained the information was taken down by OpenAI, but a screenshot can be found on social media.)

A Book Review of “A World Without Work” by Daniel Susskind

“A World Without Work” by Daniel Susskind, published in 2020, is a thought-provoking book that explores the technological changes happening in today’s workforce and the potential impacts on society. Dr. Daniel Susskind, a Research Professor in Economics at King’s College London and a Senior Research Associate at the Institute for Ethics in AI at Oxford University, examines how automation and artificial intelligence are affecting jobs and the future of work. He argues that we need to rethink our economic and social systems to adapt to the coming technological changes.

I greatly enjoyed reading the book and even though I was familiar with many of the topics, I still learned a lot, particularly about economics. The book is divided into three parts. The first part, “The Context”, describes the history of automation and shows the parallels and differences between industrialization and today’s development. For example, the author portrays the Luddites and their fight against textile machinery, which helps readers to understand the recent fight against generative AI.

The second part “The Threat”, explains in great detail the different reasons for technological unemployment and why the negative effects of automation outweigh the positive ones. It also explains how the current development leads to ever-greater inequality. Although many of the numbers were not new to me, the author manages to connect all the dots and paints a coherent picture of the problem.

The last part “The Response”, discusses solutions on how to build a working society in a world without work. The author addresses big tech companies and their political power, and how states have to fight back and tax in a way that allows everyone to receive an appropriate part of the economic pie. In the end, he addresses the problem of meaning and how humans can cope with too much free time. My biggest problem with this part is the missing description of how to transition from today’s system to the proposed solution, which in my opinion, is the hardest part. The book ends with an overly optimistic view, placing a lot of trust in humans and governments to build a working system in the future. Unfortunately, I do not share this trust in humanity. I also wish that the author had addressed the influence of other aspects of scientific progress on the economy, such as longevity or space exploration. However, I understand that this would have been outside the scope of the book.

I highly recommend this book to everyone who is working or will be in the next decade. Regardless of your occupation, this book is relevant to you. I also wish that political leaders would read it and act as proposed, to prevent a dystopia where rampant unemployment makes many societies fall apart.

Overall, “A World Without Work” is a thought-provoking and important book that raises important questions about the future of work and the economy. The author provides a clear and concise overview of the current state of the job market and the potential consequences of technological changes. He also does a good job of providing a balanced and nuanced view of the potential impacts of these changes, highlighting both the potential benefits and drawbacks. The author’s proposal for a universal basic income is well-argued. The book offers a clear and viable solution for addressing the issues of unemployment and inequality that may arise as a result of these changes. However, it is important to note that the solutions proposed in the book are not easy to implement, and it will require a collective effort from society, governments, and big tech companies to overcome the challenges that come with technological advancements.

Looking Back On 2022 And Predictions For 2023

2022 was an eventful year with lots of ups and downs. While the global economy is struggling, and problems like climate change and social instability continue to grow, there have also been some significant technological and scientific breakthroughs.

The most prominent developments probably happened in deep learning with the appearance of generative models that are able to generate human-level music, art, dialog, and code. In this context, I want to talk about two specific papers that shaped the field this year and will most likely shape it next year: the paper “Denoising Diffusion Probabilistic Models”, which is the basis for DALL-E 2, Stable Diffusion, and many other generative models, and the Chinchilla paper from DeepMind, which demonstrated the importance of high-quality training data over model size. This will likely shape the design and cost of future models, including the anticipated release of OpenAI’s GPT-4 in 2023, which is expected to outperform humans in many text-based tasks. The improvements are driven not only by Moore’s law and architectural improvements but also by the increasing amount of money spent to train and develop these systems. This is expected, as the potential is more and more recognized and the value these systems provide is ever-increasing.

Note that this is a logarithmic chart. The growth is nearly double exponential.

But not just GPT-4. AI will continue to disrupt various industries such as search and creative writing and spark public debate about its impact, even more than is happening right now. It will also lead to the production of high-quality media with fewer people and resources thanks to AI’s assistance. In the field of 3D generation, I expect to see similar progress in 2023, bringing us closer to the quality of 2D generation.

Fusion, the process of combining atomic nuclei to release a large amount of energy, has made significant strides in recent years. This is largely due to the incorporation of machine learning and advancements in various fields such as materials science and engineering. Recently, the U.S. Department of Energy announced that they were able to achieve a net energy gain from a fusion reaction, which is a major milestone in the pursuit of unlimited clean energy. While I expect to see continued progress in this field, it is unlikely that we will see a commercial fusion reactor within the next two years. However, the upcoming start of the ITER project, an international collaboration to build a fusion reactor, may refuel interest and drive further developments in this promising area.

The James Webb Space Telescope (JWST) is an important milestone in the field of astronomy because it is designed to be the most powerful and advanced space telescope ever built. It started to operate this year. It is a collaboration between NASA, the European Space Agency (ESA), and the Canadian Space Agency (CSA). One of the main goals of the JWST is to study the early universe and the formation and evolution of galaxies. It will be able to observe some of the most distant objects in the universe, including the first stars and galaxies that formed after the Big Bang. In addition to studying the early universe, the JWST will also be able to observe exoplanets (planets outside of our solar system) and potentially search for signs of life on these planets. It will have the ability to study the atmospheres of exoplanets and look for biomarkers, such as oxygen and methane, which could indicate the presence of life. The JWST is also expected to make important contributions to our understanding of planetary science, by studying the atmospheres and surfaces of planets in our own solar system and beyond.

The James Webb Space Telescope (JWST)

The hardware industry has faced challenges this year due to manufacturing bottlenecks. Despite the continuation of Moore’s law and the development of new alternatives to silicon, it has been difficult to obtain chips at this time. The industry is restructuring in order to better handle future demand for hardware. Specialized hardware, such as AI processors and quantum computers, is seeing rapid development. According to IBM’s roadmap, we can expect to see quantum computers with over 1000 qubits in the upcoming year. GPUs will become more important with the rise of AI. However, these advancements in hardware technology also come with the need for careful consideration and planning in terms of production and distribution. Ensuring a stable and efficient supply chain will be crucial in meeting the increasing demand for these specialized hardware components.

Virtual Reality (VR) technology has experienced a difficult period in recent years due to overhyping of its potential. While some people may have expected VR to revolutionize the way we interact with and experience the world, it has yet to reach the level of ubiquity and practicality that was promised by Meta. But the year 2023 is shaping up to be a promising one for the VR hardware market, with multiple new headsets, such as the Quest 3, and maybe even an Apple headset, set to be released. These new products will likely offer improved graphics, more intuitive controls, and a wider range of content and experiences. While it may not fully realize the vision of a “Metaverse”, VR is still likely to be a great entertainment product for many people.

2023 will be a critical year for AR. It will be the first time that we can build affordable hardware in a small form factor. Chips like the Snapdragon AR2 Gen 1 combine Wi-Fi 7 with low energy usage and will make it possible to build smart glasses. Depending on the availability and price of the chips and other components, I expect glasses from many different companies with even more capabilities than the Oppo Air Glass 2.

One of the most exciting developments in computer interfaces is the emergence of brain-computer interfaces (BCIs). These allow for direct communication between the brain and a computer, enabling the possibility of controlling devices with thought alone. While companies like Neuralink are claiming to begin human trials next year, less invasive BCIs present a much lower barrier to entry and are being actively developed by startups such as Synchron, which has received significant funding. AI will also help the field by decoding brain signals. It is likely that we will see at least one viral video showcasing the capabilities of these less invasive BCIs, similar to the viral video of a monkey playing Pong using a BCI that was released last year. The potential applications for BCIs are vast and diverse, ranging from medical and therapeutic uses to gaming and everyday tasks. As these technologies continue to evolve, it is exciting to consider the possibilities for the future of human-computer interaction.

Researchers from biotech and other fields were able to develop an mRNA vaccine for COVID-19 in less than a year. The same technology was also used to create a universal flu vaccine and a vaccine for malaria. The combination of biology and AI has yielded promising results in the development of treatments for various viruses and illnesses. For example, a team led by Chris Jones of the Institute of Cancer Research used AI tools to identify a new drug combination to fight diffuse intrinsic pontine glioma, a type of incurable childhood brain cancer. The proposed combination extended survival in mice by 14% and has been tested in a small group of children. Additionally, Dr. Luis A. Diaz Jr. of Memorial Sloan Kettering Cancer Center published a paper in the New England Journal of Medicine describing a treatment that resulted in complete remission in all 18 rectal cancer patients who took the drug. Overall, the progress in the field is accelerating thanks to advancements in AI, such as AlphaFold 2, that help to find and develop treatments for various diseases. If this continues, we will be able to beat cancer in the next few years, which leads to the next field.

I predict that every person under 60 has the potential to live forever, as I mentioned in my post about longevity escape velocity. The field of aging research has made significant progress in recent years and is more confident than ever in its understanding of the aging process and life itself. For example, researchers at the Weizmann Institute of Science in Israel were able to create fully synthetic mouse embryos in a bioreactor using stem cells cultured in a Petri dish, without the use of an egg or sperm. These embryos developed normally, starting to elongate on day three and developing a beating heart by day eight. This marked a major advancement in the study of how stem cells form different organs and how mutations can cause developmental diseases. This is a promising step toward the end goal: Achieving complete control over all biological processes in the body.

While this was a slow year in some aspects, major progress was made in most fields, and 2023 will be even faster. We are at the knee of an exponential blowup and we are not ready for what is coming. While I am still worried about how society will react and adapt, I am excited for 2023 and the rest of the decade.


© 2024 Maximilian Kannen
