A new research paper proposes the Linear Growth Operator (LiGO), a method to accelerate the training of large-scale transformers. By using the parameters of smaller, pre-trained models to initialize larger models, LiGO can save up to 50% of the computational cost of training from scratch while achieving better performance. This approach could have important implications for the field of AGI by enabling more efficient and effective training methods for large-scale models, and potentially leading to more flexible and adaptable models that can learn to grow and evolve over time. If this method is already being used to train GPT-5, we could get GPT-5 earlier than expected.
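To make the idea more concrete, here is a minimal sketch of growing a small model into a larger one with linear maps. It is my own simplified illustration, not the authors' code: the function names, the shapes, and the hand-picked operators `expand` and `mix` are assumptions, and as I understand the paper these operators are learned rather than fixed by hand.

```python
import numpy as np

def grow_width(w_small, expand_out, expand_in):
    """Map a (d_small, d_small) weight to (d_large, d_large) with two linear maps."""
    return expand_out @ w_small @ expand_in.T

def grow_depth(layers, mix):
    """Build every layer of the larger model as a weighted sum of the given layers."""
    stacked = np.stack(layers)                       # (l_small, d, d)
    return np.einsum("ij,jkl->ikl", mix, stacked)    # (l_large, d, d)

rng = np.random.default_rng(0)
d_small, d_large, l_small = 4, 6, 2
small_layers = [rng.normal(size=(d_small, d_small)) for _ in range(l_small)]

# Hypothetical, hand-picked operators (assumption: a real setup would learn them).
# expand copies the small weights and repeats the first rows to fill the new width.
expand = np.vstack([np.eye(d_small), np.eye(d_small)[: d_large - d_small]])  # (6, 4)
# mix maps 2 small layers to 3 large layers by reusing the last layer twice.
mix = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])                          # (3, 2)

wide_layers = [grow_width(w, expand, expand) for w in small_layers]
large_layers = grow_depth(wide_layers, mix)
print(large_layers.shape)  # (3, 6, 6)
```

With these operators, the top-left block of each grown layer is an exact copy of a small layer, so the larger model starts from what the smaller one already knows instead of from random weights.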
OpenAI announced that they will introduce plugins to ChatGPT. Two of them, developed by OpenAI itself, allow the model to search the web for information and run generated Python code. Third-party plugins like Wolfram allow the model to use other APIs to perform certain tasks. The future capabilities of a model enhanced this way are limitless. I predicted this development in my post “From GPT-4 to Proto-AGI”. If the capability to run generated code is not too limited, I would call this Proto-AGI.
After GitHub Copilot fell behind GPT-4, GitHub finally announced a set of new features based on GPT-4, such as generated pull requests, answers to questions about code or documentation, and help with coding.
Google’s GPT alternative Bard is now available in the US and UK. Early testers already favor Bing, which also launched image generation this week. Bard is based on LaMDA, an older language model that is not as capable as GPT-4.
GTC 2023 is going on right now, and Nvidia showed off some of its newest steps in AI, including this amazing intro.
They introduced cuLitho, a new library that accelerates computational lithography, one of the most compute-heavy steps in producing processors. These calculations used to take weeks and can now be done in a few hours. Speeding up chip production will speed up the entire industry and shows how positive feedback loops power exponential growth.
They also talked about their new H100 chips for their DGX supercomputers. These chips will not only power the servers of big AI players like AWS, Azure, and OpenAI, but also Nvidia's own cloud servers, which will be available for smaller companies.
Part of this cloud service will be Nvidia AI Foundations, which will provide pre-trained models for text, images, and protein sequences and will run the training and inference of these models. One of the first users is Adobe, which uses the service for its new AI service Firefly.
Finally, they presented a new server CPU, “Grace”, and the BlueField-3 DPU, which will power future data centers.
I am most impressed by their hardware improvements and their AI cloud platform, which will both greatly accelerate AI adoption.
A new study by OpenAI and the University of Pennsylvania investigates the potential impact of Generative Pre-trained Transformer (GPT) models on the U.S. labor market. The paper, titled “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models,” assesses occupations based on their correspondence with GPT capabilities, using both human expertise and classifications from GPT-4. The study finds that approximately 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of GPTs, while around 19% of workers may see at least 50% of their tasks impacted. The impact spans all wage levels, with higher-income jobs potentially facing greater exposure. The paper concludes that GPTs exhibit characteristics of general-purpose technologies, which could have significant economic, social, and policy implications. This comes as no surprise to anyone who has used GPT-4 or watched the recent Microsoft announcement.
I discussed this topic in more depth in my book review of “A World Without Work”. This research supports the author’s point and indicates a radical shift in the economy in the coming years. I highly recommend reading the paper, the book, or at least my book review.
FlexGen is a new generation engine that enables high-throughput inference of large language models on a single commodity GPU. It uses a linear programming optimizer to efficiently store and access tensors and compresses weights and attention cache to 4 bits. FlexGen achieves significantly higher throughput than state-of-the-art offloading systems, reaching a generation throughput of 1 token/s with an effective batch size of 144 on a single 16GB GPU. This means that running LLMs on smaller servers could become viable for more and more companies and individuals.
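The 4-bit compression is the part that is easiest to show in a few lines. The following is a minimal sketch of group-wise min-max quantization, assuming groups of 64 values; it is not FlexGen's actual code, and `quantize_4bit`, `dequantize_4bit`, and the group size are names and values I chose for illustration.

```python
import numpy as np

def quantize_4bit(x, group_size=64):
    """Quantize a float array to 4-bit codes (0..15) with one offset and scale per group.

    Assumes x.size is divisible by group_size. The codes are kept in uint8 for
    readability; a real implementation would pack two codes into each byte.
    """
    groups = x.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    scale[scale == 0] = 1.0                      # avoid division by zero for constant groups
    codes = np.round((groups - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize_4bit(codes, lo, scale, shape):
    """Rebuild an approximation of the original array from codes, offsets, and scales."""
    return (codes * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 64)).astype(np.float32)

codes, lo, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, lo, scale, weights.shape)
print("max abs error:", float(np.abs(restored - weights).max()))
```

Each value is stored with 16 possible levels instead of a 16-bit float, so the error per value is at most half a quantization step of its group. That trade of a small error for a roughly four times smaller footprint is what makes it possible to fit more of the model and its attention cache into limited memory.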
Researchers at Google Research have developed a new transformer model that can process long documents faster and more efficiently than previous models. The team’s paper, titled “CoLT5: Faster Long-Range Transformers with Conditional Computation,” describes a transformer model that uses conditional computation to devote more resources to important tokens in both feedforward and attention layers.
CoLT5’s ability to effectively process long documents is particularly noteworthy, as previous transformer models struggled with the quadratic attention complexity and the need to apply feedforward and projection layers to every token. The researchers show that CoLT5 outperforms LongT5, the previous state-of-the-art long-input transformer model, on the SCROLLS benchmark, while also boasting much faster training and inference times.
Furthermore, the team demonstrated that CoLT5 can handle inputs of up to 64k tokens with strong gains. These results suggest that CoLT5 has the potential to improve the efficiency and effectiveness of many natural language processing tasks that rely on long inputs.
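Conditional computation for the feedforward layer can be sketched roughly like this: every token goes through a cheap branch, and only the tokens a router scores highest also go through an expensive branch. This is a simplified illustration under my own assumptions, not the paper's implementation. The names, the sizes, and the random weights are hypothetical, and I leave out details such as weighting the heavy branch by the router score.

```python
import numpy as np

def feedforward(x, w_in, w_out):
    """Standard two-layer feedforward block with a ReLU in between."""
    return np.maximum(x @ w_in, 0.0) @ w_out

def conditional_feedforward(tokens, light, heavy, router, k):
    """Apply the light branch to all tokens and the heavy branch to the top-k tokens."""
    scores = tokens @ router                       # one routing score per token
    top = np.argsort(scores)[len(scores) - k:]     # indices of the k highest scores
    out = tokens + feedforward(tokens, *light)     # cheap path for every token
    out[top] += feedforward(tokens[top], *heavy)   # expensive path for routed tokens only
    return out, top

rng = np.random.default_rng(0)
n_tokens, d_model, k = 8, 4, 2
tokens = rng.normal(size=(n_tokens, d_model))
light = (rng.normal(size=(d_model, 2)), rng.normal(size=(2, d_model)))    # small hidden size
heavy = (rng.normal(size=(d_model, 32)), rng.normal(size=(32, d_model)))  # large hidden size
router = rng.normal(size=d_model)

out, routed = conditional_feedforward(tokens, light, heavy, router, k)
print(out.shape, sorted(routed.tolist()))
```

The cost of the heavy branch depends on `k` instead of the input length, which is why this kind of routing pays off most on very long inputs.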
AssemblyAI added a new speech recognition model to their products. Conformer-1 is “a state-of-the-art speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data.” It combines convolutional networks with transformers to achieve previously unseen scores on various recognition tasks.
Today Microsoft showed off how they integrated AI tools, including GPT-4, into their Office products. You can ask Copilot to build Excel tables, PowerPoint presentations, and emails, ask it about meetings, or let it summarize documents and chats.
Although currently only available to a select few companies, Copilot is set to become widely available over the next few months. This integration of AI technology has the potential to significantly increase productivity for office workers and could have far-reaching implications for the economy as a whole.
OpenAI presented its new GPT model today. GPT-4 has a context window of 32K tokens and outperforms humans and previous models like GPT-3.5 in almost all language tasks. It is also multimodal and supports images as inputs. Read more here or watch the presentation here.
OpenAI just released GPT-4, a game-changer in AI language models. With a 32k token context window, it outperforms humans and GPT-3.5 in most language tasks. Key improvements: bigger context window, better performance, and enhanced fine-tuning. Exciting applications include content generation, translation, virtual assistants, customer support, and education. Can’t wait to see how GPT-4 reshapes our AI-driven world!
Watch the presentation here.
This post was generated by GPT-4
In a new blog post, Google presents their Generative AI App Builder, PaLM API, and MakerSuite, which works similarly to OpenAI’s Playground.
This announcement comes shortly before the Microsoft presentation on Thursday, similar to how Google held its Bard presentation just before the Bing chat announcement.
Meta published an article where they compared the behavior of the brain to large language models. They showed the important differences and similarities underlying the process of text predictions. The research group tested 304 participants with functional magnetic resonance imaging to show how the brain predicts a hierarchy of representations that spans multiple timescales. They also showed that the activations of modern language models linearly map onto the brain responses to speech.
Just a moment ago, OpenAI opened their ChatGPT and Whisper APIs. They also published their previously leaked dedicated instance service. ChatGPT will be available for $0.002 per 1,000 tokens, which is incredibly cheap, and will be getting updates regularly. Whisper will be available for $0.006 per minute of audio data.
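To get a feeling for how cheap this is, here is a small calculation based on the two prices above. The helper names and the example workloads (one million tokens, one hour of audio) are my own choices for illustration.

```python
# Prices as announced; everything else in this sketch is illustrative.
CHATGPT_USD_PER_1K_TOKENS = 0.002
WHISPER_USD_PER_MINUTE = 0.006

def chatgpt_cost(tokens):
    """Cost in USD for a given number of ChatGPT API tokens."""
    return tokens / 1000 * CHATGPT_USD_PER_1K_TOKENS

def whisper_cost(minutes):
    """Cost in USD for a given number of minutes of transcribed audio."""
    return minutes * WHISPER_USD_PER_MINUTE

print(f"1,000,000 tokens: ${chatgpt_cost(1_000_000):.2f}")   # $2.00
print(f"60 minutes of audio: ${whisper_cost(60):.2f}")       # $0.36
```

So a million tokens cost about two dollars, and transcribing a full hour of audio costs well under one dollar.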
A team of researchers published an article on their research on biocomputing. It goes in-depth about the potential of such systems and how to build them. The core idea is to grow brain tissue out of stem cells to use the high energy efficiency and ability to perform complex tasks with organoid-computer interfaces. Instead of copying the human brain with AI, we use it directly as a computing device. Since it is much more likely to develop conscious systems this way, the ethical side of this research is critical. The article also explores the ways this research can help understand our own brain and cognitive diseases. Research like this pushes our understanding of consciousness and intelligence.
Microsoft showed how to use ChatGPT to control robots with your voice. APIs and prompts can be designed to enable ChatGPT to run the robot. By combining the spoken task with API information, it is possible to let ChatGPT generate the code and API calls to execute the task with a given robot. While this is a powerful use case of LLMs, it is not a secure way to handle a robot, since the safety of the generated code cannot be guaranteed.
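A minimal sketch of this pattern could look like the following. It only builds the prompt and runs a coarse check on the code that comes back; it does not call any model. The robot functions in `ROBOT_API`, the prompt wording, and the check itself are hypothetical examples of mine, not Microsoft's implementation.

```python
import ast

# Hypothetical robot API (assumption): name -> one-line description for the prompt.
ROBOT_API = {
    "take_off": "take_off(): start flying",
    "fly_to": "fly_to(x, y, z): move to a position given in meters",
    "land": "land(): land at the current position",
}

def build_prompt(task):
    """Combine the API description with the transcribed spoken task."""
    api_lines = "\n".join(f"- {doc}" for doc in ROBOT_API.values())
    return (
        "You control a robot. Use only these Python functions:\n"
        f"{api_lines}\n"
        f"Task: {task}\n"
        "Answer with Python code only."
    )

def uses_only_allowed_calls(code):
    """Reject code that imports modules or calls anything outside ROBOT_API."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ROBOT_API):
                return False
    return True

print(build_prompt("Fly to the table and land."))
print(uses_only_allowed_calls("take_off()\nfly_to(1, 2, 3)\nland()"))      # True
print(uses_only_allowed_calls("import os\nos.system('shutdown now')"))     # False
```

A filter like this only catches obviously unwanted code. It says nothing about whether an allowed call with bad arguments sends the robot into a wall, which is exactly why generated code should not be trusted blindly.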
Microsoft released the paper “Language Is Not All You Need: Aligning Perception with Language Models”, where they introduce their multimodal large language model KOSMOS-1. KOSMOS-1 is still a language model at its core, but it can also use other training data, like images. It shows impressive results in a number of tasks, such as image transcription. It is, therefore, a much more general model than a simple language model, and I think this is a step in the right direction for AGI, since I believe that language alone is not enough for AGI.
Hugging Face and Amazon AWS announced their partnership to scale AI in the cloud. Amazon is already the leading cloud computing provider, and Hugging Face is the biggest platform for machine learning developers. This will hopefully lead to cheaper and faster development of AI solutions for smaller developer teams and companies.
Synchron has published peer-reviewed, long-term safety results from a clinical study in four patients for their brain-computer interface. The company is backed by Bezos and Gates and uses blood vessels to insert sensors into the brain, which is less invasive and safer than inserting sensors directly into the brain, as Neuralink does.