Researchers have extended the effective context length of BERT, a Transformer-based natural language processing model, to two million tokens. The team achieved this by adding a recurrent memory to BERT using the Recurrent Memory Transformer (RMT) architecture.

The method increases the model’s effective context length while maintaining high memory-retrieval accuracy. The recurrent memory lets the model store and process both local and global information and allows information to flow between the segments of an input sequence.
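To make the segment-level recurrence concrete, the sketch below shows the general idea: a small set of memory embeddings is prepended to each fixed-size segment, the segment is run through a standard Transformer encoder, and the updated memory states are carried into the next segment. The class name, dimensions, and the plain `nn.TransformerEncoder` backbone are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class RecurrentMemorySketch(nn.Module):
    """Toy segment-level recurrence in the spirit of RMT (not the paper's code)."""

    def __init__(self, vocab_size=30522, d_model=256, num_memory_tokens=10,
                 segment_len=512, nhead=4, num_layers=2):
        super().__init__()
        self.segment_len = segment_len
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learned initial memory: a handful of vectors carried across segments.
        self.memory = nn.Parameter(torch.randn(num_memory_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, input_ids):
        """Process a long token sequence segment by segment, passing memory along."""
        batch = input_ids.size(0)
        memory = self.memory.unsqueeze(0).expand(batch, -1, -1)
        n_mem = memory.size(1)
        outputs = []
        # Split the long input into fixed-size local context windows.
        for segment in input_ids.split(self.segment_len, dim=1):
            tokens = self.embed(segment)
            # Memory tokens are concatenated with the segment's token embeddings,
            # so self-attention within the segment can read and update the memory.
            hidden = self.encoder(torch.cat([memory, tokens], dim=1))
            # The updated memory states become the recurrent input to the next segment.
            memory = hidden[:, :n_mem, :]
            outputs.append(hidden[:, n_mem:, :])
        return torch.cat(outputs, dim=1), memory


# Toy usage: a 2048-token input processed as four 512-token segments.
if __name__ == "__main__":
    model = RecurrentMemorySketch()
    dummy_ids = torch.randint(0, 30522, (1, 2048))
    token_states, final_memory = model(dummy_ids)
    print(token_states.shape, final_memory.shape)
```

Because each segment only ever sees its own 512 tokens plus the compact memory, the per-step cost stays that of the base model while global information is propagated through the recurrent memory.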

The study’s experiments demonstrated the effectiveness of the RMT-augmented BERT model: trained on sequences up to seven times its originally designed input length (512 tokens), it generalized at inference to far longer inputs, up to the reported two million tokens. The approach has the potential to significantly improve long-term dependency handling in natural language understanding and generation tasks, as well as to enable large-scale context processing for memory-intensive applications.