Researchers at Google Research have developed a new transformer model that processes long documents faster and more efficiently than previous models. The team’s paper, titled “CoLT5: Faster Long-Range Transformers with Conditional Computation,” describes a transformer model that uses conditional computation to devote more resources to important tokens in both its feedforward and attention layers.
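To make the idea concrete, here is a minimal sketch of a conditional feedforward block in PyTorch: every token passes through a cheap MLP, a learned router scores token importance, and only the top-k tokens also pass through a heavier MLP whose output is gated by the routing score and added back. The class and parameter names, the top-k router, and the softmax gating are illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn


class ConditionalFeedForward(nn.Module):
    """Illustrative conditional feedforward block (a sketch, not CoLT5's code):
    all tokens go through a light MLP; only the k highest-scoring tokens also
    go through a heavy MLP, whose output is gated and added back in place."""

    def __init__(self, d_model: int, d_light: int, d_heavy: int, k: int):
        super().__init__()
        self.light = nn.Sequential(nn.Linear(d_model, d_light), nn.ReLU(),
                                   nn.Linear(d_light, d_model))
        self.heavy = nn.Sequential(nn.Linear(d_model, d_heavy), nn.ReLU(),
                                   nn.Linear(d_heavy, d_model))
        self.router = nn.Linear(d_model, 1)  # scores token "importance"
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        out = self.light(x)                           # cheap path for every token
        scores = self.router(x).squeeze(-1)           # (batch, seq_len)
        topk = scores.topk(self.k, dim=-1)            # pick the k most important tokens
        weights = torch.softmax(topk.values, dim=-1)  # gating weights (assumed form)
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
        selected = torch.gather(x, 1, idx)            # (batch, k, d_model)
        heavy_out = self.heavy(selected) * weights.unsqueeze(-1)
        return out.scatter_add(1, idx, heavy_out)     # add heavy branch back in place


x = torch.randn(2, 128, 64)
block = ConditionalFeedForward(d_model=64, d_light=32, d_heavy=256, k=16)
print(block(x).shape)  # torch.Size([2, 128, 64])
```

Because most tokens only ever touch the light branch, the expensive computation scales with k rather than with the full sequence length.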
CoLT5’s ability to process long documents effectively is particularly noteworthy, as previous transformer models struggled with the quadratic cost of attention and the need to apply feedforward and projection layers to every token. The researchers show that CoLT5 outperforms LongT5, the previous state-of-the-art long-input transformer, on the SCROLLS benchmark while training and running inference much faster.
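To see where the savings come from, the following back-of-the-envelope Python comparison counts attention score computations for full quadratic attention versus a light local branch plus a heavy branch restricted to routed tokens. The window size and the numbers of routed tokens are assumed values for illustration, not the paper’s settings.

```python
def full_attention_scores(n: int) -> int:
    """Standard attention: every token attends to every token (quadratic in n)."""
    return n * n


def conditional_attention_scores(n: int, window: int = 128,
                                 routed_q: int = 512, routed_kv: int = 2048) -> int:
    """Illustrative conditional attention: a cheap local branch for all tokens plus
    a heavy branch in which only a small set of routed tokens participates.
    (Branch sizes here are assumptions, not the paper's configuration.)"""
    light = n * window             # each token attends to a small local window
    heavy = routed_q * routed_kv   # heavy attention only among routed tokens
    return light + heavy


n = 64_000  # roughly the 64k-token regime discussed below
print(f"full attention:        {full_attention_scores(n):,}")         # 4,096,000,000
print(f"conditional attention: {conditional_attention_scores(n):,}")  # 9,240,576
```

Under these assumed settings the conditional scheme computes several hundred times fewer attention scores, which is the intuition behind the reported speedups at long input lengths.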
Furthermore, the team demonstrated that CoLT5 can handle inputs up to 64k tokens in length with strong gains. These results suggest that CoLT5 could improve the efficiency and effectiveness of many natural language processing tasks that rely on long inputs.