January 18, 2025

About

A paper titled “Don’t Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks” has been published.
At a time when systems built on RAG (Retrieval-Augmented Generation) are so widely praised, that is a very catchy title.
What exactly is it trying to say?

https://arxiv.org/abs/2412.15605

Cache-Augmented Generation (CAG)

Their argument centers on a method called Cache-Augmented Generation (CAG). Because RAG cannot escape the inherent difficulties of building a retrieval system (retrieval latency, errors in document selection, and added system complexity), they propose preloading all the relevant knowledge into the LLM's context ahead of time and caching the resulting key-value (KV) state, so that the model can answer directly without a retrieval step at inference time. This is a reasonable approach, especially given the recent emergence of LLMs with significantly extended context windows. The system also becomes simpler: there is no retrieval pipeline to build and maintain, and the knowledge does not have to be fed in again for every query.
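
To make the contrast concrete, here is a toy sketch in Python. It is not the paper's implementation and it does not call a real LLM: `ToyModel`, `rag_answer`, and `CagSession` are hypothetical names I made up, the "tokenizer" is a whitespace split, and the "model" only counts how many tokens it would have to encode. The tuple stored in `cached_prefix` stands in for a precomputed KV cache.

```python
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Whitespace split, standing in for a real LLM tokenizer (assumption)."""
    return text.split()


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LLM; it only records encoding work."""
    tokens_encoded: int = 0

    def encode(self, tokens: list[str]) -> tuple[str, ...]:
        # A real model would compute key/value tensors here.
        self.tokens_encoded += len(tokens)
        return tuple(tokens)


def rag_answer(model: ToyModel, documents: list[str], query: str, top_k: int = 1):
    """RAG-style flow: retrieve per query, then encode retrieved text + query."""
    query_words = set(tokenize(query.lower()))
    # Naive word-overlap ranking, standing in for a real retriever.
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(tokenize(doc.lower()))),
        reverse=True,
    )
    context = " ".join(ranked[:top_k])
    return model.encode(tokenize(context) + tokenize(query))


class CagSession:
    """CAG-style flow: encode all knowledge once, reuse it for every query."""

    def __init__(self, model: ToyModel, documents: list[str]):
        self.model = model
        # One-time preload; the result plays the role of the KV cache.
        self.cached_prefix = model.encode(tokenize(" ".join(documents)))

    def answer(self, query: str) -> tuple[str, ...]:
        # No retrieval step: only the query tokens are newly encoded.
        return self.cached_prefix + self.model.encode(tokenize(query))


if __name__ == "__main__":
    docs = [
        "CAG preloads documents into the context",
        "RAG retrieves documents at query time",
    ]
    model = ToyModel()
    session = CagSession(model, docs)
    after_preload = model.tokens_encoded
    session.answer("What does CAG preload?")
    print("preload cost:", after_preload)
    print("extra cost for one query:", model.tokens_encoded - after_preload)
```

The point of the sketch is only the shape of the two flows: the RAG path has to pick documents for every query, while the CAG path pays for the whole knowledge base once and afterwards encodes only the query. It also makes the limitation visible, since the one-time preload grows with the total size of the knowledge.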

While I haven’t grasped all the finer details, it seems like a promising method worth keeping in mind. It could be particularly useful for use cases where the knowledge to be preloaded is small enough to fit within the model’s context window.