March 1, 2024

Why Long Context Length is Not the Death of RAG

Meryem Arik
Google recently announced that its new Gemini model can handle a context window of over 1 million tokens - a huge leap in AI capabilities. Many have proclaimed that this advancement spells the end of Retrieval-Augmented Generation (RAG) systems. However, we don't believe long context length represents the demise of RAG, for several key reasons:

Cost and Speed

Long context lengths are extremely expensive to run: the more context provided, the slower and more resource-intensive model inference becomes. RAG systems reduce the number of tokens that need processing by retrieving the most relevant passages upfront, enabling faster and cheaper results overall.
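To make the cost argument concrete, here is a minimal sketch of the idea: retrieve only the top-k relevant passages rather than stuffing the whole corpus into the prompt. The word-overlap scoring and all names below are illustrative stand-ins for a real retriever.

```python
# Minimal sketch: send top-k retrieved passages as context instead of the
# whole corpus. Naive word-overlap scoring stands in for a real retriever.

def score(query: str, passage: str) -> int:
    """Count query words that appear in the passage (toy relevance score)."""
    q_words = set(query.lower().split())
    return sum(1 for w in passage.lower().split() if w in q_words)

def retrieve_top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring passages for the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

corpus = [
    "RAG retrieves relevant passages before generation.",
    "Long context windows can exceed one million tokens.",
    "Retrieval keeps prompts short, cheap, and fast.",
    "Unrelated passage about something else entirely.",
]
context = retrieve_top_k("why does retrieval keep prompts cheap", corpus)

# The prompt the model actually sees is a fraction of the full corpus.
full_tokens = sum(len(p.split()) for p in corpus)
rag_tokens = sum(len(p.split()) for p in context)
print(rag_tokens, "<", full_tokens)
```

The same trade-off holds at scale: inference cost grows with prompt length, so trimming a million-token corpus down to a handful of relevant passages is where RAG's savings come from.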

Unproven Performance

While impressive in scale, it remains to be seen how accurate Gemini's recall is over such vast contexts. RAG systems optimize the entire pipeline - search, embeddings, and ranking - to feed the prompt only relevant content. Gemini's recall performance over 1 million tokens requires further evaluation.
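The embed-and-rank step of that pipeline can be sketched as follows. Toy bag-of-words vectors and cosine similarity stand in for a learned embedding model; every name here is illustrative, not a reference to any particular library.

```python
# Sketch of the retrieval pipeline (embed -> rank), with bag-of-words
# count vectors as a stand-in for learned embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Score every passage against the query and sort best-first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    return sorted(scored, reverse=True)

passages = [
    "gemini handles a one million token context window",
    "recall accuracy over long contexts is hard to evaluate",
    "bananas are rich in potassium",
]
for s, p in rank("how accurate is recall over long contexts", passages):
    print(f"{s:.2f}  {p}")
```

A production system would swap in a trained embedding model and an approximate-nearest-neighbour index, but the shape of the pipeline - embed, score, rank, select - is the same.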

Loss of Auditability

A major advantage of RAG systems is that they provide audit trails showing exactly what content was deemed relevant and passed to the model as input. This grants some explainability into the otherwise "black box" workings of AI. With ultra-long contexts like Gemini's, that auditability is lost to sheer volume, hampering its usefulness for many enterprise use cases.

In summary, while an exciting advancement showing AI's potential, long context length alone is unlikely to make RAG obsolete quite yet. The strengths around cost, performance optimization, and auditability mean RAG still has significant value in operational environments. We look forward to seeing how these capabilities evolve together over time.
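As a closing illustration of the auditability point: a RAG system can emit a record of exactly which passages were retrieved, with scores and a timestamp, so a reviewer can later inspect what the model was shown. The record structure and field names below are hypothetical.

```python
# Sketch of a RAG audit record: a JSON log of exactly which passages
# (and with what scores) were fed to the model. Field names are illustrative.
import json
from datetime import datetime, timezone

def build_audit_record(query: str, retrieved: list[tuple[float, str]]) -> str:
    """Serialize the retrieved context for a given query as a JSON audit entry."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "context": [{"score": round(s, 3), "passage": p} for s, p in retrieved],
    }
    return json.dumps(record, indent=2)

log = build_audit_record(
    "what did the model see?",
    [(0.91, "Most relevant passage."), (0.62, "Second passage.")],
)
print(log)
```

With a million-token prompt there is no equivalent short artifact to review; the "input" is effectively the entire corpus, which is precisely the auditability gap the post describes.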

Want to learn more?

We work with enterprises at every stage of their self-hosting journey - whether you're deploying your first model in an on-prem environment or scaling dozens of fine-tuned, domain-specific models across a hybrid, multi-cloud setup. Doubleword is here to help you do it faster, easier, and with confidence.

Book a demo