Doubleword | Digits Case study

Background

When envisioning accounting workflows, the traditional image is of accountants manually sifting through messy spreadsheets to review and categorize transactions. This process can take 7 to 14 days, leaving the books outdated by the time insights are delivered. Digits, a financial technology company, aims to revolutionize this by replacing these manual tasks with automated, AI-powered accounting software that delivers reliable, real-time insights.

Automating accounting is complex because the field is subjective: the same transaction can have multiple valid interpretations. Even a small error, like misreading a comma in a number, can drastically alter figures. Recognizing these challenges, Digits' ML team broke the problem down into smaller, manageable subtasks and selected the appropriate model for each one. The result is a pipeline that combines traditional ML approaches, such as BERT classifiers and NER models, with powerful generative models, achieving over 93% accuracy in transaction classification, far exceeding the 66% ceiling of the best-performing general-purpose LLMs.

Beyond its accuracy, Digits’ specialized model pipeline offers low-latency processing and reduced hallucinations. Their results highlight a fundamental insight for building ML-powered pipelines in real-world applications: one size does not fit all. Specialized pipelines built from smaller, task-specific models can achieve levels of accuracy, efficiency, and reliability that general-purpose large language models (LLMs) struggle to match.

The Problem

When designing their ML pipeline, the Digits team identified several critical challenges that needed to be addressed to meet the demands of real-world accounting workflows:

Latency: To process millions of requests daily, the solution had to meet strict low-latency requirements. The team had to balance speed against accuracy, particularly when evaluating whether reasoning-heavy models, which often have longer response times, could outperform traditional models like BERT classifiers.
Privacy and Security: Given the sensitive nature of financial data, relying on general-purpose LLMs accessed through external APIs raised data-privacy concerns. Digits required a solution that gave it full control over its data, prioritizing self-hosted AI systems to protect client information.
Edge Cases and Trainability: Real-world accounting scenarios often involve edge cases that general-purpose LLMs struggle to handle. To address this, the system needed to incorporate user feedback through iterative feedback loops. This continuous learning process improves both overall model accuracy and the handling of edge cases, so the system adapts to the nuanced requirements of accounting.
Hallucination: LLMs frequently generate incorrect or non-existent outputs, such as invalid accounting categories. Digits needed to mitigate this risk with a system that validates outputs and prevents hallucinations from reaching its applications.

The Solution & Results

Digits recognized the limitations of general-purpose LLMs and addressed these challenges with targeted solutions that use the best model for each specific task. This approach enabled them to build a robust, efficient, and secure ML-powered pipeline tailored to the unique demands of accounting automation. The full whitepaper explores the approach in more detail.

Low Latency: To meet the strict low-latency requirements, Digits designed a pipeline of several specialized models, using LLMs only for the most complex tasks that require deeper reasoning or context. The approach achieved an average latency of just 0.04 seconds per request, compared with 3.67 seconds for single-LLM pipelines, roughly 90 times faster. Latency this low keeps processing fast without compromising accuracy, which is worth more than the marginal accuracy gains a slower alternative might offer.

Privacy and Security: To address concerns about sharing sensitive financial data with third-party providers, Digits self-hosted all their models with Doubleword on their own hardware. This approach provided full control over data security and eliminated risks associated with external API calls.
Edge Cases and Trainability: Digits’ purpose-built system achieved over 93% accuracy in transaction classification, significantly surpassing the 66% ceiling observed in the best-performing general-purpose LLMs (e.g., GPT-4.5-preview). By focusing on traditional ML models, which are easier to fine-tune and adapt based on user feedback, Digits ensured its system could keep improving. This allowed the team to handle edge cases with greater precision, further increasing accuracy.

Hallucination Prevention: A common issue with LLMs is their tendency to confidently generate incorrect or non-existent outputs, such as invalid accounting categories. Digits mitigated this risk by limiting LLMs to tasks where their generative capabilities were essential and handling the majority of tasks with task-specific models. This strategy significantly reduced hallucination rates, which is critical at scale: even a modest 1% hallucination rate would mean 10,000 misclassified transactions per million processed.
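The two ideas above (a fast specialized model for most traffic, an LLM only where needed, and validation of anything generated) can be combined in a short sketch. This is a hedged illustration under stated assumptions, not Digits' implementation: the classifier and LLM are hypothetical stubs, "complex" is assumed to mean "the classifier's confidence falls below a threshold", and `ALLOWED_CATEGORIES` stands in for a customer's chart of accounts. The last line reproduces the scale arithmetic: 1% of one million transactions is 10,000.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical chart of accounts; a real system would load this per customer.
ALLOWED_CATEGORIES = frozenset({"Cloud Hosting", "Professional Services", "Travel"})
NEEDS_REVIEW = "Needs Review"  # hypothetical fallback label instead of guessing


@dataclass
class Prediction:
    label: str
    source: str  # stage that produced the label: "classifier", "llm", or "review"


def route(description: str,
          classify: Callable[[str], tuple[str, float]],
          ask_llm: Callable[[str], str],
          threshold: float = 0.9) -> Prediction:
    """Use the fast classifier first; escalate to the LLM only when it is unsure."""
    label, confidence = classify(description)
    if confidence >= threshold:
        return Prediction(label, "classifier")
    # Low confidence: treat as a complex case and ask the slower generative model.
    generated = ask_llm(description).strip()
    # Accept the LLM's answer only if it is a real category; otherwise flag it.
    if generated in ALLOWED_CATEGORIES:
        return Prediction(generated, "llm")
    return Prediction(NEEDS_REVIEW, "review")


def toy_classifier(description: str) -> tuple[str, float]:
    # Stub for a fine-tuned classifier (e.g. a BERT model): confident only on AWS.
    if "aws" in description.lower():
        return "Cloud Hosting", 0.98
    return "Uncategorized", 0.40


def toy_llm(description: str) -> str:
    # Stub for an LLM call; the second answer is an invented (hallucinated) category.
    return "Travel" if "flight" in description.lower() else "Crypto Staking Rewards"


print(route("AWS invoice #123", toy_classifier, toy_llm))
# → Prediction(label='Cloud Hosting', source='classifier')
print(route("Flight to Berlin", toy_classifier, toy_llm))
# → Prediction(label='Travel', source='llm')
print(route("Misc transfer", toy_classifier, toy_llm))
# → Prediction(label='Needs Review', source='review')
print(round(0.01 * 1_000_000))  # expected bad labels at a 1% rate → 10000
```

In this sketch the threshold is the main latency knob: raising it sends more traffic down the slower LLM path. Routing rejected outputs to review, rather than guessing, also yields corrected examples that could feed the user-feedback retraining loop described above.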

Doubleword's Impact

Hannes Hapke, Principal ML Engineer at Digits, shared: “It’s really easy to get an AI-powered solution to 80%, but that extra 20% is what’s needed to make it production-ready. Doubleword supports you with both innovative technology and a knowledgeable team to help you cross that threshold. They’ve been a tremendous partner in our ability to launch, and we’re excited to keep pushing the envelope with them by our side.”

Digits specifically highlighted how Doubleword has supported them through:

Enterprise-Ready Inference Stack: Doubleword's turnkey AI inference stack enables Digits to serve any open-source model with low latency, playing a critical role in meeting their performance requirements.
Future-Thinking Feature Support: Features like JSON output formatting and batched LoRA serving not only meet current needs but also pave the way for future innovations in their pipeline. Together with clear guidance and support from Doubleword's team, these features have increased Digits' confidence in their overall system design.
Partnership with Experts: Digits underscored the value of collaborating with experts in the inference space, which has allowed them to work alongside a trusted partner throughout the process.
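JSON output formatting helps because downstream code can parse model responses mechanically instead of scraping free text. As a hedged sketch, assume the model is asked to return an object with a hypothetical `category` field; the consumer below parses it with Python's standard `json` module and returns `None` for anything malformed, so the caller can fall back rather than guess. It uses only the standard library and does not depend on any particular serving API.

```python
import json
from typing import Optional


def parse_category(response_text: str) -> Optional[str]:
    """Extract the 'category' field from a JSON-formatted model response.

    Returns None when the response is not valid JSON, is not a JSON object,
    or lacks a string 'category' field (the field name is a hypothetical schema).
    """
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # Free-form prose or truncated output: refuse rather than guess.
        return None
    if not isinstance(payload, dict):
        return None
    category = payload.get("category")
    return category if isinstance(category, str) else None


print(parse_category('{"category": "Travel", "reason": "airline ticket"}'))  # → Travel
print(parse_category("Sure! The category is Travel."))  # → None
```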

Key Takeaways

Digits has demonstrated that highly specialized ML systems can outperform general-purpose LLMs, particularly in domains like accounting, where accuracy and privacy are critical and many transactions are open to subjective interpretation. Their approach underscores the potential for smaller, domain-specific models to deliver superior results through targeted optimization, with LLMs integrated thoughtfully as complementary tools.

Discover how Digits’ AI-powered accounting software can revolutionize your financial management. Get started for free today!

AI-Powered Performance: How Digits Built Specialized Models for Accounting
