Doubleword | Resources

Scaling Curation with LLM Comparisons
Technical Guide
February 6, 2026

LLM powered data structures: A concurrent, lock-free binary search tree
Technical Guide
February 3, 2026

ZeroDP: Just-In-Time Weight Offloading over NVLink for Data Parallelism
Technical Guide
January 30, 2026

Large-Scale Semantic Search Without Embeddings
Technical Guide
January 27, 2026

Parallel Primitives for Multi-Agent Workflows
Technical Guide
January 22, 2026

Real-Time vs Batch Inference for LLMs: Use Cases, Costs, Workflow
Batch inference
Blog
January 19, 2026

Behind the Stack, Ep 13 - Faster Inference: Speculative Decoding for Batched Workloads
Inference Optimization
Technical Guide
December 3, 2025

Costco of Inference: Introducing Doubleword Batched, the Inference Provider Built for Batched Workloads
Inference Optimization
Blog
December 2, 2025

Behind the Stack Ep. 12 - Understanding Model Parallelism
Inference Optimization
Technical Guide
November 19, 2025

Behind the Stack, Ep. 11 - How Speculative Decoding Speeds Up Language Models
Self-Hosted Architecture
Technical Guide
November 5, 2025

Doubleword Open Sources the World’s Fastest AI Gateway
Artificial Intelligence
News
October 21, 2025

Chasing Cheap Tokens: 2x Cheaper Tokens Than H100s with Consumer Cards
Blog
October 13, 2025

Should GPUs make Free Trade Agreements?
Blog
September 19, 2025

Behind the Stack, Ep 10 - Batched Endpoints
Self-Hosted Architecture
Technical Guide
September 10, 2025

What is InferenceOps? Defining the Function Behind Scalable AI
Enterprise AI
Blog
September 5, 2025

Scaling AI Requires InferenceOps, Not MLOps
Enterprise AI
Blog
September 4, 2025

Behind the Stack, Ep 9 - How to Evaluate Open Source LLMs
Self-Hosted Architecture
Technical Guide
September 3, 2025

What the U.S. AI Action Plan Really Means for Regulated Enterprises
Enterprise AI
Blog
July 30, 2025

Behind the Stack Ep. 8 - Choosing the Right Inference Engine for Your LLM Deployment
Self-Hosted Architecture
Technical Guide
July 15, 2025

Lightweight Prototyping or Full-Scale Ops? Ollama vs Doubleword Explained
Self-Hosted Architecture
Blog
July 9, 2025

Behind the Stack, Ep 7 - Choosing the Right Quantization for Self-Hosted LLMs
Self-Hosted Architecture
Technical Guide
July 8, 2025

Building GenAI in Regulated Industries: A Guide to Secure, Compliant AI
Enterprise AI
Blog
July 1, 2025

Behind the Stack, Ep 6 - How to Speed up the Inference of AI Agents
Self-Hosted Architecture
Technical Guide
July 1, 2025

Behind the Stack, Ep 5 - Making RAG Work for Multimodal Documents
Self-Hosted Architecture
Technical Guide
June 24, 2025

Behind the Stack, Ep 4: Making Your Load Balancer LLM-Aware
Self-Hosted Architecture
Technical Guide
June 18, 2025

GTC Europe 2025: ASAS AI & Doubleword Announce Strategic Partnership to Deliver Sovereign, Enterprise-Grade AI Solutions in Saudi Arabia and the Middle East
Press
June 16, 2025

Doubleword doubles down on NVIDIA collaboration to give enterprises control over their AI with NVIDIA NIM microservices integration
Press
June 11, 2025

Behind the Stack, Ep 3: How to Serve 100 Models on a Single GPU with No Cold Starts
Self-Hosted Architecture
Technical Guide
June 10, 2025

Behind the Stack, Ep 2: How Many Users Can My GPU Serve?
Self-Hosted Architecture
Technical Guide
June 4, 2025

Doubleword Launches Self-Hosted Inference Platform On Snowflake Marketplace
Press
PR Newswire
June 3, 2025

Doubleword Launches Self-Hosted Inference Platform on Snowflake Marketplace
Blog
June 3, 2025

Behind the Stack, Ep 1: What Should I Be Observing in my LLM Stack?
Self-Hosted Architecture
Technical Guide
May 28, 2025

What It Really Takes to Self-Host Your Inference Stack
Self-Hosted Architecture
Technical Guide
May 23, 2025

Why Owning Your AI Stack Is Becoming a Strategic Advantage
Future of AI
Blog
May 22, 2025

AI-Powered Performance: How Digits Built Specialized Models for Accounting
Artificial Intelligence
May 13, 2025

Doubleword raises $12M Series A to make self-hosted AI inference effortless
Press
Startups Magazine
May 9, 2025

Doubleword raises $12M Series A led by Dawn Capital to make self-hosted AI inference effortless for enterprises
News
May 8, 2025

AI Startup Doubleword Raises £9M Series A Led by Dawn Capital
Press
Just AI News
May 8, 2025

Doubleword secures £9 million Series A Investment led by Dawn Capital
Press
Deal Lite
May 8, 2025

UK’s Doubleword secures €10.6M to help businesses escape AI infrastructure overload: Here’s how
Press
Silicon Canals
May 8, 2025

Doubleword raises £9m Series A led by Dawn Capital to make self-hosted AI inference effortless for enterprises
Press
Soapbox
May 8, 2025

Doubleword’s $12M fuels mission to bring easy, secure self-hosted AI to enterprises
Press
Tech Funding News
May 8, 2025

AI self-hosting start-up Doubleword finds new Dawn with £9m funding boost
Press
Sky News
May 7, 2025

Announcing Doubleword: New Name, Same Team, Same Mission
Blog
May 7, 2025

MLP: Attention in a Trench Coat
MLOps
Technical Guide
March 26, 2025

The Next Leap in Speculative Decoding: Inside Doubleword's Inference Engine
Fast LLMs
Technical Guide
March 3, 2025

The End of the Centralized API Era and the Rise of the AI Sprawl
Artificial Intelligence
Blog
February 25, 2025

Optimising LLM Latency: Why Speed Matters In Generative AI
Fast LLMs
Technical Guide
February 18, 2025

DeepSeek Chronicles: My Personal Take on the AI Buzz
Blog
January 30, 2025

Take Control of Your AI: Why You Should Self Host Large Language Models
Blog
January 29, 2025

Takeoff Serverless LoRA: Efficient inference at scale for fine-tuned models
Inference Optimization
Technical Guide
January 27, 2025

Optimizing GPU Memory for LLMs: A Deep Dive into Paged Attention
Inference Optimization
Technical Guide
January 21, 2025

Reflection on 2024 Predictions: How Did We Do?
Enterprise AI
Blog
December 16, 2024

Introducing Llama 3.3 Support on TitanML: Advanced AI, Self-Hosted and Secure
News
December 6, 2024

TitanML Bolsters Commercial Operations with George Westlake as Commercial Lead
Enterprise AI
News
November 28, 2024

TitanML Strengthens US Operations with Appointment of Enterprise AI Expert Amanda Milberg
Enterprise AI
News
November 25, 2024

Introducing the TitanML Model Memory Calculator - A Community Resource
Model Serving
Blog
September 11, 2024

TitanML Takeoff 0.17: Unleashing New Capabilities and Performance Enhancements
Titan Takeoff Inference Server
News
August 19, 2024

TitanML's Vision for AI Integration: Insights from Dataiku's Everyday AI Conference
Enterprise AI
Blog
August 12, 2024

Taming Enterprise RAG: Essential Tips from TitanML's CEO for Efficient AI Infrastructure
Quantization
Blog
August 7, 2024

TitanML Dataiku Plugin: Major Update Brings Snowflake Integration and Enhanced AI Capabilities
Enterprise AI
Blog
August 6, 2024

Takeoff 0.16.0: Enterprise RAG with Enhanced Performance and Expanded Capabilities
Titan Takeoff Inference Server
News
July 29, 2024

TitanML Introduces Full Support for Llama 3.1 Family on the Takeoff Inference Stack
Enterprise AI
News
July 23, 2024

Bringing Sci-Fi to Life: How TitanML Powered HPE's Groundbreaking Hologram AI Assistant
Future of AI
Blog
July 2, 2024

Resource Center
