
Senior Infrastructure Engineer: LLM Inference Systems

Description: Design and operate sophisticated infrastructure that makes large language models run fast and cheaply at scale.

Location: London, UK (Hybrid)

Compensation: Competitive with equity.

How to Apply: Send your CV to fergus.finn@doubleword.ai


About the Role

We're seeking a Senior Infrastructure Engineer to join our mission of building the best LLM inference team in Europe. You'll be responsible for designing and operating sophisticated infrastructure that makes large language models run fast and cheaply at scale. This role combines deep technical expertise in distributed systems, hardware optimization, and inference infrastructure with a research-oriented mindset to solve complex performance problems.


You'll work at the intersection of hardware and software, building systems that optimize LLM inference workloads and push the boundaries of what's possible in terms of speed, cost, and efficiency. This is an opportunity to shape critical infrastructure that directly impacts how the world accesses and uses large language models.

What You'll Do

Examples of projects you might work on:


  1. Building and optimizing infrastructure for batch inference workloads, focusing on high-throughput, cost-efficient processing
  2. Optimizing request scheduling for LLM inference through intelligent routing and load balancing
  3. Designing system-level optimizations, such as prefix-dependent routing algorithms (see the sketch after this list)
  4. Creating reproducible benchmarking and validation infrastructure
  5. Provisioning, managing, and maintaining heterogeneous GPU infrastructure and the workloads that run on it
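
Of these, prefix-dependent routing (item 3) lends itself to a short illustration. The sketch below is a minimal Python toy, not Doubleword's implementation: it hashes a fixed-length prompt prefix so that requests sharing that prefix (for example, a common system prompt) land on the same replica and can reuse its KV cache, with a simple load-based fallback. All names here (Replica, route, max_skew) are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    """One model-serving endpoint; `inflight` counts requests being served."""
    url: str
    inflight: int = 0


def prefix_key(prompt: str, prefix_chars: int = 256) -> int:
    """Hash a fixed-length prompt prefix; requests sharing it map together."""
    digest = hashlib.sha256(prompt[:prefix_chars].encode()).hexdigest()
    return int(digest, 16)


def route(prompt: str, replicas: list[Replica], max_skew: int = 4) -> Replica:
    """Prefer the prefix-affine replica for KV-cache reuse; fall back to the
    idlest replica when the affine one is `max_skew` or more requests busier,
    trading cache locality for load balance."""
    affine = replicas[prefix_key(prompt) % len(replicas)]
    idlest = min(replicas, key=lambda r: r.inflight)
    chosen = idlest if affine.inflight - idlest.inflight >= max_skew else affine
    chosen.inflight += 1
    return chosen


if __name__ == "__main__":
    pool = [Replica(f"http://gpu-{i}:8000") for i in range(4)]
    # A shared system prompt longer than the hashed prefix window.
    system = "You are a helpful assistant specialised in document analysis. " * 5
    # Two requests sharing the system prompt land on the same replica,
    # so the second can reuse the first's cached prefix KV entries.
    a = route(system + "Summarise this report.", pool)
    b = route(system + "Translate this report.", pool)
    assert a is b
```

A production router would hash token IDs rather than characters and track cache contents per replica, but the locality-versus-load trade-off shown here is the core of the problem.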

What We're Looking For

Note: A good candidate will have around 80% of the following qualities. Please apply even if this doesn't describe you perfectly.

Core Technical Skills:

  • Strong programming fundamentals
  • Experience in both Python and a systems language such as Rust or C++
  • Understanding of GPU architectures and their performance characteristics
  • Experience with Kubernetes, container orchestration, and cloud platforms
  • Experience building distributed systems and high-performance infrastructure


ML & Hardware Expertise:

  • Deep understanding of LLM inference workloads, performance characteristics, and optimization techniques
  • Familiarity with ML optimization libraries (PyTorch, TensorRT, vLLM, SGLang, TensorRT-LLM)
  • Ability to reason about hardware performance trade-offs and optimization strategies
  • Understanding of heterogeneous compute environments and their characteristics


Systems & Operations:

  • Experience with GitOps workflows and infrastructure-as-code practices
  • Understanding of performance profiling, bottleneck analysis, and system optimization techniques
  • Experience with monitoring, alerting, and observability systems


Research Mindset:

  • Curiosity about emerging hardware trends and ML optimization techniques
  • Ability to understand complex research requirements and translate them into infrastructure needs
  • Comfort with ambiguity and rapidly evolving technical landscapes
  • Experience supporting research workflows and experimental systems

What You'll Gain

  • Opportunity to build the best LLM inference infrastructure in Europe
  • Work on cutting-edge performance optimization problems at scale
  • Direct impact on making AI more accessible through faster, cheaper inference
  • Collaboration with world-class engineers focused on inference optimization
  • Experience scaling inference systems to unprecedented levels of efficiency

About Us

We're dedicated to making large language models faster, cheaper, and more accessible. Our infrastructure team is laser-focused on LLM inference optimization, pushing the boundaries of what's possible in terms of performance and cost efficiency while maintaining the reliability needed to serve these models at scale.


We provide competitive compensation, comprehensive benefits, and opportunities for professional growth in one of the most exciting fields in technology.

