R&D · Applied Research · Theo Coleman

The Foundry

Where we build what doesn’t exist yet.

BespokeWorks' research and development arm. We build AI systems from first principles, test them against the hardest public benchmarks, and ship what works directly into client products.



Flagship

Benchmark-Breaking Retrieval AI.

We built a question-answering system that outperforms the best published alternative on HotpotQA. Given a complex question that requires connecting information across multiple documents, it finds the right answer more reliably than any other publicly documented approach we have compared against.

On HotpotQA, a standard academic benchmark for multi-hop reasoning used across research labs and industry, our system achieved an F1 score of 86.8%.
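For readers unfamiliar with the metric, here is a minimal sketch of how an answer-level F1 score is typically computed in extractive question answering: the predicted and reference answers are normalised, split into tokens, and scored on token overlap. This is a simplified illustration, not our evaluation harness and not the official HotpotQA script, which also handles cases such as yes/no answers. The function names `normalize_answer` and `token_f1` are our own for this example.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Counter intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average F1 over (prediction, gold) pairs, as a percentage."""
    return 100.0 * sum(token_f1(p, g) for p, g in pairs) / len(pairs)
```

A benchmark-level figure such as 86.8% is an average of this kind over every question in the evaluation set, so a partially correct answer earns partial credit rather than zero.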

Why this matters: Unlike most high-scoring systems, ours doesn't require training on the benchmark dataset. Fine-tuned models can score higher on the specific data they're trained on, but often fail to generalise beyond it. Our system is training-free. It works out of the box on new domains: finance documents, medical records, legal contracts. Same architecture, same accuracy. That's the difference between a research result and a production system.

When your AI gets the answer wrong, someone has to catch it. Higher accuracy means fewer errors, less human review, and more trust in the system. The gap between the best published result (StepChain GraphRAG, 79.5%) and our 86.8% isn't academic. It's the difference between a system that needs constant supervision and one that works.
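To put that gap in rough numbers, the sketch below treats the shortfall from a perfect score (100 minus F1) as a stand-in for how much output still needs checking. That is an assumption made for illustration only: F1 is a partial-credit overlap measure, not a literal error rate, so the result is an indication rather than a measured reduction in review effort. The helper names are hypothetical.

```python
def shortfall(f1_percent: float) -> float:
    """Distance from a perfect score, used here as a rough error proxy."""
    return 100.0 - f1_percent


def relative_reduction(baseline_f1: float, improved_f1: float) -> float:
    """Fraction of the baseline's shortfall removed by the improved system."""
    base = shortfall(baseline_f1)
    return (base - shortfall(improved_f1)) / base


# F1 scores as reported on this page.
scores = {
    "Our system": 86.8,
    "StepChain GraphRAG": 79.5,
    "Standard RAG": 72.0,
}

for name in ("StepChain GraphRAG", "Standard RAG"):
    cut = relative_reduction(scores[name], scores["Our system"])
    print(f"Shortfall vs {name}: reduced by {cut:.1%}")
```

Under this proxy, moving from 79.5% to 86.8% removes roughly a third of the remaining shortfall, and moving from the 72.0% baseline removes roughly half.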

HotpotQA · F1 Score (0–100%)

Our system · 86.8%
StepChain GraphRAG (published best) · 79.5%
Standard RAG (baseline approach) · 72.0%

Public benchmark · Methodology and results published · Reproducible by any researcher.

Free Tools

Working tools, released publicly.

Built in The Foundry and free to use. No signup, no credit card.

Pipeline

How research becomes product.

Everything we build in The Foundry eventually ships to clients. The pipeline is straightforward.

01 · Research: Build new techniques from first principles

02 · Benchmark: Test against public academic datasets

03 · Harden: Production-grade reliability and speed

04 · Deploy: Integrate into client systems

When we build a chatbot or knowledge system for your business, it's running the same architecture we've stress-tested against the hardest public benchmarks. You get research-grade AI without the research timeline.

See what this means for your business.

The same architecture that scored 86.8% on HotpotQA is what runs inside your system. Run the free analyser to see exactly where it applies.


BespokeWorks

Worked with us? We'd love your feedback.

Your experience helps other businesses make the right decision.

Leave a Review on Trustpilot
100% Custom Built · Global Clients Served · Free AI Analysis