R&D · Applied Research · Theo Coleman
  

The Foundry

Where we build what doesn’t exist yet.

The Foundry is BespokeWorks' research and development arm. We build AI systems from first principles, test them against the hardest public benchmarks, and ship what works directly into client products.

See the flagship result · Browse the case studies

Flagship

Benchmark-Breaking Retrieval AI.

We built a question-answering system that outperforms every published alternative. When given a complex question that requires connecting information across multiple documents, our system finds the right answer more reliably than any other publicly documented approach.

On HotpotQA, a standard academic benchmark for multi-hop reasoning used across research labs and industry, our system achieved an F1 score of 86.8%.
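For readers unfamiliar with the metric, here is a minimal sketch of how a token-level answer F1 is commonly computed for extractive question answering, in the style popularised by SQuAD-type evaluation. It is an illustration only, not our evaluation code: the function names `normalize_answer`, `token_f1`, and `mean_f1` are hypothetical, and the official HotpotQA evaluation script may treat special cases differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average token F1 over (prediction, gold) pairs, as a percentage."""
    if not pairs:
        raise ValueError("need at least one (prediction, gold) pair")
    return 100.0 * sum(token_f1(p, g) for p, g in pairs) / len(pairs)
```

A prediction of "Paris, France" against a gold answer of "Paris" shares one token out of two predicted, so precision is 0.5, recall is 1.0, and F1 is about 0.67. A benchmark-level score is the average of these per-question values.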

Why this matters: Unlike most high-scoring systems, ours doesn't require training on the benchmark dataset. Fine-tuned models score higher on the specific data they're trained on, but tend to generalise poorly beyond it. Our system is training-free, so it works out of the box on any domain: finance documents, medical records, legal contracts. Same architecture, same accuracy. That's the difference between a research result and a production system.

When your AI gets the answer wrong, someone has to catch it. Higher accuracy means fewer errors, less human review, and more trust in the system. The gap between the best published result, 79.5%, and our 86.8% isn't academic. It's the difference between a system that needs constant supervision and one that works.
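To put that gap in proportion, here is a rough back-of-the-envelope sketch. It assumes that 100 minus the F1 score can stand in for the share of answers that fall short, which is a simplification made for illustration: F1 gives partial credit, so it is not a strict error rate. The function name `relative_shortfall_reduction` is hypothetical.

```python
def relative_shortfall_reduction(baseline_f1: float, improved_f1: float) -> float:
    """Fraction by which the gap to a perfect score shrinks.

    Assumption for this sketch: (100 - F1) is used as a proxy for the
    remaining shortfall, not as a literal count of wrong answers.
    """
    baseline_gap = 100.0 - baseline_f1
    improved_gap = 100.0 - improved_f1
    if baseline_gap <= 0:
        raise ValueError("baseline already at or above a perfect score")
    return (baseline_gap - improved_gap) / baseline_gap


# Figures taken from the comparison on this page.
reduction = relative_shortfall_reduction(79.5, 86.8)
print(f"{reduction:.1%}")  # prints 35.6%
```

Under that assumption, the move from 79.5% to 86.8% closes roughly a third of the remaining distance to a perfect score.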

HotpotQA · F1 Score (0–100%)

Our system: 86.8%
StepChain GraphRAG (published best): 79.5%
Standard RAG (baseline approach): 72.0%

Public benchmark · Methodology and results published · Reproducible by any researcher.

From The Foundry

Case studies

Beating Every Published Benchmark for Multi-Hop QA

Our training-free RAG system achieved an F1 score of 86.8% on HotpotQA, outperforming StepChain GraphRAG (79.5%) and every other published result.

Read the case study →

Inside Our Free AI Business Analyser

How our free instant business analyser audits any company website in about five minutes and returns a personalised 3-phase AI automation roadmap.

Read the case study →

A Blog Generator That Outscored Opus 4.6 and the Internet's Top Writers

Four consecutive production posts averaged 81.8 / 100 on our open 8-category benchmark. Raw Claude Opus 4.6 scored 65. GPT-4o scored 55.

Read the case study →

Generating Full MRI Scans from Partial Data

Using diffusion models to help clinicians work with complete imaging when only fragments are available.

Case study coming soon

Free Tools

Working tools, released publicly.

Tools we have built in The Foundry and released publicly. No signup, no credit card.

Free AI Business Analyser

Enter any website URL and get a personalised 3-phase AI automation roadmap in about five minutes.

Launch the analyser →

Business Analysis Cost Calculator

Interactive calculator estimating what a full business analysis would cost if you hired a human consultant. Transparent formula, live stage breakdown.

Open the calculator →

Coming soon

More tools in the pipeline.

We publish new free tools as we build them. If there is a specific piece of analysis you wish were automated, let us know.

Tell us what you need →

Pipeline

How research becomes product.

Everything we build in The Foundry eventually ships to clients. The pipeline is straightforward.

01 Research: Build new techniques from first principles
02 Benchmark: Test against public academic datasets
03 Harden: Production-grade reliability and speed
04 Deploy: Integrate into client systems

When we build a chatbot or knowledge system for your business, it's running the same architecture we've stress-tested against the hardest public benchmarks. You get research-grade AI without the research timeline.

Behind the work

The team and the research that drive The Foundry.

About

See what this means for your business.

The same architecture that scored 86.8% on HotpotQA is what runs inside your system. Run the free analyser to see exactly where it applies.

Get my free AI report · Talk to the team

