Applied Research · The Foundry

Research that ships.

What we’re testing inside The Foundry, the open problems we think matter, the techniques we’re benchmarking, and what graduates into client systems.

Active research areas

What we’re working on right now.

  • Retrieval

    Training-free multi-hop QA

    How to score 86%+ on HotpotQA without fine-tuning, so the same architecture works in any domain. Currently our flagship project.

  • Agents

    Tool-use reliability

    Why agents fail when chains exceed 5–7 tool calls, and what makes some agent loops stable past 50.

  • Evaluation

    Production-faithful evals

    Replacing benchmark scores with eval suites that predict deployment behaviour. Most ML evals don't.

  • UX

    Confidence surfacing

    When the model is uncertain, how do we show it to the user without breaking flow? Especially for high-stakes decisions.

  • Cost

    Model routing economics

    Tiered routing (small model → big model → human) is obvious. Building the trust signals that make routing safe is not.

  • Compliance

    Audit-grade logging

    Every action emits a structured log entry that is tamper-evident and replayable. What does this look like at scale?
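
As a rough illustration of the tamper-evident logging idea in the last item above, one common approach is a hash chain: each entry stores the hash of the previous entry, so editing any earlier record invalidates every hash after it. This is a minimal sketch, not a description of The Foundry's actual logging system; the function names `append_entry` and `verify` and the field names `index`, `action`, `payload`, `prev_hash`, and `hash` are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder hash for the first entry


def _digest(body):
    """Hash a record deterministically (sorted keys give a stable encoding)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, action, payload):
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {
        "index": len(log),
        "action": action,
        "payload": payload,
        "prev_hash": prev_hash,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log):
    """Return True only if every record's hash and back-link are intact."""
    prev_hash = GENESIS_HASH
    for position, entry in enumerate(log):
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["index"] != position:
            return False
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

Replaying a session then amounts to iterating the verified entries in order. A chain like this only detects tampering; preventing it (and scaling it) would need further measures, such as anchoring the latest hash somewhere the writer cannot modify.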

From research to product

How a paper becomes a deployed system.

  1. Step 01

    Hypothesis

    Starts with a real client problem we couldn't solve cleanly with existing tools.

  2. Step 02

    Bench + benchmark

    Build the smallest experiment. Run it against the hardest public benchmark we can find.

  3. Step 03

    Harden

    Eval suite → load test → adversarial test. Most ideas die here. The survivors graduate.

  4. Step 04

    Deploy

    Goes into a client system as a Bespoke Build, instrumented to feed signal back into the next iteration.

See the flagship benchmark.

Our HotpotQA result is our flagship case study: F1 86.8%, training-free, and runs out of the box on any domain.
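
For readers unfamiliar with the metric, answer F1 on extractive QA benchmarks is usually computed as token overlap between the predicted and gold answers after light normalisation. The sketch below shows that general recipe; it assumes the figure above refers to answer-level F1 (the page does not say), and the helper names `normalize` and `token_f1` are illustrative rather than taken from any official evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A benchmark-level score is then the mean of `token_f1` over all questions, which is why a partially correct answer still earns partial credit.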

