Roles
Location
London
Work setup
- Employment
- full-time
- Level
- Senior
- Remote policy
- ONSITE / HYBRID
- Remote scope
- hybrid
Application
- Portfolio
- unclear
- GitHub
- unclear
- Cover letter
- unclear
- Apply flow
- ATS
Company context
- Product
- evaluation datasets and RL environments
- Industry
- Artificial intelligence
- HQ
- London
- Size
- small
Description
We build the evaluation datasets and RL environments that make AI reliable where mistakes are expensive: finance, healthcare, and legal. We design expert-curated training data, calibrated rubrics, and RL environments for frontier AI labs and startups pushing the frontier of what models can do.

We’re a small London-based team running multiple active projects, so what you ship gets used immediately by labs, startups and internal domain experts. We recently launched SpreadsheetBench v2 and work on ultra long-horizon tasks.

We’re hiring:
- MTS, Applied AI: design the next benchmarks and RL environments
- MTS, SWE / Product: build the platform behind every dataset