UK AISI Publishes Inaugural Frontier AI Trends Report
Summary
UK AISI published its inaugural Frontier AI Trends Report, covering evaluations of more than 30 frontier models and documenting dramatic capability increases across cyber, biological, self-replication, and autonomous task domains. The report provides the most comprehensive external benchmark of frontier risk-relevant capabilities to date.
What Happened
The report synthesized evaluations conducted over the course of 2025 across four primary risk-relevant capability areas. On apprentice-level cyber capabilities, defined as the ability to provide meaningful operational assistance to someone with limited prior cybersecurity knowledge, success rates across evaluated models reached approximately 50%, up from roughly 10% at the start of the tracked period. On self-replication, which tracks whether models can copy their own weights or spawn derivative instances in agentic contexts, the share of frontier models showing at least partial capability grew from approximately 5% to 60% between 2023 and the end of 2025. Biological uplift assessments found that models provided up to a fivefold uplift to novice actors seeking knowledge relevant to creating biological hazards. On autonomous software engineering, leading models succeeded on approximately 40% of tasks requiring more than one hour of sustained independent work.
The report was framed as the first installment in an annual series, establishing AISI as a longitudinal tracker of frontier capability trends rather than a one-time evaluator.
Why It Matters
The Frontier AI Trends Report established a new class of external capability evidence, distinct in character from both lab self-assessments and academic benchmarks. Because AISI evaluated models from multiple labs against consistent metrics, the results could be compared across organizations in a way that individual system cards or safety reports could not support. The self-replication trajectory, from 5% to 60% of models over two years, was the most striking single data point: it concerned a capability that theoretical AI safety work had long treated as a distant concern and that the report repositioned as a near-term operational question. The report also provided independent empirical grounding for the responsible scaling policy (RSP) and frontier safety framework (FSF) threshold debates underway at each major lab: rather than relying on labs to set thresholds and then assess their own models against them, AISI's data provided a third-party view of where the frontier actually sat.