Hack The Box has published results from a large benchmark comparing agentic AI-assisted teams with human-only teams on practical cybersecurity challenges. The study found sizeable productivity gains for experienced groups, but weaker outcomes for less-skilled teams.
The data comes from NeuroGrid, a three-day Capture The Flag competition featuring 36 challenges across nine technical domains. Hack The Box analysed results from 1,078 teams: 120 using agentic AI and 958 human-only teams, across four difficulty levels.
Across all participants, AI-assisted teams completed up to 1.4 times more tasks in the same time window. Among elite teams, output rose as high as 4.1 times.
AI-assisted teams also improved their solve rate within the competition window. The benchmark recorded a 27% solve rate for top AI-augmented teams versus 16% for top human-only teams. Across all active participants, AI-assisted teams achieved a solve rate 3.2 times that of human-only teams.
Hack The Box said the results suggest AI can increase speed and throughput when teams apply it with strong oversight, and that it does not remove the need for experienced security practitioners.
"AI can raise the bar of cybersecurity performance, but it does not eliminate the need for human expertise," said Haris Pylarinos, founder and CEO of Hack The Box.
"Our findings show measurable productivity gains, but also predictable failure patterns. Security leaders must build and test human-in-the-loop workflows that are proven under pressure, and develop the AI and cybersecurity skills needed to unlock benefits safely as models evolve," Pylarinos said.
Experience divide
Breaking out performance by experience level, the benchmark found AI's impact varied sharply between early-career, mid-tier and elite operators.
For lower-ranked teams, Hack The Box reported what it called an "early career productivity illusion." AI sometimes acted as a bridge that helped weaker teams solve more challenges. However, lower-performing AI-augmented teams were 12.5% slower than comparable human-only teams, often getting trapped in unproductive loops, an outcome the report linked to limited oversight and limited tool fluency.
Mid-level participants showed the strongest gains on medium-difficulty tasks. The report said the AI advantage peaked at 3.89 times in that band, which it described as a sweet spot for pattern recognition and faster progress.
At the top end of the leaderboard, the benchmark suggested AI's advantage narrowed rather than disappeared. Hack The Box said the solve-rate advantage fell from 3.2 times overall to 1.7 times among the top 5% of teams. It also reported that AI-augmented elite teams completed challenges 312% faster, positioning AI as a speed multiplier rather than a substitute for expertise.
Talent pipeline
The report also highlighted a workforce risk that could follow rapid AI adoption in security operations. Hack The Box said the biggest boost appeared in medium-complexity work, where many analysts build judgement and practise investigative workflows. Heavy automation at this layer, it argued, could reduce the day-to-day work that typically develops mid-level practitioners.
That dynamic matters for employers seeking a pipeline of analysts who can handle unusual incidents and high-stakes decisions. The report described the "hardest and most novel" tasks as areas where teams still need human judgement and verification, even when AI improves baseline performance.
Gibb Witham, president of Hack The Box, said organisations should balance productivity gains with training plans that keep human judgement in the loop.
"Routine and mid-level work is where enterprises will see immediate ROI," Witham said.
"If organizations over-index on automating the tasks that build judgment, they risk trading long-term resilience for short-term efficiency. Agentic automation must be paired with deliberate human skill development. For enterprises, the competitive advantage will not come from AI adoption alone. It will come from training cybersecurity professionals to effectively orchestrate, validate, and govern AI-driven workflows and agents," he said.
Operational implications
Agentic AI has become a focal point for security teams because it can chain actions together rather than simply answer prompts. In the benchmark, teams used it in timed problem-solving across multiple security domains, creating comparable performance signals across different skill tiers.
Hack The Box said the results support a hybrid operating model. It positioned human-in-the-loop workflows as a way to manage predictable failure patterns while retaining the speed benefits seen in higher-performing teams.
Hack The Box plans to present a deeper analysis of the benchmark findings at RSAC 2026 in a Village showcase.