SP Agent Team Token Report — Week of 2026-04-26
---
title: "Weekly Token Optimization Report: Routing Efficiency & Hermes Scaling"
date: "2026-04-27"
category: "Engineering"
---
## This Week's Numbers
Our token distribution for April 20th to April 26th shows model routing moving in the wrong direction: the average Opus share rose above our 50% ceiling.
| Metric | Value | Trend | Status |
| :--- | :--- | :--- | :--- |
| **Opus Turns (Avg)** | 54% | $\uparrow$ 12 pp | 🔴 Red |
| **Target Opus%** | < 50% | — | — |
| **Hermes Sessions** | 127 | $\uparrow$ | 🟢 Healthy |
| **Total Hermes Tokens** | 44.4M | $\uparrow$ | 🟢 Healthy |
| **Hermes Cost** | \$0.91 | $\downarrow$ | 🟢 Optimized |
## What Changed
The primary shift this week was a spike in **Claude 3 Opus** utilization, which peaked at **76%** on April 25th. Routing efficiency regressed: average Opus usage rose from 42% last week to 54% this week, an increase of 12 percentage points.
While the total volume of turns remained steady across the week, the reliance on the most expensive model for reasoning tasks suggests that recent complex tickets are bypassing our Sonnet-first routing logic.
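The week-over-week comparison can be sketched in a few lines. This is a minimal illustration, not the team's actual reporting code: the function name `opus_status` and the rule that a share at or above 50% counts as red are assumptions based on the target in the table above.

```python
def opus_status(current_pct: float, previous_pct: float,
                red_threshold: float = 50.0) -> dict:
    """Summarize this week's Opus share against last week's.

    red_threshold is an assumption: the report targets Opus% < 50%,
    so any share at or above 50% is treated as red here.
    """
    delta_pp = current_pct - previous_pct            # change in percentage points
    relative_pct = delta_pp / previous_pct * 100.0   # change relative to last week
    status = "red" if current_pct >= red_threshold else "green"
    return {"delta_pp": delta_pp, "relative_pct": relative_pct, "status": status}


# Figures from this report: 42% last week, 54% this week.
summary = opus_status(current_pct=54.0, previous_pct=42.0)
print(summary)
```

Keeping the percentage-point change separate from the relative change avoids ambiguity: the move from 42% to 54% is 12 points, which is a considerably larger shift when expressed relative to last week's share.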
## Wins
The **Hermes Agent** (powered by `qwen3-235b`) has become a powerhouse for high-volume, low-complexity operations.
- **High Throughput**: Processed over **44 million tokens** across 127 sessions.
- **Tool Proficiency**: Hermes is successfully handling the "heavy lifting" of system interaction. The top tools—`terminal` (21.4%), `read_file` (19.3%), and `patch` (12.4%)—indicate that Hermes is effectively managing filesystem audits and codebase exploration.
- **Operational Stability**: Maintained an average session length of ~3 minutes with a high message density (13.1 msgs/session), suggesting it can handle iterative debugging loops without escalating to Claude.
## Challenges
The **🔴 Red Light** status on our Opus% metric is the critical blocker. An Opus percentage of 54% indicates that our "Complexity Gate" is too permissive. We are routing tasks to Opus that could likely be handled by Sonnet or Hermes, increasing both latency and cost per turn.
## Next Week's Target
Our objective for the coming week is a return to **Opus% < 45%**. We need to tighten the dispatch logic to ensure Opus is reserved strictly for architectural decision-making and high-level reasoning, rather than implementation.
## Dispatch Optimization
Based on this week's Hermes tool usage, we will shift the following tasks from Claude to Hermes:
1. **Filesystem Exploration**: All `search_files` and `read_file` sequences will be routed to Hermes by default.
2. **Boilerplate Patching**: Standard `patch` operations for repetitive code updates will move to Hermes.
3. **Environment Audits**: Terminal-based system checks (`terminal` calls) will be offloaded to the Qwen-based engine.
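The three rules above could be expressed as a small routing function. This is a hedged sketch only: the function `route`, the `boilerplate` and `architectural` flags, and the model labels are hypothetical names chosen for illustration, and the real dispatcher may be structured quite differently.

```python
from typing import Optional

# Tools this report proposes to send to Hermes by default.
HERMES_DEFAULT_TOOLS = {"search_files", "read_file", "terminal"}


def route(tool: Optional[str], *, boilerplate: bool = False,
          architectural: bool = False) -> str:
    """Pick a model for a task (hypothetical dispatcher sketch).

    tool          -- name of the tool the task calls, or None for pure reasoning
    boilerplate   -- True for repetitive, standard patch operations (assumed flag)
    architectural -- True for architectural decisions and high-level reasoning (assumed flag)
    """
    if tool in HERMES_DEFAULT_TOOLS:
        return "hermes"            # filesystem exploration and environment audits
    if tool == "patch" and boilerplate:
        return "hermes"            # boilerplate patching
    if architectural:
        return "opus"              # Opus reserved for architecture-level work
    return "sonnet"                # Sonnet-first default for everything else


print(route("read_file"))                  # hermes
print(route("patch", boilerplate=True))    # hermes
print(route("patch"))                      # sonnet
print(route(None, architectural=True))     # opus
```

Checking the Hermes rules before the Opus rule means discovery work is never escalated, even when it belongs to an architecture-heavy ticket. Whether that ordering is right for every case is a design decision for the team.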
## Cost Savings
The efficiency of the Hermes agent is stark. By utilizing a high-capacity, low-cost model for the "grunt work" of agentic loops:
- **Hermes Actual Cost**: \$0.91 for 1,666 messages.
- **Estimated Sonnet Cost**: For the same 44.4M tokens, priced at roughly \$3/M (the input rate, used here as the blended estimate), the cost would have been about \$133, exceeding **\$130.00**.
- **Total Estimated Savings**: at least **~\$129.00 per week** by diverting these specific tool-heavy sessions.
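The arithmetic behind these figures can be checked with a short sketch. It assumes, as the report does, a flat rate of about \$3 per million tokens for Sonnet; the variable names are illustrative, and a real estimate would price input and output tokens separately.

```python
# Figures taken from this report.
HERMES_TOKENS_M = 44.4        # total Hermes tokens, in millions
HERMES_COST_USD = 0.91        # actual Hermes cost for the week
HERMES_MESSAGES = 1666
HERMES_SESSIONS = 127

# Assumption from the report: roughly $3 per million tokens on Sonnet.
SONNET_USD_PER_M = 3.00

sonnet_estimate = HERMES_TOKENS_M * SONNET_USD_PER_M      # hypothetical Sonnet bill
savings = sonnet_estimate - HERMES_COST_USD               # estimated weekly savings
msgs_per_session = HERMES_MESSAGES / HERMES_SESSIONS      # message density
hermes_usd_per_m = HERMES_COST_USD / HERMES_TOKENS_M      # effective Hermes rate

print(f"Estimated Sonnet cost: ${sonnet_estimate:.2f}")
print(f"Estimated savings:     ${savings:.2f}")
print(f"Messages per session:  {msgs_per_session:.1f}")
print(f"Hermes cost per M:     ${hermes_usd_per_m:.3f}")
```

Under this flat-rate assumption the estimate comes to a little over \$133, so the report's "exceeded \$130" and "~\$129" figures read as conservative roundings of the same calculation.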
## Recommendations
Based on current telemetry, the following actions are mandated for the next sprint:
1. **Route more implementation to Sonnet**: Since Opus% is 54% (>50%), we must adjust the dispatcher to force Sonnet, rather than Opus, for all `write_file` tasks and for any `patch` tasks that are not offloaded to Hermes as boilerplate.
2. **Optimize routing hooks**: Review why April 25th saw a 76% Opus spike to identify if a specific project type is triggering an "Opus-only" flag.
3. **Expand Hermes Toolset**: Given the success of the `terminal` and `read_file` tools, we will migrate more "Discovery" phase tools to Hermes to further reduce the Claude context window load.
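For the April 25th review in recommendation 2, one possible starting point is to break that day's Opus share down by project type. The sketch below assumes each turn is logged as a record with `date`, `project_type`, and `model` fields; that schema is hypothetical, and the sample records are made up purely to show the shape of the output, not drawn from telemetry.

```python
from collections import defaultdict


def opus_share_by_project(turns, day):
    """Return {project_type: Opus share in percent} for turns on the given day."""
    totals = defaultdict(int)
    opus = defaultdict(int)
    for turn in turns:
        if turn["date"] != day:
            continue                      # ignore turns from other days
        totals[turn["project_type"]] += 1
        if turn["model"] == "opus":
            opus[turn["project_type"]] += 1
    return {project: 100.0 * opus[project] / totals[project] for project in totals}


# Made-up records for illustration only; not real telemetry.
sample_turns = [
    {"date": "2026-04-25", "project_type": "A", "model": "opus"},
    {"date": "2026-04-25", "project_type": "A", "model": "opus"},
    {"date": "2026-04-25", "project_type": "B", "model": "sonnet"},
    {"date": "2026-04-25", "project_type": "B", "model": "opus"},
    {"date": "2026-04-24", "project_type": "A", "model": "sonnet"},
]

shares = opus_share_by_project(sample_turns, "2026-04-25")
print(shares)
```

A project type whose share sits far above the others on the spike day would be a natural first candidate to inspect for an "Opus-only" flag.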