SP Agent Team Token Report — Week of 2026-05-10
---
title: "Weekly Token Optimization: Breaking the 50% Opus Threshold"
date: "2026-05-11"
category: "Engineering"
---
## This Week's Numbers
Our focus this week remained on the "Opus-to-Sonnet" ratio to ensure we aren't over-relying on the most expensive model for routine agentic tasks.
**Claude Token Metrics (May 04 – May 10):**
- **Average Opus Usage:** 46% (Down from 58% last week)
- **Weekly Trend:** $\downarrow$ 12 percentage points
- **Status:** 🟢 Green (Target: <50%)
The data shows significant volatility, ranging from 0% Opus usage on May 4th-5th to a peak of 67% on May 9th. However, the weekly average successfully dipped below our 50% threshold, marking a positive shift in our model routing efficiency.
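The week-over-week change can be expressed either in percentage points or as a relative drop, and the two differ noticeably. The Python below is a minimal sketch of that arithmetic, not the team's reporting code: the function name `opus_trend` and the two-state green/red status are assumptions made for illustration, while the 58%, 46%, and 50% figures come from the metrics above.

```python
def opus_trend(previous_pct: float, current_pct: float, target_pct: float) -> dict:
    """Summarize the week-over-week change in Opus share."""
    # Absolute change, measured in percentage points.
    point_change = current_pct - previous_pct
    # Relative change, measured as a percent of last week's share.
    relative_change = point_change / previous_pct * 100
    # Assumption: status is green only when the share is strictly under target.
    status = "green" if current_pct < target_pct else "red"
    return {
        "point_change": round(point_change, 1),
        "relative_change": round(relative_change, 1),
        "status": status,
    }


summary = opus_trend(previous_pct=58.0, current_pct=46.0, target_pct=50.0)
print(summary)
# {'point_change': -12.0, 'relative_change': -20.7, 'status': 'green'}
```

The drop is 12 percentage points, which corresponds to a relative reduction of about 20.7% in Opus share.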
## What Changed
The primary shift this week was the aggressive offloading of background tasks to our secondary agent, **Hermes**. By delegating high-volume, low-reasoning tasks (primarily via `cron` jobs), we reduced the need for Claude Opus to handle environment sanity checks and basic file operations.
## Wins
1. **Target Achieved**: For the first time this month, we have brought the Opus utilization rate under 50%.
2. **Hermes Scaling**: Hermes processed **20,145,300 tokens** across 71 sessions.
3. **Tool Efficiency**: Hermes demonstrated high utility in system-level operations, with the `terminal` tool accounting for 33.7% of its activity, preventing these "noisy" interactions from inflating our Claude token spend.
## Challenges
Despite the improved weekly average, **May 9th saw a spike (110 Opus turns)**. This indicates that while our baseline is improving, complex "blocker" tasks still trigger an immediate escalation to Opus. We need to investigate whether some of these implementation tasks could have been handled by Sonnet with better prompting.
## Next Week's Target
- **Opus Threshold**: Bring the weekly average down to $\le 45\%$ (from 46% this week).
- **Volatility Reduction**: Limit single-day Opus spikes to $< 60\%$.
- **Hermes Expansion**: Increase `cli` usage for Hermes to diversify it beyond just `cron` tasks.
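To see how far this week's numbers are from the two numeric targets, the check can be sketched in a few lines of Python. This is an illustration only: `check_targets` is a hypothetical helper, and the inputs are this week's reported 46% average and 67% single-day peak.

```python
def check_targets(weekly_avg_pct: float, peak_day_pct: float,
                  avg_limit: float = 45.0, peak_limit: float = 60.0) -> dict:
    """Compare Opus share figures against next week's targets."""
    return {
        # Target: weekly average at or below the limit.
        "average_ok": weekly_avg_pct <= avg_limit,
        # Target: worst single day strictly below the limit.
        "peak_ok": peak_day_pct < peak_limit,
    }


result = check_targets(weekly_avg_pct=46.0, peak_day_pct=67.0)
print(result)
# {'average_ok': False, 'peak_ok': False}
```

Measured against the new limits, this week would miss both: the average by 1 point and the peak day by more than 7 points.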
## Dispatch Optimization
Based on the tool usage patterns in Hermes (where `terminal`, `read_file`, and `skill_view` dominate), we will shift the following tasks from Claude to Hermes starting next week:
- **Log Scanning & Tailing**: All `read_file` operations for system logs.
- **Initial Env Discovery**: Moving all `skill_view` and `skills_list` calls to Hermes to build a context map before invoking Claude.
- **Routine Health Checks**: Shifting `mcp_ss3_cloud_ub_sre_status` calls entirely to the secondary agent.
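One way to picture the dispatch rules above is a small lookup keyed on tool name. The Python below is a hedged sketch rather than our actual router: `pick_agent`, the agent labels, and the `.log` suffix test for "system logs" are all assumptions made for illustration. Only the tool names (`read_file`, `skill_view`, `skills_list`, `mcp_ss3_cloud_ub_sre_status`) come from the list above.

```python
from typing import Optional

# Tools that move to Hermes unconditionally under the plan above.
HERMES_TOOLS = {"skill_view", "skills_list", "mcp_ss3_cloud_ub_sre_status"}

# Assumption: a file counts as a system log if its path ends in ".log".
LOG_SUFFIXES = (".log",)


def pick_agent(tool: str, path: Optional[str] = None) -> str:
    """Return the agent label ("hermes" or "claude") for a tool call."""
    if tool in HERMES_TOOLS:
        return "hermes"
    # read_file moves to Hermes only when the target looks like a log file.
    if tool == "read_file" and path is not None and path.endswith(LOG_SUFFIXES):
        return "hermes"
    # Everything else stays on Claude.
    return "claude"


print(pick_agent("skills_list"))                    # hermes
print(pick_agent("read_file", "/var/log/app.log"))  # hermes
print(pick_agent("read_file", "src/main.py"))       # claude
```

Keeping the default branch on Claude means any tool not explicitly listed stays where it is today, so the shift is limited to the three task groups named above.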
## Cost Savings
By routing 20.1M tokens to Hermes (running `qwen3-235b` self-hosted/custom) instead of Claude Sonnet, the savings are substantial.
- **Hermes Actual Cost**: $1.36
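- **Estimated Claude Sonnet Cost**: Assuming the whole volume is priced at an input-token rate of ~$3.00 per 1M tokens, the same 20,145,300 tokens would have cost approximately **$60.44**. Output tokens are typically billed at a higher rate, so this is a conservative floor.
- **Total Weekly Saving**: $\approx$ **$59.08** for this specific workload.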
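The estimate can be reproduced with a few lines of Python. This is a minimal sketch under the same assumption as above, namely that every token is priced at a flat $3.00 per 1M; the constant and function names are illustrative. With the exact 20,145,300-token count the Sonnet estimate works out to about $60.44 (a round 20M tokens gives $60.00).

```python
HERMES_TOKENS = 20_145_300          # tokens routed to Hermes this week
HERMES_COST_USD = 1.36              # actual Hermes cost from the report
SONNET_RATE_PER_MILLION_USD = 3.00  # assumed flat rate, input pricing only


def estimated_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for a token volume at a flat per-million rate."""
    return tokens / 1_000_000 * rate_per_million


sonnet_cost = estimated_cost(HERMES_TOKENS, SONNET_RATE_PER_MILLION_USD)
saving = sonnet_cost - HERMES_COST_USD

print(f"Estimated Sonnet cost: ${sonnet_cost:.2f}")  # Estimated Sonnet cost: $60.44
print(f"Estimated saving:      ${saving:.2f}")       # Estimated saving:      $59.08
```

Because the flat rate ignores the higher price of output tokens, the real saving is likely somewhat larger than this figure.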
## Recommendations
Based on this week's telemetry:
1. **Maintain Routing**: Opus% is currently at 46% (under the 50% limit). Continue current routing logic but monitor the May 9th-style spikes.
2. **Optimize Hermes**: With 71 sessions already active, Hermes is well-utilized. Focus now on moving "Research" and "Discovery" tasks to the secondary agent.
3. **Review-Loop Enforcement**: We recommend auditing `evaluate.sh` runs against total artifact creation to ensure our review rate remains high as we scale Hermes' autonomy.