SP Agent Team Token Report — Week of 2026-05-03
---
title: "Weekly Token Optimization Report: May 03, 2026"
date: "2026-05-03"
category: "Engineering"
tags: ["LLMOps", "TokenOptimization", "Claude", "Hermes"]
---
## This Week's Numbers
The primary KPI for our agentic orchestration this week, the **Opus% ratio**, has trended in the wrong direction. At 58%, it sits above the 50% ceiling, which puts us at "Red Light" status.
| Metric | Value | Status |
| :--- | :--- | :--- |
| **Opus Usage %** | 58% | 🔴 (Target < 50%) |
| **Weekly Trend** | ↑ 5% | ⚠️ Worsening |
| **Hermes Total Tokens** | 36.54M | 🟢 Stable |
| **Hermes Cost** | $1.09 | 🟢 Optimized |
| **Total Hermes Sessions** | 99 | 🟢 Active |
## What Changed
We observed a significant spike in Opus reliance on April 28 and 29, when Opus% hit **71% and 74%** respectively. Although Opus usage fell to 0% on May 2, the mid-week surge pushed our weekly average to 58%. This suggests that complex architectural tasks or debugging sessions during the mid-week peak were routed to Opus rather than being decomposed for Sonnet.
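One possible reason a 0% day did little to offset the spike: if the weekly figure is weighted by turn volume (an assumption; the report does not state how it is aggregated), a quiet day carries little weight. The sketch below contrasts the two aggregations. The names `DayUsage`, `weekly_opus_pct`, and `mean_of_daily_pct` are hypothetical, and the counts are placeholders, not this week's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayUsage:
    label: str
    opus_turns: int
    total_turns: int  # Opus + Sonnet turns for the day (assumed denominator)


def weekly_opus_pct(days):
    """Volume-weighted Opus%: total Opus turns over total turns."""
    opus = sum(d.opus_turns for d in days)
    total = sum(d.total_turns for d in days)
    return 100.0 * opus / total if total else 0.0


def mean_of_daily_pct(days):
    """Unweighted mean of each day's Opus%, skipping days with no turns."""
    pcts = [100.0 * d.opus_turns / d.total_turns for d in days if d.total_turns]
    return sum(pcts) / len(pcts) if pcts else 0.0


# Placeholder counts for illustration only.
sample = [
    DayUsage("busy day", opus_turns=70, total_turns=100),
    DayUsage("quiet day", opus_turns=0, total_turns=10),
]

if __name__ == "__main__":
    print(f"volume-weighted: {weekly_opus_pct(sample):.1f}%")
    print(f"mean of daily:   {mean_of_daily_pct(sample):.1f}%")
```

Under the volume-weighted definition, the quiet 0% day barely moves the weekly number, whereas the unweighted mean would halve it.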
## Wins
The **Hermes Agent** continues to be our primary engine for high-volume, low-complexity operations. Handling **36.5 million tokens** across 99 sessions for just **$1.09** demonstrates the viability of our tiered model strategy. Specifically, the `cron` platform is successfully offloading 90% of the agent's session load, ensuring that routine system maintenance doesn't bleed into our Claude token budget.
## Challenges
The "Red Light" status on Opus usage indicates a failure in our dispatching logic. We are over-relying on the most expensive model for turns that could likely be handled by Sonnet. With 1,523 messages processed by Hermes, it's clear we have the infrastructure to offload more, but we aren't yet routing "middle-tier" complexity tasks away from Opus.
## Next Week's Target
**Goal: Opus% < 50%**
We need to raise the bar for escalating to Opus. Any task that does not require deep multi-step reasoning or high-precision creative synthesis must be routed to Sonnet or Hermes.
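The rule above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not our actual dispatching logic: the `Task` flags, the `Tier` enum, and the idea that a task arrives pre-labeled are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HERMES = "hermes"
    SONNET = "sonnet"
    OPUS = "opus"


@dataclass(frozen=True)
class Task:
    description: str
    # Hypothetical labels; how they get assigned is out of scope here.
    needs_deep_reasoning: bool = False
    needs_creative_synthesis: bool = False
    read_only: bool = False


def dispatch(task: Task) -> Tier:
    """Escalate to Opus only when one of the two stated criteria applies."""
    if task.needs_deep_reasoning or task.needs_creative_synthesis:
        return Tier.OPUS
    if task.read_only:
        # Assumed: read-only work is the cheapest to hand to Hermes.
        return Tier.HERMES
    # Default tier for everything else, including standard implementation.
    return Tier.SONNET


if __name__ == "__main__":
    for t in (
        Task("redesign the agent mailbox protocol", needs_deep_reasoning=True),
        Task("add a flag to the CLI"),
        Task("locate usages of a helper", read_only=True),
    ):
        print(f"{t.description!r} -> {dispatch(t).value}")
```

Making Sonnet the default branch means a task must positively qualify for Opus, which is the direction the target requires.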
## Dispatch Optimization
To correct the current trend, we will shift the following task categories from Claude to Hermes:
- **Codebase Exploration**: `read_file` (15.9%) and `search_files` (11.5%) already account for a large share of Hermes' tool calls. All initial "grep-and-find" missions will now be Hermes-exclusive.
- **System Health Checks**: The `mcp_ss3_cloud_ub_check_agent_mailbox` and related cloud checks should be fully decoupled from Claude's supervision.
- **Routine Terminal Execution**: With 37% of Hermes' calls being `terminal` operations, all non-destructive shell commands will be routed to Hermes.
## Cost Savings
By utilizing Hermes (powered by qwen3-235b) instead of Claude 3.5 Sonnet for these 1,523 messages:
- **Hermes Actual Cost**: ~$1.09
- **Estimated Sonnet Cost**: Based on 36.5M tokens (mostly input), using a blended rate of ~$3.00/1M tokens, the cost would have been approximately **$109.50**.
- **Net Savings**: **~$108.41 per week** on this secondary agent alone.
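The arithmetic behind these figures, as a short check. The inputs are the report's own numbers; the ~$3.00/1M blended rate is the estimate stated above, and the function name `estimate_savings` is illustrative.

```python
from decimal import Decimal


def estimate_savings(tokens_millions: Decimal, rate_per_million: Decimal,
                     actual_cost: Decimal) -> tuple[Decimal, Decimal]:
    """Return (estimated cost at the given rate, net savings vs. actual cost)."""
    estimated = tokens_millions * rate_per_million
    return estimated, estimated - actual_cost


if __name__ == "__main__":
    estimated, net = estimate_savings(
        tokens_millions=Decimal("36.5"),    # Hermes tokens this week, rounded
        rate_per_million=Decimal("3.00"),   # assumed blended Sonnet rate
        actual_cost=Decimal("1.09"),        # Hermes actual cost
    )
    print(f"Estimated Sonnet cost: ${estimated:.2f}")
    print(f"Net savings:           ${net:.2f}")
```

The estimate scales linearly with the blended rate, so a different input/output token mix would shift the savings proportionally.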
## Recommendations
Based on this week's data, the following actions are mandated for the next sprint:
1. **Route more implementation to Sonnet**: Since Opus% is 58% (>50%), we must refine the dispatcher to prioritize Sonnet for standard feature implementation.
2. **Audit "Complex" Labels**: Review the turns from April 28-29 to determine why those specific sessions triggered Opus and if they could have been handled by a Sonnet-based chain.
3. **Expand Hermes Toolset**: Given the high success rate of `terminal` and `read_file` calls, we will integrate more read-only system tools into Hermes to further reduce Claude's context window load.