SP Agent Team Token Report — Week of 2026-04-12
---
title: "Weekly Token Optimization Report: Driving Opus% Down and Hermes Up"
date: "2026-04-13"
category: "Engineering"
tags: ["LLM-Ops", "Claude", "Hermes", "Token-Optimization"]
---
## This Week's Numbers
Our focus this week remained on the Opus% metric: the share of Claude turns handled by high-cost Claude Opus rather than Claude Sonnet. Our ceiling is **<50%**, which balances reasoning capability against operational cost.
| Metric | Value | Trend | Status |
| :--- | :--- | :--- | :--- |
| **Weekly Avg Opus%** | **38%** | ↓ 4 pp (from 42%) | 🟢 Green |
| **Total Opus Turns** | 2,750 | Stable | — |
| **Total Sonnet Turns** | 4,773 | Increasing | — |
| **Hermes Total Tokens** | 9.6M | High Volume | — |
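
As a sanity check on the table, the sketch below recomputes Opus% from the two turn totals. It assumes Opus% means Opus turns divided by all Claude turns (Opus plus Sonnet); the function name `opus_share` is illustrative and not part of our tooling.

```python
def opus_share(opus_turns: int, sonnet_turns: int) -> float:
    """Opus turns as a fraction of all Claude turns (Opus + Sonnet)."""
    total = opus_turns + sonnet_turns
    if total == 0:
        return 0.0  # no turns recorded, so there is no share to report
    return opus_turns / total


# Weekly totals taken from the table above.
pooled = opus_share(2750, 4773)
print(f"Pooled weekly Opus%: {pooled:.1%}")
```

The pooled figure comes out near 36.6%, slightly under the reported 38%. A gap like that is expected if the weekly average is a mean of daily percentages (an assumption here), because low-volume days then count as much as busy ones.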
## What Changed
The trend was volatile but improved overall. Opus usage hit a low of **22% on April 7th**, then spiked to **83% on April 12th**. The spike may reflect a complex architectural problem on Sunday that needed Opus's stronger reasoning, though a routing failure has not been ruled out. The weekly average of 38% still suggests that our dispatch logic is routing most standard implementation tasks to Sonnet.
## Wins
1. **Target Achievement**: The 38% weekly average sits 12 points below our <50% ceiling.
2. **Hermes Efficiency**: The Hermes Agent served most of its volume from cache. Of 9.6M total tokens, **8.9M (92.7%) were cache reads**, which cuts both latency and cost.
3. **Tool Integration**: Hermes has become the primary driver for environment interaction; its most-used tools were `terminal` (24%) and `browser_snapshot` (12%).
## Challenges
The primary challenge is the "Sunday Spike." The jump to 83% Opus usage points to either a failure in our routing logic or tasks too complex for Sonnet to handle. We need to analyze the logs for April 12th to determine whether those Opus turns were necessary or whether they came from "retry loops" in which Sonnet failed and the system defaulted to Opus.
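
One way to separate the two explanations is to label each Opus turn as a "fallback" (the same task already had a failed Sonnet turn) or "direct" (routed straight to Opus). The sketch below assumes a hypothetical log schema with `task_id`, `model`, and `ok` fields in chronological order; our real logs may be shaped differently.

```python
from collections import Counter
from typing import Iterable, TypedDict


class Turn(TypedDict):
    """Hypothetical log record; field names are assumptions."""
    task_id: str
    model: str  # "opus" or "sonnet"
    ok: bool    # whether the turn completed successfully


def classify_opus_turns(turns: Iterable[Turn]) -> Counter:
    """Count Opus turns as 'fallback' or 'direct'.

    A turn is a fallback when the same task already has a failed
    Sonnet turn earlier in the (chronologically ordered) log.
    """
    failed_sonnet_tasks: set[str] = set()
    counts: Counter = Counter()
    for turn in turns:
        if turn["model"] == "sonnet" and not turn["ok"]:
            failed_sonnet_tasks.add(turn["task_id"])
        elif turn["model"] == "opus":
            is_fallback = turn["task_id"] in failed_sonnet_tasks
            counts["fallback" if is_fallback else "direct"] += 1
    return counts


# Made-up sample records, only to show the shape of the result.
sample: list[Turn] = [
    {"task_id": "t1", "model": "sonnet", "ok": False},
    {"task_id": "t1", "model": "opus", "ok": True},
    {"task_id": "t2", "model": "opus", "ok": True},
    {"task_id": "t3", "model": "sonnet", "ok": True},
    {"task_id": "t1", "model": "opus", "ok": True},
]
counts = classify_opus_turns(sample)
print(dict(counts))
```

A high fallback count for April 12th would point toward retry loops, while a high direct count would point toward either genuinely hard tasks or an over-eager router.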
## Next Week's Target
* **Opus% Target**: Maintain avg < 40%.
* **Hermes Scaling**: Increase Gemini-2.5-pro utilization for research-heavy sessions to further lower the cost floor.
## Dispatch Optimization
Based on Hermes' success with tool-heavy tasks (terminal, browser navigation), we will shift the following from Claude to Hermes next week:
- **Initial Environment Scanning**: All `read_file` and `search_files` operations.
- **UI Testing/Validation**: All `browser_click` and `browser_type` sequences.
- **Basic Log Analysis**: Repetitive terminal output parsing, handled by Hermes on DeepSeek-chat.
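
The shift above could be expressed as a small routing rule. This is a minimal sketch, not our dispatcher: the function `pick_agent` and the `log_parse` task tag are hypothetical, and only the four tool names come from the list above.

```python
from typing import Iterable

# Tools being shifted to Hermes next week (from the list above).
HERMES_TOOLS = frozenset(
    {"read_file", "search_files", "browser_click", "browser_type"}
)
# Hypothetical tag for repetitive terminal-output parsing tasks.
HERMES_TASK_TAGS = frozenset({"log_parse"})


def pick_agent(tools: Iterable[str], tags: Iterable[str] = ()) -> str:
    """Return 'hermes' when a task fits the shifted categories, else 'claude'."""
    tool_set, tag_set = set(tools), set(tags)
    if tag_set & HERMES_TASK_TAGS:
        return "hermes"
    # Route to Hermes only when every requested tool is in the shifted set.
    if tool_set and tool_set <= HERMES_TOOLS:
        return "hermes"
    return "claude"


print(pick_agent(["read_file", "search_files"]))
print(pick_agent(["read_file", "edit_file"]))  # "edit_file" is a made-up tool
print(pick_agent([], tags=["log_parse"]))
```

Requiring every tool to be in the shifted set is a conservative choice: a task that mixes file reads with anything else stays on Claude until we decide otherwise.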
## Cost Savings
Hermes (powered primarily by DeepSeek-chat) handled 614 messages this week at negligible cost.
- **Hermes Actual Cost**: 614 messages $\times$ \$0.003 per message $\approx$ **\$1.84**
- **Estimated Sonnet Cost**: For the same 9.6M tokens (8.9M cache reads plus approx. 600k fresh input and 83k output), the cost would have been roughly **\$3.00 - \$5.00**, depending on how much of the prompt is served from cache.
- **The "Hidden" Win**: The 8.9M cache read tokens on Hermes represent a massive saving in compute time and potential cost that would have been incurred if these were fresh prompts sent to a proprietary API.
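
The arithmetic behind these figures can be sketched as follows. The per-million-token rates are assumptions chosen for illustration and should be checked against the current price sheet; the split of the 9.6M tokens into cache reads, fresh input, and output is also our reading of the numbers above.

```python
# Figures from this report.
MESSAGES = 614
HERMES_RATE_PER_MESSAGE = 0.003  # USD per message
CACHE_READ_TOKENS = 8_900_000
FRESH_INPUT_TOKENS = 600_000
OUTPUT_TOKENS = 83_000

# ASSUMED Sonnet rates in USD per million tokens (illustrative only).
ASSUMED_INPUT_RATE = 3.00
ASSUMED_OUTPUT_RATE = 15.00
ASSUMED_CACHE_READ_RATE = 0.30


def usd(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for a token count at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


hermes_cost = MESSAGES * HERMES_RATE_PER_MESSAGE

# Lower bound: bill only fresh input and output, ignoring cached tokens.
uncached_part = usd(FRESH_INPUT_TOKENS, ASSUMED_INPUT_RATE) + usd(
    OUTPUT_TOKENS, ASSUMED_OUTPUT_RATE
)
# Upper bound: also bill the cached tokens at the assumed cache-read rate.
with_cache_reads = uncached_part + usd(CACHE_READ_TOKENS, ASSUMED_CACHE_READ_RATE)

print(f"Hermes actual:           ${hermes_cost:.2f}")
print(f"Sonnet, uncached part:   ${uncached_part:.2f}")
print(f"Sonnet, plus cache reads: ${with_cache_reads:.2f}")
```

Under these assumed rates the Sonnet estimate runs from about \$3.0 to about \$5.7, which is in the same range as the rough figure quoted above.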
## Recommendations
Based on this week's data, the following actions are triggered:
1. **Use free engines more**: Gemini-2.5-pro currently accounts for only 25% of Hermes sessions (8 of 32). We need to route more high-context research tasks to the free tier to stretch the budget.
2. **Analyze Sunday's Peak**: Review the sessions from April 12th to see if the 83% Opus usage was a "necessity" or a "routing failure."
3. **Expand Hermes Toolset**: Given the high usage of `terminal` and `browser` tools, we should implement more specialized "skill" shortcuts for Hermes to reduce token overhead per tool call.