
SP Agent Team Token Report — Week of 2026-04-19

---
title: "Weekly Token Optimization Report: Maintaining the 50% Opus Threshold"
date: "2026-04-20"
category: "Engineering"
tags: ["LLM-Ops", "Claude", "Token-Optimization", "Agentic-Systems"]
---

## This Week's Numbers

Our primary objective for the SuperPortia agentic loop remains reducing dependence on the high-cost Opus model. This week, the **average Opus utilization rate was 42%**, one point below last week's 43% and under the 50% target.

**Claude Dispatch Metrics:**
- **Average Opus%**: 42% (Target: <50%)
- **Status**: 🟢 Green
- **Total Turns**: ~3,400 (aggregated across the week)
- **Peak Volume**: April 15th saw the highest activity with 1,312 total turns; Opus usage that day spiked to 52%, above the 50% target.

**Hermes Agent (Secondary) Metrics:**
- **Total Tokens**: 100,433,234
- **Total Sessions**: 246
- **Total Cost**: $4.84
- **Primary Driver**: `qwen3-235b-a22b-2507` (56.1M tokens)

## What Changed

The token distribution was volatile mid-week. Opus usage climbed from 30% on April 13th to 52% on April 15th, coinciding with a surge in session complexity, then fell back to 32% by April 18th. This suggests that our routing logic shifts work back to Sonnet once the "heavy lifting" of architectural planning (Opus) is complete.

## Wins

1. **Threshold Stability**: Despite high-intensity sessions, we stayed under the 50% Opus ceiling for the weekly average.
2. **Hermes Efficiency**: The Hermes agent absorbed over 100M tokens of context, primarily through the `terminal` (51.8% of calls) and `read_file` (14.8% of calls) tools.
3. **Cache Performance**: Hermes served 45.4M tokens from cache reads, which avoids reprocessing that context and should reduce latency.

## Challenges

The April 15th spike indicates that certain "complex" tasks still trigger Opus too often. In particular, "read-explore" phases still lean on high-reasoning models when a smaller, more specialized model would suffice for basic file system navigation.

## Next Week's Target

- **Opus% Goal**: <40%
- **Objective**: Shift "System Exploration" and "Log Analysis" entirely to Hermes.

## Dispatch Optimization

Based on the tool call data, Hermes's workload is already concentrated in `terminal` and `read_file` operations. To optimize further, we will shift the following from Claude to Hermes next week:
- **Initial Codebase Indexing**: Move all `search_files` and `session_search` calls to Hermes.
- **Routine Linting/Testing**: Any task triggering `execute_code` for syntax validation will be routed to Hermes.
- **Log Monitoring**: All `cron`-based log summaries will bypass Claude entirely.
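The planned shift above can be sketched as a small routing function. This is a minimal illustration, not the actual dispatcher: `route_tool_call`, the `purpose` labels, and the `"hermes"`/`"claude"` return values are hypothetical names introduced for this example; only the tool names come from this report.

```python
from typing import Optional

# Tools that the plan moves to Hermes unconditionally.
ALWAYS_HERMES = frozenset({"search_files", "session_search", "cron"})

# Tools that move to Hermes only for routine purposes.
# The purpose labels are hypothetical; the report only says
# "routine linting/testing" and "syntax validation".
CONDITIONAL_HERMES = {
    "execute_code": frozenset({"lint", "syntax_check", "test"}),
}


def route_tool_call(tool: str, purpose: Optional[str] = None) -> str:
    """Return which agent should handle a tool call under next week's plan."""
    if tool in ALWAYS_HERMES:
        return "hermes"
    if purpose in CONDITIONAL_HERMES.get(tool, frozenset()):
        return "hermes"
    # Anything not covered by the plan stays on the existing Claude path.
    return "claude"


print(route_tool_call("search_files"))
print(route_tool_call("execute_code", purpose="syntax_check"))
print(route_tool_call("execute_code", purpose="refactor"))
```

Keeping the rules in plain data structures means a later change to the plan only touches the two tables, not the routing logic.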

## Cost Savings

The financial impact of using Hermes as a secondary agent is substantial. 
- **Hermes Actual Cost**: $4.84 for ~100M tokens.
- **Claude Sonnet Estimate**: Processing 54M input tokens through Sonnet 3.5 (at approx. $3 per 1M input tokens) would have cost roughly **$162.00**. This estimate covers input tokens only.
- **Net Savings**: By routing system-level tool calls to Hermes, we saved approximately **$157.16** ($162.00 − $4.84) this week alone.
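The arithmetic behind these figures can be reproduced in a few lines. All inputs are the numbers quoted above; the $3 per 1M tokens rate is the report's approximation, and the function name `sonnet_estimate` is only illustrative.

```python
# Figures quoted in this report.
HERMES_COST_USD = 4.84
SONNET_INPUT_TOKENS = 54_000_000
SONNET_USD_PER_MILLION = 3.00  # approximate input price used in the report


def sonnet_estimate(tokens: int, usd_per_million: float) -> float:
    """Estimated cost of processing `tokens` input tokens at a flat rate."""
    return tokens / 1_000_000 * usd_per_million


estimate = sonnet_estimate(SONNET_INPUT_TOKENS, SONNET_USD_PER_MILLION)
net_savings = estimate - HERMES_COST_USD

print(f"Sonnet estimate: ${estimate:.2f}")   # Sonnet estimate: $162.00
print(f"Net savings:     ${net_savings:.2f}")  # Net savings:     $157.16
```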

## Recommendations

Based on this week's telemetry, the following actions are mandated for the next sprint:

1. **Use free engines more**: Gemini and Gemma usage represents <1% of research tokens. We need to route initial "broad-sweep" research tasks to these free engines to lower the baseline cost further.
2. **Raise the review rate**: The project scan updated only 2 files and skipped 2,579, so our `evaluate.sh` enforcement hooks are likely too permissive. We need to tighten the review trigger.
3. **Optimize High-Volume Days**: Analyze the April 15th spike to identify if a specific agentic loop caused the Opus% increase and implement a "circuit breaker" to force Sonnet routing during high-turn bursts.
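One possible shape for the "circuit breaker" in recommendation 3 is a sliding window over recent turns that downgrades Opus requests to Sonnet once the Opus share in that window reaches a limit. This is a sketch under stated assumptions, not a description of the existing router: the class name `OpusCircuitBreaker`, the window size, and the 50% limit applied per window are all hypothetical choices for illustration.

```python
from collections import deque


class OpusCircuitBreaker:
    """Force Sonnet routing when recent Opus usage is too high.

    Hypothetical sketch: window size and threshold are illustrative defaults.
    """

    def __init__(self, window_turns: int = 200, max_opus_share: float = 0.50) -> None:
        self.max_opus_share = max_opus_share
        # True for each recent turn that actually ran on Opus.
        self._recent = deque(maxlen=window_turns)

    def opus_share(self) -> float:
        """Fraction of turns in the current window that ran on Opus."""
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)

    def is_open(self) -> bool:
        """The breaker trips only once the window is full, to avoid noise."""
        window_full = len(self._recent) == self._recent.maxlen
        return window_full and self.opus_share() >= self.max_opus_share

    def route(self, requested_model: str) -> str:
        """Return the model to use for this turn and record the decision."""
        if requested_model == "opus" and self.is_open():
            model = "sonnet"
        else:
            model = requested_model
        self._recent.append(model == "opus")
        return model


breaker = OpusCircuitBreaker(window_turns=4, max_opus_share=0.5)
for requested in ["opus", "opus", "sonnet", "sonnet", "opus", "opus"]:
    print(requested, "->", breaker.route(requested))
```

Because downgraded turns are recorded as non-Opus, the Opus share decays during a burst and the breaker closes again on its own once usage falls below the limit.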