Agent Learning Daily Digest #43 — 2026-06-12

Sources: GitHub (128), HN (89), and arXiv (46); 294 items in total, with only 1 FETCH ERROR (Reddit r/MachineLearning, HTTP 429). Supplemented with HN Algolia keyword queries for coding agent / Claude Code / agent harness.

Today's High-Signal Items

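1. Defining and Classifying Agent Harnesses

Paper: *What makes a harness a harness: necessary and sufficient conditions for an agent harness*. Gives a strict constitutive definition of the term "agent harness" (necessary and sufficient conditions), distinguishing it from agent frameworks, SDKs, IDE plugins, eval harnesses, and orchestrators. The definition's practical usefulness is checked against six real harnesses: Claude Code, Codex CLI, Aider, Cline, OpenHands, and SWE-agent.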

URL: https://arxiv.org/abs/2606.10106v1

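2. Claw-SWE-Bench: A Multilingual Agent Harness Benchmark

Paper: *Claw-SWE-Bench*. A multilingual SWE-bench benchmark with 350 instances across 8 languages and 43 repositories. Key finding: adapter design is the decisive factor (minimal adapter 19.1% vs. full adapter 73.4% Pass@1 on the same GLM 5.1 backbone), and model choice (29.4 pp) and harness choice (27.4 pp) are comparably important evaluation dimensions.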

URL: https://arxiv.org/abs/2606.12344v1
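
To put the reported Claw-SWE-Bench figures side by side, the short Python sketch below works out the adapter gap from the two Pass@1 numbers quoted above. The helper names `gap_pp` and `relative_gain` are made up for this digest and do not come from the paper.

```python
def gap_pp(high: float, low: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(high - low, 1)


def relative_gain(high: float, low: float) -> float:
    """How many times larger `high` is than `low`."""
    return round(high / low, 2)


# Pass@1 figures quoted in the digest (same GLM 5.1 backbone).
FULL_ADAPTER = 73.4
MINIMAL_ADAPTER = 19.1

adapter_gap = gap_pp(FULL_ADAPTER, MINIMAL_ADAPTER)            # 54.3 pp
adapter_ratio = relative_gain(FULL_ADAPTER, MINIMAL_ADAPTER)   # 3.84x

# Spreads quoted in the digest for the other two dimensions.
MODEL_SPREAD_PP = 29.4
HARNESS_SPREAD_PP = 27.4

print(adapter_gap, adapter_ratio, adapter_gap > MODEL_SPREAD_PP > HARNESS_SPREAD_PP)
```

On these numbers the adapter gap (54.3 pp) is larger than both the model spread and the harness spread, which is consistent with the paper's framing of adapter design as the decisive factor.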

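3. The Harness Value Debate: Model > Harness

Blog post: *Does the Harness Matter?* (Agents' Last Exam). On the ALE benchmark (~150 tasks, 55 subfields), model choice spans 18.0 percentage points while harness choice spans only 5-6 percentage points. ALE-Claw (a minimal computer-use harness) reaches the same accuracy with 44% fewer input tokens and 41% lower cost.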

URL: https://agents-last-exam.org/blogs/harness-matters
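Compare: Rajit Khanna's blog post *Building Agents Without Harness Engineering* (HN 22pts) argues that startups should use an existing harness such as Hermes rather than build their own, because harness features are rapidly being commoditized.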
URL: https://rajitkhanna.com/agents/

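4. PROJECTMEM: A Local-First, Event-Sourced Memory Layer for Coding Agents

Paper: *PROJECTMEM*. An open-source, local-first memory layer that records the development process as an append-only event log and projects it into compact summaries via MCP. Introduces "Memory-as-Governance": the agent is warned before it repeats a failed fix or edits a fragile file. Ships 14 MCP tools and 19 CLI commands.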

URL: https://arxiv.org/abs/2606.12329v1
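
To make the event-sourcing idea concrete, here is a minimal Python sketch. It is not PROJECTMEM's actual API: the `Event` and `EventLog` classes, the event kinds, and the `max_failed_fixes` threshold are all hypothetical, chosen only to illustrate an append-only log, a summary projected from it, and a governance-style warning.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One immutable record in the log (field names are hypothetical)."""
    kind: str  # e.g. "edit", "fix_failed", "fix_succeeded"
    path: str  # file the event refers to


@dataclass
class EventLog:
    """Append-only log: events are added, never mutated or removed."""
    max_failed_fixes: int = 2  # assumed warning threshold
    _events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self._events.append(event)

    def project_summary(self) -> Dict[str, Counter]:
        """Fold the full log into a compact per-file count of event kinds."""
        summary: Dict[str, Counter] = {}
        for event in self._events:
            summary.setdefault(event.path, Counter())[event.kind] += 1
        return summary

    def warn_before_edit(self, path: str) -> Optional[str]:
        """Return a warning if fixes on this file have failed repeatedly."""
        failed = self.project_summary().get(path, Counter())["fix_failed"]
        if failed >= self.max_failed_fixes:
            return f"{path}: {failed} failed fixes recorded, review before editing"
        return None


log = EventLog()
log.append(Event("fix_failed", "billing.py"))
log.append(Event("fix_failed", "billing.py"))
print(log.warn_before_edit("billing.py"))
```

The summary is always recomputed from the log rather than stored separately, so the log remains the single source of truth. A real system would presumably cache or persist the projection.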

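5. Less Context, Better Agents: Empirical Context Engineering

Paper: *Less Context, Better Agents*. In a Microsoft Dynamics 365 expense-reimbursement scenario (MCP tools), keeping the 5 most recent tool calls plus a summary reaches a 91.6% completion rate (vs. only 71% with full history), while tokens drop from ~1.48M to ~553K and runtime from 14.56h to 5.79h.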

URL: https://arxiv.org/abs/2606.10209v1
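
The "last 5 tool calls plus a summary" policy can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's implementation: `compact_history` and `count_summary` are hypothetical names, and the summarizer here is a trivial stand-in for what would presumably be a model call.

```python
from typing import Callable, List


def compact_history(
    tool_calls: List[str],
    summarize: Callable[[List[str]], str],
    keep_last: int = 5,
) -> List[str]:
    """Keep the newest `keep_last` tool calls verbatim; summarize the rest."""
    if len(tool_calls) <= keep_last:
        return list(tool_calls)
    split = len(tool_calls) - keep_last
    older, recent = tool_calls[:split], tool_calls[split:]
    return [summarize(older)] + recent


def count_summary(older: List[str]) -> str:
    """Stand-in summarizer (assumption): only reports how much was dropped."""
    return f"[summary of {len(older)} earlier tool calls]"


history = [f"call_{i}" for i in range(8)]
print(compact_history(history, count_summary))
```

For scale, the figures quoted above work out to roughly a 63% reduction in tokens (553K / 1.48M ≈ 0.37) and roughly a 60% reduction in runtime (5.79 / 14.56 ≈ 0.40).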

6. Fenic: A Declarative Context Engineering Framework

Project: typedef-ai/fenic. "Declarative context engineering for agents"; 458 stars, Python, very active (most recent commit 2h ago). Provides semantic operators, execution guardrails, adaptive token estimation, and a frontier model catalog.

URL: https://github.com/typedef-ai/fenic

7. APPO: Agentic Procedural Policy Optimization

Paper: *APPO*. Refines branching and credit assignment in agentic RL from coarse granularity (tool-call boundaries) down to token-level decision points. The Branching Score combines token uncertainty with policy likelihood gains. Improves results by ~4 points across 13 benchmarks while keeping tool calls efficient.

URL: https://arxiv.org/abs/2606.12384v1
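
The digest does not reproduce the paper's Branching Score formula, so the Python sketch below is only one plausible reading of "token uncertainty combined with policy likelihood gains". It assumes Shannon entropy as the uncertainty measure and a weighted sum as the combination. The `alpha` weight and all function names are hypothetical.

```python
import math
from typing import List


def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def branching_score(
    probs: List[float], logp_new: float, logp_old: float, alpha: float = 0.5
) -> float:
    """Illustrative score: weighted sum of entropy and log-likelihood gain.

    This combination is an assumption, not the formula from the APPO paper.
    """
    gain = logp_new - logp_old
    return alpha * token_entropy(probs) + (1 - alpha) * gain


def pick_branch_points(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring token positions, in sequence order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


print(pick_branch_points([0.1, 0.9, 0.3, 0.8], k=2))  # [1, 3]
```

The point of the sketch is the granularity: positions are scored per token, so a rollout can branch in the middle of a response instead of only at tool-call boundaries.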

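8. A Cost-Aware View of Skill Rewriting

Paper: *What Should a Skill Remember?* Reframes skill rewriting as a cost-aware knowledge-engineering problem rather than prompt compression. Different rewriting strategies (API/code anchoring, workflow guarding, rule/formula anchoring) suit different task families. The learned policy lowers total cost by 7% and agent-token cost by 6%; when transferred across models, it lowers total cost by about 14.7%.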

URL: https://arxiv.org/abs/2606.09421v2

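9. Xiaomi MiMo Code: Scaling to Long-Horizon Tasks

Blog post: Xiaomi's MiMo team open-sourced MiMo Code (MIT license, based on OpenCode). Core methods: Max Mode (N=5 parallel sampling + judge), Goal Mechanism (an independent verifier that prevents premature termination), and explicit store-and-retrieve memory. SWE-Bench Pro improves by 10-20%, but token cost rises 4-5x.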

URL: https://mimo.xiaomi.com/blog/mimo-code-long-horizon
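
The two control-flow ideas in the MiMo post, parallel sampling with a judge and an independent verifier gating termination, can be sketched generically in Python. This is not MiMo Code's implementation: `max_mode`, `run_until_verified`, and the callables passed to them (`sample`, `judge`, `step`, `verify`) are hypothetical placeholders for model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def max_mode(
    task: str,
    sample: Callable[[str, int], str],
    judge: Callable[[str, str], float],
    n: int = 5,
) -> str:
    """Draw n candidates in parallel and return the one the judge scores highest."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates: List[str] = list(
            pool.map(lambda seed: sample(task, seed), range(n))
        )
    return max(candidates, key=lambda candidate: judge(task, candidate))


def run_until_verified(
    task: str,
    step: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_steps: int = 10,
) -> Tuple[str, bool]:
    """Keep stepping until an independent verifier accepts or the budget runs out."""
    state = ""
    for _ in range(max_steps):
        state = step(task, state)
        if verify(task, state):
            return state, True
    return state, False
```

In this sketch the agent never decides on its own that it is finished. Only `verify` can end the loop early, which is the property the Goal Mechanism is described as providing.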

10. Claude Fable 5 Discussion + Claude Desktop Hyper-V Issue

Claude Fable 5: mid-tier results on coding tasks (HN 169pts). EndorLabs' evaluation finds Fable 5 middling on coding tasks.

- URL: https://www.endorlabs.com/learn/claude-fable-5-mythos-grade-hype

Claude Desktop spawns 1.8 GB Hyper-V VM on every launch (HN 427pts). The VM starts even for chat-only use.

- URL: https://github.com/anthropics/claude-code/issues/29045

11. A Surge in Agent Security Tooling

Agent Vault Proxy (AVP): "agent can't leak a secret it never had"; just-in-time API key injection. 16 stars.

- URL: https://github.com/inflightsec/agent-vault-proxy

Agent-PD: "Police department for Claude Code agents"; logging hooks plus a CLI that audit agent/subagent behavior and report violations. 13 stars.

- URL: https://github.com/varmabudharaju/agent-pd

Local Privacy Filter for Claude Code: a privacy filter that runs locally.

- URL: https://github.com/outgate-ai/og-local
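
The just-in-time injection idea behind Agent Vault Proxy can be illustrated with a small Python sketch. It is not based on AVP's code: the `{{NAME}}` placeholder syntax, the `inject_secrets` function, and the in-memory `vault` dict are all assumptions made for illustration. The agent only ever sees the placeholder, and the proxy substitutes the real value as the request leaves.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax: {{SECRET_NAME}}
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def inject_secrets(headers: Dict[str, str], vault: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` with placeholders replaced by vault values."""

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in vault:
            raise KeyError(f"no secret named {name!r} in vault")
        return vault[name]

    return {key: PLACEHOLDER.sub(lookup, value) for key, value in headers.items()}


# What the agent produces: no real key anywhere in its context.
agent_headers = {"Authorization": "Bearer {{API_KEY}}", "Accept": "application/json"}

# What the proxy sends upstream (dummy value for illustration).
outbound = inject_secrets(agent_headers, {"API_KEY": "sk-test"})
print(outbound["Authorization"])  # Bearer sk-test
```

Because `inject_secrets` returns a new dict, the agent-side headers are left untouched and the substituted value never flows back into the agent's context.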

Watch List

oh-my-agent (1077 stars): a vendor-agnostic agent harness that works across IDEs; 2,384 commits, very active. Its prompt manifest sync and eval harness design are worth watching.

- URL: https://github.com/first-fluke/oh-my-agent

LangChain DeepAgentsJS (1327 stars): "Batteries included agent harness"; TypeScript, monorepo. LangChain's attempt at standardizing the agent harness.

- URL: https://github.com/langchain-ai/deepagentsjs

Claustrophobic: a multi-account Claude Code harness that automatically switches to the subscription "room" with the most remaining capacity.

- URL: https://claustrophobic.xyz

Cursor Developer Habits Report (Spring 2026): coding speed doubled YoY, agent-generated code has a high review survival rate, input tokens grew substantially, and the share of cache-read tokens rose.

- URL: https://cursor.com/insights