theorydelta field guide
built 2026-06-01 · findings: 49 · task hubs: 6 · independent · evidence-traced · no vendor influence

FIELD GUIDE / FINDINGS

49 findings — where claim and behavior diverge.



28 published findings not yet mapped to a task hub.

EVIDENCE TYPE: tested · source-reviewed · independently-confirmed · docs-reviewed
sorted by newest
id | finding | task | tool | evidence | verified
0049 | Hermes Agent's self-improvement narrative is not supported by the current codebase | Not yet mapped | Hermes Agent (NousResearch | medium | 2026-05-31
0048 | Claude Code Model Aliases Silently Hardcode Instead of Tracking Latest | Not yet mapped | Claude Code (Anthropic) | empirical | 2026-05-25
0047 | LangGraph checkpoint round-trips silently drop typed state — four confirmed failure modes | Not yet mapped | langchain-ai | empirical | 2026-05-24
0046 | Pipecat's interrupt handling drops function call results and deadlocks — two independent mechanisms | Not yet mapped | Pipecat (pipecat-ai | empirical | 2026-05-22
0045 | Cursor Automations Introduces Always-On Agents With No Execution Visibility | Not yet mapped | Cursor (cursor.com) | empirical | 2026-05-17
0044 | "Production-ready" agents have no canonical definition — most deployments pass 4-6 of 9 operational gates | Not yet mapped | LangChain State of Agent E | secondary-research | 2026-05-19
0043 | SWE-bench Verified abandoned after audit found 59% test flaws and training data contamination | Not yet mapped | SWE-bench Verified (OpenAI | empirical | 2026-05-19
0042 | Nine confirmed data exfiltration paths against LLM agents bypass output-layer guardrails by design | Not yet mapped | Back-Reveal (arXiv:2604.05 | empirical | 2026-05-17
0041 | CLAUDE.md and .cursorrules are not equivalent governance files — switching AI coding tools without auditing both leaves agents with no constraint on architecture decisions | Not yet mapped | GitHub public repository c | empirical | 2026-04-01
0040 | Agent supply chain attacks use vectors that CVE scanners cannot detect | Not yet mapped | OpenClaw | empirical | 2026-05-15
0039 | AI code review tool detection rates vary by an order of magnitude — architecture determines the ceiling, not model quality | Not yet mapped | Greptile (greptile.com) | empirical | 2026-05-06
0038 | Hierarchical agent teams work at depth 2, but only if you compress context at every boundary | Not yet mapped | LangGraph (langchain-ai) | empirical | 2026-05-14
0037 | AutoGen's two MCP integration paths both have blocking failures, and the framework is in maintenance mode | Not yet mapped | microsoft | empirical | 2026-04-26
0036 | Tool-poisoning attacks against MCP agents succeed more than one-third of the time and the stealth class is undetectable by production tooling | Not yet mapped | MCPTox benchmark (arXiv:25 | empirical | 2026-04-19
0035 | FastMCP's from_openapi() Path Leaked Auth Headers Twice and Corrupted GET Requests | Not yet mapped | jlowin | empirical | 2026-04-03
0034 | Instruction-based agent constraints are probabilistic — safety training predicts worse compliance, not better | Not yet mapped | Replit (AI coding agent) | empirical | 2026-05-15
0033 | Apple container's default runtime leaves network fully open — and --cap-drop ALL doesn't help | Not yet mapped | apple | empirical | 2026-05-06
0032 | Cline's `.clinerules` Has No Enforcement — The Model Decides What to Obey | Not yet mapped | Cline (cline | empirical | 2026-05-06
0031 | n8n's DAG execution model breaks agent loops — tool errors crash workflows and tool results disappear | Not yet mapped | n8n (n8n-io | empirical | 2026-05-03
0030 | Claude Code MCP Bridge Silently Coerces All Non-String Parameters to Strings | Not yet mapped | Anthropic | empirical | 2026-05-03
0029 | CrewAI closed tool fabrication, broken delegation, and SQL injection as not-planned | Not yet mapped | CrewAI (crewAIInc | empirical | 2026-05-03
0028 | Claude Code deny rules do not enforce across MCP tools, subagents, or Bash glob — and the gap is unfixed by design | Not yet mapped | anthropics | empirical | 2026-04-27
0027 | Ollama disables tool calling silently in two independent ways by default | Not yet mapped | Ollama (ollama | empirical | 2026-04-29
0026 | A2A has no standard agent registry API, and AAIF hasn't closed the gap in 3 months | Not yet mapped | a2aproject | empirical | 2026-03-26
0025 | Aider bypasses git pre-commit hooks and auto-commits AI-generated code by default | Not yet mapped | Aider-AI | empirical | 2026-04-03
0024 | GitHub Copilot coding agent attack surface has four independent layers — three RCE chains confirmed, two vectors permanently unpatched | Not yet mapped | GitHub Copilot coding agen | empirical | 2026-05-15
0023 | You picked vector vs graph for agent memory — the empirical answer is neither, pick compression | Build RAG | mem0ai | secondary-research | 2026-04-27
0022 | MCP tool schemas written for Claude silently fail against Fireworks AI, Gemini, and OpenAI | Not yet mapped | modelcontextprotocol | empirical | 2026-04-25
0021 | MCP write tools gated on Authorization header are Claude Code-only — other clients silently fail | Set up MCP | Claude Code | validated | 2026-04-22
0020 | Structured generation guarantees break silently above undocumented complexity thresholds — no provider publishes where the line is | Build RAG | Anthropic Claude API (Anth | secondary-research | 2026-04-21
0019 | playwright-mcp works in local testing and silently destroys sessions in cloud deployment | Set up MCP | microsoft | empirical | 2026-04-19
0018 | Your agent framework choice locks you into undocumented failure modes the docs won't mention | Pick a framework | CrewAI | empirical | 2026-04-20
0017 | One agent security tool leaves you entirely blind to the threats only a different tier can see | Not yet mapped | mcp-scan (Invariant | medium | 2026-04-20
0016 | Error suppression patterns common in scripts will silently break your agent | Configure autonomy | Claude Code (Anthropic) | medium | 2026-03-28
0015 | ChromaDB will run out of RAM before you think, and v0.5.x silently orphaned your embeddings | Build RAG | ChromaDB (chroma-core) | medium | 2026-03-29
0014 | LangGraph checkpoints silently corrupt non-primitive types — your resume will not restore what you saved | Pick a framework | LangGraph (langchain-ai) | empirical | 2026-03-01
0013 | The benchmark you used to evaluate your agent was retired — 59% of its test cases were wrong | Evaluate a benchmark | SWE-bench Verified (OpenAI | secondary-research | 2026-04-18
0012 | Goose ships with compounding security defaults that together equal dangerouslySkipPermissions | Configure autonomy | Goose Desktop | validated | 2026-04-19
0011 | Enabling streaming in the OpenAI Agents SDK means your guardrails no longer block anything | Pick a framework | openai-agents-python | medium | 2026-03-17
0010 | Cloning a repo is enough — Claude Code's project settings file executes before you trust it | Configure autonomy | Claude Code | empirical | 2026-04-19
0009 | Claude Code hooks fire unreliably — no single hook is a security enforcement point | Pick a framework · Configure autonomy | Claude Code | empirical | 2026-04-19
0008 | Your LLM gateway will silently overspend your budget and bypass your guardrails | Choose a gateway | LiteLLM | empirical | 2026-03-22
0007 | Three RAG pipeline failures your framework won't tell you about | Build RAG | microsoft | empirical | 2026-02-22
0006 | DeepEval silently exfiltrated trace data on import — and Langfuse silently drops your orchestration spans | Configure autonomy | confident-ai | medium | 2026-04-19
0005 | Self-hosted graph memory will crash your async service or corrupt your files — and the docs don't mention either | Build RAG | mem0ai | empirical | 2026-04-20
0004 | Your agent CI gate is probabilistic — and your VCR recording does not cover MCP tool calls | Evaluate a benchmark | confident-ai | independently-confirmed | 2026-04-19
0003 | Two enterprise acquisitions confirmed MCP supply chain risk — but one attack class still has no defense | Set up MCP | invariantlabs-ai | independently-confirmed | 2026-04-19
0002 | MCP database servers' "read-only mode" is a string check — and it's bypassable | Set up MCP | executeautomation | independently-confirmed | 2026-04-19
0001 | Deploying your MCP server stateless will silently break sampling and elicitation | Set up MCP | MCP TypeScript SDK | independently-confirmed | 2026-04-19
theorydelta.com · 2026 · independent · evidence-backed · every claim sourced or labelled