The evidence layer

These are the findings that dossier ledger rows cite: the evidence behind each claim-fidelity call, not headline content in their own right. Filter by what you're about to do, or browse by task.

I'M ABOUT TO…

All (61) · Set up MCP (5) · Choose a gateway (1) · Pick a framework (4) · Build RAG (5) · Configure autonomy (5) · Evaluate a benchmark (2) · Not yet mapped (40)

40 published findings not yet mapped to a task hub.

EVIDENCE TYPE

tested · source-reviewed · independently-confirmed · docs-reviewed

sorted by newest

id | finding | task | tool | evidence | verified
0061 | Claude Code auto-memory silently truncates at 200 lines — topic files Claude creates never auto-load | Not yet mapped | Claude Code (Anthropic) | empirical | 2026-07-07
0060 | Claude Code web and CLI have different trust models -- git push is branch-locked, machine memory is dropped, and headless CI needs two mechanisms | Not yet mapped | Claude Code | empirical | 2026-06-27
0059 | Codex's approval policy doesn't hold across runtimes -- VS Code ignores it, Windows inverts it, and CI auto-approves any mid-session escalation since v0.113.0 | Not yet mapped | OpenAI Codex CLI | empirical | 2026-06-27
0058 | Claude Desktop and Claude Code send the same clientInfo.name — MCP servers cannot tell them apart | Not yet mapped | Claude Desktop (Anthropic) | empirical | 2026-05-07
0057 | LiteLLM's supply chain was compromised and budget enforcement fails silently under concurrent load | Not yet mapped | BerriAI | empirical | 2026-07-03
0056 | MCP stateful sessions fail with every free-tier load balancer — and neither major client recovers automatically | Not yet mapped | MCP Python SDK | empirical | 2026-05-17
0055 | Multi-Agent OAuth Delegation Has No Enforcement Layer — RFC 8693 'act' Claims Are Advisory Only | Not yet mapped | OAuth RFC 8693 (IETF) | medium | 2026-05-31
0054 | A2A Agent Card Skill Descriptions Are an Unprotected Injection Surface — 100% Exfiltration in Tested Scenarios | Not yet mapped | A2A Protocol Spec | empirical | 2026-06-12
0053 | LocalAGI's 50% Tool-Call Failure Rate Is an Infrastructure Bug, Not a Model Problem | Not yet mapped | LocalAI (mudler/LocalAI) | empirical | 2026-05-09
0052 | Worktrees are not required for parallel Claude Code agents under active human steering | Not yet mapped | Claude Code (Anthropic) | empirical | 2026-05-24
0051 | Agent Config Dependencies Silently Cause Hallucination, Not Errors | Not yet mapped | Claude Code (Anthropic) | empirical | 2026-06-01
0050 | Hermes Agent's self-improvement narrative is not supported by the current codebase | Not yet mapped | Hermes Agent | medium | 2026-05-31
0049 | PickleScan's three CVSS 9.3 zero-days make static model weight scanning an unreliable production control | Not yet mapped | PickleScan (mmaitre314) | empirical | 2026-05-24
0048 | Claude Code Model Aliases Silently Hardcode Instead of Tracking Latest | Not yet mapped | Claude Code (Anthropic) | empirical | 2026-05-25
0047 | LangGraph checkpoint round-trips silently drop typed state — four confirmed failure modes | Not yet mapped | langchain-ai | empirical | 2026-05-24
0046 | Pipecat's interrupt handling drops function call results and deadlocks — two independent mechanisms | Not yet mapped | Pipecat | empirical | 2026-05-22
0045 | Cursor Automations Introduces Always-On Agents With No Execution Visibility | Not yet mapped | Cursor (cursor.com) | empirical | 2026-05-17
0044 | "Production-ready" agents have no canonical definition — most deployments pass 4-6 of 9 operational gates | Not yet mapped | LangChain State of Agent… | secondary-research | 2026-05-19
0043 | SWE-bench Verified abandoned after audit found 59% test flaws and training data contamination | Not yet mapped | SWE-bench Verified | empirical | 2026-05-19
0042 | Nine confirmed data exfiltration paths against LLM agents bypass output-layer guardrails by design | Not yet mapped | Back-Reveal | empirical | 2026-05-17
0041 | CLAUDE.md and .cursorrules are not equivalent governance files — switching AI coding tools without auditing both leaves agents with no constraint on architecture decisions | Not yet mapped | GitHub public repository… | empirical | 2026-04-01
0040 | Agent supply chain attacks use vectors that CVE scanners cannot detect | Not yet mapped | OpenClaw | empirical | 2026-05-15
0039 | AI code review tool detection rates vary by an order of magnitude — architecture determines the ceiling, not model quality | Not yet mapped | Greptile (greptile.com) | empirical | 2026-05-06
0038 | Hierarchical agent teams work at depth 2, but only if you compress context at every boundary | Not yet mapped | LangGraph (langchain-ai) | empirical | 2026-05-14
0037 | AutoGen's two MCP integration paths both have blocking failures, and the framework is in maintenance mode | Not yet mapped | microsoft | empirical | 2026-04-26
0036 | Tool-poisoning attacks against MCP agents succeed more than one-third of the time and the stealth class is undetectable by production tooling | Not yet mapped | MCPTox benchmark | empirical | 2026-04-19
0035 | FastMCP's from_openapi() Path Leaked Auth Headers Twice and Corrupted GET Requests | Not yet mapped | jlowin | empirical | 2026-04-03
0034 | Instruction-based agent constraints are probabilistic — safety training predicts worse compliance, not better | Not yet mapped | Replit (AI coding agent) | empirical | 2026-05-15
0033 | Apple container's default runtime leaves network fully open — and --cap-drop ALL doesn't help | Not yet mapped | apple | empirical | 2026-05-06
0032 | Cline's `.clinerules` Has No Enforcement — The Model Decides What to Obey | Not yet mapped | Cline (cline/cline) | empirical | 2026-05-06
0031 | n8n's DAG execution model breaks agent loops — tool errors crash workflows and tool results disappear | Not yet mapped | n8n (n8n-io/n8n) | empirical | 2026-05-03
0030 | Claude Code MCP Bridge Silently Coerces All Non-String Parameters to Strings | Not yet mapped | Anthropic | empirical | 2026-05-03
0029 | CrewAI closed tool fabrication, broken delegation, and SQL injection as not-planned | Not yet mapped | CrewAI (crewAIInc/crewAI) | empirical | 2026-05-03
0028 | Claude Code deny rules do not enforce across MCP tools, subagents, or Bash glob — and the gap is unfixed by design | Not yet mapped | anthropics | empirical | 2026-04-27
0027 | Ollama disables tool calling silently in two independent ways by default | Not yet mapped | Ollama (ollama/ollama) | empirical | 2026-04-29
0026 | A2A has no standard agent registry API, and AAIF hasn't closed the gap in 3 months | Not yet mapped | a2aproject | empirical | 2026-03-26
0025 | Aider bypasses git pre-commit hooks and auto-commits AI-generated code by default | Not yet mapped | Aider-AI | empirical | 2026-04-03
0024 | GitHub Copilot coding agent attack surface has four independent layers — three RCE chains confirmed, two vectors permanently unpatched | Not yet mapped | GitHub Copilot coding… | empirical | 2026-05-15
0023 | You picked vector vs graph for agent memory — the empirical answer is neither, pick compression | Build RAG | mem0ai | secondary-research | 2026-04-27
0022 | MCP tool schemas written for Claude silently fail against Fireworks AI, Gemini, and OpenAI | Not yet mapped | modelcontextprotocol | empirical | 2026-04-25
0021 | MCP write tools gated on Authorization header are Claude Code-only — other clients silently fail | Set up MCP | Claude Code | validated | 2026-04-22
0020 | Structured generation guarantees break silently above undocumented complexity thresholds — no provider publishes where the line is | Build RAG | Anthropic Claude API | secondary-research | 2026-04-21
0019 | playwright-mcp works in local testing and silently destroys sessions in cloud deployment | Set up MCP | microsoft | empirical | 2026-04-19
0018 | Your agent framework choice locks you into undocumented failure modes the docs won't mention | Pick a framework | CrewAI | empirical | 2026-04-20
0017 | One agent security tool leaves you entirely blind to the threats only a different tier can see | Not yet mapped | mcp-scan (Invariant/Snyk) | medium | 2026-04-20
0016 | Error suppression patterns common in scripts will silently break your agent | Configure autonomy | Claude Code (Anthropic) | medium | 2026-03-28
0015 | ChromaDB will run out of RAM before you think, and v0.5.x silently orphaned your embeddings | Build RAG | ChromaDB (chroma-core) | medium | 2026-03-29
0014 | LangGraph checkpoints silently corrupt non-primitive types — your resume will not restore what you saved | Pick a framework | LangGraph (langchain-ai) | empirical | 2026-03-01
0013 | The benchmark you used to evaluate your agent was retired — 59% of its test cases were wrong | Evaluate a benchmark | SWE-bench Verified | secondary-research | 2026-04-18
0012 | Goose ships with compounding security defaults that together equal dangerouslySkipPermissions | Configure autonomy | Goose Desktop | validated | 2026-04-19
0011 | Enabling streaming in the OpenAI Agents SDK means your guardrails no longer block anything | Pick a framework | openai-agents-python | medium | 2026-03-17
0010 | Cloning a repo is enough — Claude Code's project settings file executes before you trust it | Configure autonomy | Claude Code | empirical | 2026-04-19
0009 | Claude Code hooks fire unreliably — no single hook is a security enforcement point | Pick a framework · Configure autonomy | Claude Code | empirical | 2026-04-19
0008 | Your LLM gateway will silently overspend your budget and bypass your guardrails | Choose a gateway | LiteLLM | empirical | 2026-03-22
0007 | Three RAG pipeline failures your framework won't tell you about | Build RAG | microsoft | empirical | 2026-02-22
0006 | DeepEval silently exfiltrated trace data on import — and Langfuse silently drops your orchestration spans | Configure autonomy | confident-ai | medium | 2026-04-19
0005 | Self-hosted graph memory will crash your async service or corrupt your files — and the docs don't mention either | Build RAG | mem0ai | empirical | 2026-04-20
0004 | Your agent CI gate is probabilistic — and your VCR recording does not cover MCP tool calls | Evaluate a benchmark | confident-ai | independently-confirmed | 2026-04-19
0003 | Two enterprise acquisitions confirmed MCP supply chain risk — but one attack class still has no defense | Set up MCP | invariantlabs-ai | independently-confirmed | 2026-04-19
0002 | MCP database servers' "read-only mode" is a string check — and it's bypassable | Set up MCP | executeautomation | independently-confirmed | 2026-04-19
0001 | Deploying your MCP server stateless will silently break sampling and elicitation | Set up MCP | MCP TypeScript SDK | independently-confirmed | 2026-04-19
