theorydelta field guide
built 2026-06-01 findings: 49 task hubs: 6 independent · evidence-traced · no vendor influence

MCP tool schemas written for Claude silently fail against Fireworks AI, Gemini, and OpenAI

Published: 2026-04-27 · Last verified: 2026-04-25 · empirical
Staleness risk: high — facts in this subject area change quickly between releases. Re-check the specific claims against your own environment before acting. (This rates the topic, not whether this page is out of date.)

What you expect

JSON Schema is a well-specified interchange format. MCP uses it as the inputSchema for tool definitions. A tool schema that validates locally and works in one client should work in all clients.

The MCP spec defines a common schema format but no conformance suite, no provider compatibility guidance, and no schema sanitization layer. The implicit promise is portability.

What actually happens

A review of 10+ public issues across major MCP server repositories shows a consistent pattern: the same valid JSON Schema construct that works with Claude fails with a different provider in a distinct, hard-to-trace way. There is no cross-provider compatibility matrix. Failures surface only in production.

Bare boolean schemas crash Fireworks AI. "field": true is valid JSON Schema (meaning “accept any value”), but Fireworks AI returns HTTP 500 with no indication of which tool caused it. In grafana/mcp-grafana#594, isolating the offending tool required a binary search across 25 tools. The Go interface{} type silently emits this pattern.
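Instead of a binary search, the offending tool can be located by walking each inputSchema and reporting every position where a bare boolean stands in for a subschema. The following is a minimal sketch, not taken from any of the cited repositories: the keyword lists cover only common subschema positions, the tool names are hypothetical, and additionalProperties is deliberately left out because additionalProperties: false is a separate, widely accepted use of a boolean.

```python
from typing import Any

# Keywords whose value is a map of name -> subschema.
MAP_KEYWORDS = ("properties", "patternProperties", "$defs", "definitions")
# Keywords whose value is a single subschema.
SINGLE_KEYWORDS = ("items", "not", "contains", "propertyNames")
# Keywords whose value is a list of subschemas.
LIST_KEYWORDS = ("anyOf", "oneOf", "allOf", "prefixItems")


def find_boolean_schemas(schema: Any, path: str = "#") -> list[str]:
    """Return pointer-like paths of every bare boolean subschema."""
    if isinstance(schema, bool):
        return [path]
    if not isinstance(schema, dict):
        return []
    hits: list[str] = []
    for key in MAP_KEYWORDS:
        value = schema.get(key)
        if isinstance(value, dict):
            for name, sub in value.items():
                hits += find_boolean_schemas(sub, f"{path}/{key}/{name}")
    for key in SINGLE_KEYWORDS:
        if key in schema:
            hits += find_boolean_schemas(schema[key], f"{path}/{key}")
    for key in LIST_KEYWORDS:
        value = schema.get(key)
        if isinstance(value, list):
            for i, sub in enumerate(value):
                hits += find_boolean_schemas(sub, f"{path}/{key}/{i}")
    return hits


def report(tools: list[dict]) -> dict[str, list[str]]:
    """Map each tool name to the boolean-schema paths found in its inputSchema."""
    found = {}
    for tool in tools:
        hits = find_boolean_schemas(tool.get("inputSchema", {}))
        if hits:
            found[tool["name"]] = hits
    return found


# Hypothetical tool definitions in the shape of a tools/list result.
tools = [
    {"name": "ok_tool",
     "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}}},
    {"name": "bad_tool",
     "inputSchema": {"type": "object", "properties": {"payload": True}}},
]
print(report(tools))  # {'bad_tool': ['#/properties/payload']}
```

Running a check like this over the full tool list names the tool and the field directly, which is the information the HTTP 500 response does not provide.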

$ref causes LLM string serialization. When inputSchema uses $ref pointers, models treat referenced parameters as untyped and serialize objects as JSON-encoded strings, returning MCP error -32602: Invalid arguments. This was confirmed in typescript-sdk#1562, Claude Code #18260, and Kiro CLI independently — the same failure class in three separate tools. TypeScript SDK PR #1460 widened the blast radius by emitting $ref on all Zod-registered types.

$ref crashed the TypeScript SDK validator on tools/list. SDK 1.22.0 introduced AJV-based output schema caching that fails on $defs — valid JSON Schema. AWS MCP servers were forced to pin to 1.19.1 (typescript-sdk#1175).

anyOf + sibling fields blocks all Gemini requests. Gemini validates ALL tool schemas before processing any request. Gemini requires anyOf to be the sole field in a schema object — any sibling field (type, description, items) blocks every request to the provider. The official @modelcontextprotocol/server-github triggers this via the comments field in github_create_pull_request_review (opencode#14509). Workaround: disable the GitHub MCP server.
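Because one offending schema blocks every request, it helps to scan all tool schemas for this shape before pointing a server at Gemini. A minimal sketch follows; the example field is a hypothetical stand-in, not the actual schema from the GitHub MCP server. The walk is generic, so a property that is literally named anyOf would be reported as a false positive.

```python
from typing import Any


def find_anyof_siblings(node: Any, path: str = "#") -> list[tuple[str, list[str]]]:
    """Return (path, sibling_keys) for every object that mixes anyOf with other keys."""
    hits: list[tuple[str, list[str]]] = []
    if isinstance(node, dict):
        if "anyOf" in node and len(node) > 1:
            hits.append((path, sorted(k for k in node if k != "anyOf")))
        for key, value in node.items():
            hits += find_anyof_siblings(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits += find_anyof_siblings(value, f"{path}/{i}")
    return hits


# Hypothetical schema: a nullable array field with a description next to anyOf.
schema = {
    "type": "object",
    "properties": {
        "comments": {
            "description": "Review comments",
            "anyOf": [
                {"type": "array", "items": {"type": "object"}},
                {"type": "null"},
            ],
        }
    },
}
print(find_anyof_siblings(schema))
# [('#/properties/comments', ['description'])]
```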

Missing additionalProperties:false breaks OpenAI strict mode. OpenAI strict function calling requires additionalProperties: false on all object schemas. The Go SDK (mcp-go) does not add it by default. github-mcp-server#376 confirms all OpenAI calls return 400 Bad Request.

Tool name collisions silently overwrite. Microsoft Research found 775 name collisions ecosystem-wide — “search” appears in 32 distinct MCP servers. The ToolRegistry uses a last-registered-wins policy with no warning.
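The effect of a last-registered-wins policy is easy to reproduce in a few lines. The sketch below is illustrative only: the server and tool names are made up, and the registry is a plain dict rather than any particular client's ToolRegistry.

```python
from collections import defaultdict


def find_collisions(servers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each tool name exposed by more than one server to those servers."""
    owners: dict[str, list[str]] = defaultdict(list)
    for server, tool_names in servers.items():
        for name in tool_names:
            owners[name].append(server)
    return {name: srv for name, srv in owners.items() if len(srv) > 1}


def last_registered_wins(servers: dict[str, list[str]]) -> dict[str, str]:
    """Simulate a registry that overwrites earlier entries without warning."""
    registry: dict[str, str] = {}
    for server, tool_names in servers.items():
        for name in tool_names:
            registry[name] = server  # silent overwrite on duplicate names
    return registry


# Hypothetical configuration; dict order stands in for config file order.
servers = {"docs": ["search", "fetch"], "tickets": ["search", "create"]}

print(find_collisions(servers))                 # {'search': ['docs', 'tickets']}
print(last_registered_wins(servers)["search"])  # tickets
```

Swapping the order of the two entries in servers changes which server answers "search", with no other change to the configuration.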

Wide tool surfaces degrade accuracy by up to 85%. Microsoft Research measured up to 85% performance degradation with large tool spaces and up to 91% accuracy degradation with long tool responses. One MCP tool averaged 557,766 tokens per call — exceeding GPT-5’s 272K input limit. arXiv:2604.21003 found that loading a full MCP catalog upfront imposes a 10–60k input-token tax per turn even when most tools are never called.

What this means for you

A tool schema that passes local testing with Claude is not safe to deploy against Fireworks AI, Gemini, or OpenAI without a per-provider sanitization pass. The failure modes are provider-specific, silent, and surfaced only in production. Error messages do not identify the offending tool or field.

If you run a multi-provider MCP server, every LLM upgrade or provider addition is a latent schema compatibility test with no safety net. One non-conforming schema in a 40-tool server disables all Gemini access (anyOf), causes sporadic HTTP 500s from Fireworks (mcp-grafana#594), or makes every OpenAI call return 400 (github-mcp-server#376).

The name collision problem means multi-server setups are non-deterministic: which “search” runs depends on config file ordering, not your intent.

What to do

Provider-targeted schema sanitization (highest leverage). Apply a sanitization pass per-provider before forwarding schemas:

  1. Convert bare boolean schemas: true → {"type": "object"}, false → {"not": {}}
  2. Inline $ref pointers: dereference all local #/$defs/X into inline schemas before serialization (~95 lines, no external dependencies)
  3. For Gemini: strip sibling fields when anyOf is present
  4. For OpenAI strict: add additionalProperties: false to all object schemas
  5. Apply nullable-aware null stripping (check ZodNullable before removing null values)

The gateway layer is the correct place to apply per-provider schema sanitization — do it once, at the boundary.
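Steps 1 through 4 can be sketched as a small gateway-side pass. This is an illustration under stated assumptions, not a reference implementation: the provider labels "gemini" and "openai-strict" are hypothetical, only a subset of JSON Schema keywords is traversed, recursive $ref targets raise an error rather than being handled, and step 5 is omitted because it depends on the Zod layer.

```python
from typing import Any, Callable

# Positions where JSON Schema expects a subschema (an illustrative subset).
MAP_KEYS = ("properties", "$defs", "definitions")  # name -> subschema
LIST_KEYS = ("anyOf", "oneOf", "allOf")            # list of subschemas
SINGLE_KEYS = ("items", "not")                     # one subschema


def transform(schema: Any, fn: Callable[[Any], Any]) -> Any:
    """Apply fn to a schema, then recurse into its subschema positions."""
    schema = fn(schema)
    if not isinstance(schema, dict):
        return schema
    out = dict(schema)  # shallow copy so the caller's schema is not mutated
    for key in MAP_KEYS:
        if isinstance(out.get(key), dict):
            out[key] = {k: transform(v, fn) for k, v in out[key].items()}
    for key in LIST_KEYS:
        if isinstance(out.get(key), list):
            out[key] = [transform(v, fn) for v in out[key]]
    for key in SINGLE_KEYS:
        if isinstance(out.get(key), (dict, bool)):
            out[key] = transform(out[key], fn)
    return out


def expand_boolean(schema: Any) -> Any:
    """Step 1: replace bare booleans with the object forms listed above."""
    if schema is True:
        return {"type": "object"}
    if schema is False:
        return {"not": {}}
    return schema


def inline_refs(node: Any, root: Any = None, stack: tuple = ()) -> Any:
    """Step 2: replace local '#/...' refs with a copy of their target."""
    root = node if root is None else root
    if isinstance(node, list):
        return [inline_refs(n, root, stack) for n in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/"):
        if ref in stack:
            raise ValueError(f"recursive $ref cannot be inlined: {ref}")
        target = root
        for part in ref[2:].split("/"):  # no '~0'/'~1' unescaping in this sketch
            target = target[part]
        siblings = {k: v for k, v in node.items() if k != "$ref"}
        return inline_refs({**target, **siblings}, root, stack + (ref,))
    # Definition blocks are dropped because every local ref is now inlined.
    return {k: inline_refs(v, root, stack) for k, v in node.items()
            if k not in ("$defs", "definitions")}


def strip_anyof_siblings(schema: Any) -> Any:
    """Step 3 (Gemini): keep anyOf as the only key; siblings are discarded."""
    if isinstance(schema, dict) and "anyOf" in schema:
        return {"anyOf": schema["anyOf"]}
    return schema


def close_objects(schema: Any) -> Any:
    """Step 4 (OpenAI strict): force additionalProperties: false on objects."""
    if isinstance(schema, dict) and schema.get("type") == "object":
        return {**schema, "additionalProperties": False}
    return schema


def sanitize(schema: Any, provider: str) -> Any:
    schema = transform(schema, expand_boolean)  # run first so ref targets are dicts
    schema = inline_refs(schema)
    if provider == "gemini":
        schema = transform(schema, strip_anyof_siblings)
    if provider == "openai-strict":
        schema = transform(schema, close_objects)
    return schema


schema = {
    "type": "object",
    "properties": {
        "filter": {"$ref": "#/$defs/Filter"},
        "extra": True,
        "comments": {"description": "notes",
                     "anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    "$defs": {"Filter": {"type": "object", "properties": {"q": {"type": "string"}}}},
}
print(sanitize(schema, "gemini")["properties"]["comments"])
# {'anyOf': [{'type': 'string'}, {'type': 'null'}]}
```

Two trade-offs are visible in the sketch. Stripping anyOf siblings discards the description the model would otherwise see, and combining step 1 with step 4 turns a former "accept any value" field into an object that accepts no properties, so both deserve a review per tool rather than blind application. The traversal also never touches additionalProperties during step 1, which keeps an existing additionalProperties: false from being rewritten as a schema.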

Tool surface management:

  1. Cap tool count: Cursor’s 40-tool limit is empirically motivated, not arbitrary
  2. Use lazy schema loading (dynamic tool gating) — defer full schema injection until a tool is selected; arXiv:2604.21003 shows a 10–60k per-turn token reduction in large-catalog agents
  3. Add compact=true / detail_level parameters on high-output tools
  4. Use server-enforced flow hints in descriptions to reduce wrong-path selection
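Lazy schema loading from item 2 can be sketched as a catalog that exposes two views: a compact listing for every turn and the full definition on demand. The class name, tool names, and the character-count size proxy are all hypothetical; real savings depend on the tokenizer and on how the client injects schemas.

```python
import json


class LazyToolCatalog:
    """Holds full tool definitions but advertises only names and descriptions."""

    def __init__(self, tools: list[dict]):
        self._tools = {t["name"]: t for t in tools}

    def summaries(self) -> list[dict]:
        """Compact listing sent on every turn."""
        return [{"name": t["name"], "description": t.get("description", "")}
                for t in self._tools.values()]

    def full_schema(self, name: str) -> dict:
        """Full definition, injected only after the model selects this tool."""
        return self._tools[name]


def approx_size(obj) -> int:
    # Character count of the JSON form, a rough stand-in for token count.
    return len(json.dumps(obj))


# Hypothetical tool definitions.
tools = [
    {"name": "search_issues",
     "description": "Search issues by text",
     "inputSchema": {"type": "object",
                     "properties": {"query": {"type": "string"},
                                    "state": {"type": "string",
                                              "enum": ["open", "closed"]},
                                    "limit": {"type": "integer"}},
                     "required": ["query"]}},
    {"name": "get_issue",
     "description": "Fetch one issue by number",
     "inputSchema": {"type": "object",
                     "properties": {"number": {"type": "integer"}},
                     "required": ["number"]}},
]

catalog = LazyToolCatalog(tools)
print(approx_size(catalog.summaries()), "chars per turn instead of", approx_size(tools))
print(catalog.full_schema("get_issue")["inputSchema"]["required"])  # ['number']
```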

Operational hygiene:

  1. Commit a JSON snapshot of tools/list output as a contract file — fail CI on any PR that changes tool schemas without updating the contract. This catches drift before merge, including parallel-PR conflicts.
  2. CI step: diff tools/list from real server vs. mock manifest on every deploy
  3. Inventory tool names across all configured servers before adding a new one; use server-prefixed names proactively: {server}_{tool}
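The contract check in item 1 can be sketched as a comparison between a committed JSON file and the live tools/list result. The file name and tool definitions below are hypothetical, and fetching the live list from a running server is left out because it depends on the client library in use.

```python
import json
import tempfile
from pathlib import Path


def canonical(tools: list[dict]) -> str:
    """Stable JSON text: tools sorted by name, keys sorted, fixed indent."""
    ordered = sorted(tools, key=lambda t: t["name"])
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"


def check_contract(live_tools: list[dict], contract_path: Path) -> list[str]:
    """Return names of tools added, removed, or changed relative to the contract."""
    committed = {t["name"]: t for t in json.loads(contract_path.read_text())}
    live = {t["name"]: t for t in live_tools}
    return sorted(name for name in committed.keys() | live.keys()
                  if committed.get(name) != live.get(name))


# Demonstration with a temporary contract file.
with tempfile.TemporaryDirectory() as tmp:
    contract = Path(tmp) / "tools.contract.json"
    contract.write_text(canonical(
        [{"name": "search", "inputSchema": {"type": "object"}}]))
    live = [{"name": "search",
             "inputSchema": {"type": "object",
                             "properties": {"q": {"type": "string"}}}}]
    changed = check_contract(live, contract)
    print(changed)  # ['search']
```

In CI, a non-empty result would fail the job until the contract file is regenerated with canonical() and committed alongside the schema change. Writing the file in a canonical form keeps diffs small and makes parallel PRs conflict at the text level instead of drifting silently.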

Falsification criterion: This finding would be disproved by MCP publishing a normative JSON Schema compatibility matrix confirmed to eliminate all listed failure classes, with a conformance test suite that validates schemas across all major LLM providers before server publication.

Evidence

| Tool | Version | Evidence | Result |
| --- | --- | --- | --- |
| grafana/mcp-grafana | review 2026-02 | source-reviewed | Bare boolean true in schema → Fireworks AI HTTP 500 across all tools; isolated via binary search |
| modelcontextprotocol/typescript-sdk | 1.22.0 / 1.19.1 | source-reviewed | $ref → LLM string serialization (error -32602); AJV crash on tools/list forced AWS to pin 1.19.1 |
| anomalyco/opencode | review 2026-03 | source-reviewed | anyOf + sibling fields in official GitHub MCP server blocks all Gemini requests |
| github/github-mcp-server | review 2026-03 | source-reviewed | Missing additionalProperties:false → all OpenAI calls return 400 Bad Request |
| Microsoft Research: Tool-space interference | Q1 2026 | independently-confirmed | 775 name collisions; 85% accuracy degradation with large tool surfaces; 557,766 avg tokens/tool call |
| arXiv:2604.21003 | 2026-04 | independently-confirmed | Eager schema injection imposes 10–60k token tax per turn; lazy schema loading as mitigation |

Confidence: empirical — 6 environments reviewed, multiple independent confirmations.

Strongest case against: These failure modes may reflect early-ecosystem growing pains rather than permanent architectural constraints. As MCP matures, providers may converge on a common JSON Schema subset enforced by SDK validators, eliminating the need for per-provider adaptation. The MCP spec’s open proposals (issue #1990 for a conformance test suite, SEP #1814 for a caniuse-style compatibility matrix) could produce a conformance test suite that prevents new failures from entering the ecosystem. Tool surface degradation is also fixable via dynamic tool loading, which several frameworks are actively implementing.

Open questions: Does a universal “safe subset” of JSON Schema exist that all current providers accept without sanitization? Will SEP-2145 (error reporting standardization) make failures traceable before they reach production? Does the CyberArk Full-Schema Poisoning finding (attack payloads in parameter names, defaults, enums) require schema sanitization to also serve as a security layer?
