MCP tool schemas written for Claude silently fail against Fireworks AI, Gemini, and OpenAI
What you expect
JSON Schema is a well-specified interchange format. MCP uses it as the inputSchema for tool definitions. A tool schema that validates locally and works in one client should work in all clients.
The MCP spec defines a common schema format but no conformance suite, no provider compatibility guidance, and no schema sanitization layer. The implicit promise is portability.
What actually happens
This finding draws on a review of 10+ public issues across major MCP server repositories. In each case, a construct that is valid JSON Schema and works with Claude fails against a different provider, each in its own hard-to-trace way. There is no cross-provider compatibility matrix, so failures surface only in production.
Bare boolean schemas crash Fireworks AI. "field": true is valid JSON Schema (meaning “accept any value”) but Fireworks AI returns HTTP 500 with no indication which tool caused it. In grafana/mcp-grafana#594, isolating the offending tool required a binary search across 25 tools. The Go interface{} type silently emits this pattern.
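Walking the schemas offline replaces the 25-tool binary search with a direct lookup. Below is a minimal Python sketch. It assumes the server's tools/list result has been dumped as a list of dicts with name and inputSchema. The walker covers common subschema keywords rather than the full JSON Schema vocabulary. It deliberately skips additionalProperties, where a boolean is routine. The tool names are hypothetical.

```python
"""Locate bare boolean subschemas per tool instead of binary-searching tools."""

# Keywords whose value is a map of name -> subschema.
MAP_KEYS = ("properties", "patternProperties", "$defs", "dependentSchemas")
# Keywords whose value is a list of subschemas.
LIST_KEYS = ("anyOf", "allOf", "oneOf", "prefixItems")
# Keywords whose value is a single subschema. additionalProperties is left out on
# purpose: "additionalProperties": false is routine and required by OpenAI strict mode.
ONE_KEYS = ("items", "not", "contains", "propertyNames", "if", "then", "else")


def _escape(token: str) -> str:
    # JSON Pointer escaping (RFC 6901): "~" -> "~0", "/" -> "~1".
    return token.replace("~", "~0").replace("/", "~1")


def iter_subschemas(schema, path=""):
    """Yield (json_pointer, subschema) for each subschema position under `schema`."""
    if not isinstance(schema, dict):
        return
    children = []
    for key in MAP_KEYS:
        if isinstance(schema.get(key), dict):
            children += [(f"{path}/{key}/{_escape(n)}", s) for n, s in schema[key].items()]
    for key in LIST_KEYS:
        if isinstance(schema.get(key), list):
            children += [(f"{path}/{key}/{i}", s) for i, s in enumerate(schema[key])]
    for key in ONE_KEYS:
        if key in schema:
            value = schema[key]
            if isinstance(value, list):  # older tuple-style "items": [...]
                children += [(f"{path}/{key}/{i}", s) for i, s in enumerate(value)]
            else:
                children.append((f"{path}/{key}", value))
    for pointer, sub in children:
        yield pointer, sub
        yield from iter_subschemas(sub, pointer)


def find_boolean_schemas(tools):
    """Return (tool_name, json_pointer) for every bare true/false subschema."""
    hits = []
    for tool in tools:
        schema = tool.get("inputSchema", {})
        if isinstance(schema, bool):
            hits.append((tool["name"], ""))
        hits += [(tool["name"], p) for p, s in iter_subschemas(schema) if isinstance(s, bool)]
    return hits


# Hypothetical tools; the second mimics what a Go interface{} field can emit.
tools = [
    {"name": "query_metrics",
     "inputSchema": {"type": "object", "properties": {"expr": {"type": "string"}}}},
    {"name": "update_dashboard",
     "inputSchema": {"type": "object", "properties": {"dashboard": True},
                     "additionalProperties": False}},
]
print(find_boolean_schemas(tools))  # [('update_dashboard', '/properties/dashboard')]
```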
$ref causes LLMs to serialize arguments as strings. When inputSchema uses $ref pointers, models treat the referenced parameters as untyped and send objects as JSON-encoded strings. The server then rejects the call with MCP error -32602: Invalid arguments. This was independently confirmed in typescript-sdk#1562, Claude Code #18260, and Kiro CLI: the same failure class in three separate tools. TypeScript SDK PR #1460 widened the blast radius by emitting $ref for all Zod-registered types.
$ref crashed the TypeScript SDK validator on tools/list. SDK 1.22.0 introduced AJV-based output schema caching that fails on $defs, even though $defs is valid JSON Schema. AWS MCP servers were forced to pin to 1.19.1 (typescript-sdk#1175).
anyOf + sibling fields blocks all Gemini requests. Gemini validates ALL tool schemas before processing any request. Gemini requires anyOf to be the sole field in a schema object — any sibling field (type, description, items) blocks every request to the provider. The official @modelcontextprotocol/server-github triggers this via the comments field in github_create_pull_request_review (opencode#14509). Workaround: disable the GitHub MCP server.
Missing additionalProperties:false breaks OpenAI strict mode. OpenAI strict function calling requires additionalProperties: false on all object schemas. The Go SDK (mcp-go) does not add it by default. github-mcp-server#376 confirms all OpenAI calls return 400 Bad Request.
Tool name collisions silently overwrite. Microsoft Research found 775 name collisions ecosystem-wide — “search” appears in 32 distinct MCP servers. The ToolRegistry uses a last-registered-wins policy with no warning.
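Collisions can be caught before anything is overwritten by taking an inventory of names across servers, as recommended under Operational hygiene. Below is a minimal sketch. It assumes each server's tool names have already been collected into a dict. register_last_wins is a hypothetical model of last-registered-wins behavior, not the ToolRegistry's actual code. The server names are illustrative.

```python
from collections import defaultdict


def find_collisions(servers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {tool_name: [servers exposing it]} for names exposed by 2+ servers."""
    owners: dict[str, list[str]] = defaultdict(list)
    for server, names in servers.items():
        for name in names:
            owners[name].append(server)
    return {name: srvs for name, srvs in owners.items() if len(srvs) > 1}


def register_last_wins(servers: dict[str, list[str]]) -> dict[str, str]:
    """Hypothetical model of a last-registered-wins registry: later servers silently win."""
    registry: dict[str, str] = {}
    for server, names in servers.items():
        for name in names:
            registry[name] = server  # no warning when `name` is already present
    return registry


def prefix_names(servers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Apply the {server}_{tool} naming convention."""
    return {server: [f"{server}_{name}" for name in names] for server, names in servers.items()}


# Illustrative config; dict order stands in for config-file order.
servers = {
    "github": ["search", "create_issue"],
    "jira": ["search", "create_issue"],
    "docs": ["search"],
}
print(find_collisions(servers))
# {'search': ['github', 'jira', 'docs'], 'create_issue': ['github', 'jira']}
print(register_last_wins(servers)["search"])  # docs
print(find_collisions(prefix_names(servers)))  # {}
```

Prefixing guarantees uniqueness only when server names cannot combine ambiguously with the separator. For example, server a_b with tool c and server a with tool b_c both become a_b_c. It is therefore worth running the collision check on the prefixed names as well.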
Wide tool surfaces degrade accuracy by up to 85%. Microsoft Research measured up to 85% performance degradation with large tool spaces and up to 91% accuracy degradation with long tool responses. One MCP tool averaged 557,766 tokens per call — exceeding GPT-5’s 272K input limit. arXiv:2604.21003 found that loading a full MCP catalog upfront imposes a 10–60k input-token tax per turn even when most tools are never called.
What this means for you
A tool schema that passes local testing with Claude is not safe to deploy against Fireworks AI, Gemini, or OpenAI without a per-provider sanitization pass. The failure modes are provider-specific, silent, and surfaced only in production. Error messages do not identify the offending tool or field.
If you run a multi-provider MCP server, every LLM upgrade or provider addition is a latent schema compatibility test with no safety net. One non-conforming schema in a 40-tool server disables all Gemini access (anyOf), causes sporadic HTTP 500s from Fireworks (mcp-grafana#594), or makes every OpenAI call return 400 (github-mcp-server#376).
The name collision problem means multi-server setups are non-deterministic: which “search” runs depends on config file ordering, not your intent.
What to do
Provider-targeted schema sanitization (highest leverage). Apply a per-provider sanitization pass before forwarding schemas:
- Convert bare boolean schemas: true → {"type": "object"}, false → {"not": {}}
- Inline $ref pointers: dereference all local #/$defs/X into inline schemas before serialization (~95 lines, no external dependencies)
- For Gemini: strip sibling fields when anyOf is present
- For OpenAI strict: add additionalProperties: false to all object schemas
- Apply nullable-aware null stripping (check ZodNullable before removing null values)
The gateway layer is the correct place to apply per-provider schema sanitization — do it once, at the boundary.
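Below is a minimal Python sketch of such a boundary pass. It assumes schemas arrive as plain dicts parsed from tools/list. It follows the list above: $ref inlining and boolean conversion run for every provider, then the Gemini or OpenAI rule runs only for that provider.

The sketch makes several assumptions:
- Booleans are mapped as the list suggests, which narrows "accept any value" to objects. Booleans under additionalProperties are left alone.
- Recursive $defs raise an error instead of being inlined.
- The generic $ref walk would also rewrite a literal "$ref" key inside enum or default data.
- The provider keys are hypothetical.
- The ZodNullable step is omitted because it operates on Zod types rather than on serialized JSON Schema.

```python
"""Per-provider inputSchema sanitization, applied once at a gateway boundary (sketch)."""
from typing import Any, Callable, Optional

Schema = Any  # dict or bool, as parsed from tools/list JSON
DEFS_PREFIX = "#/$defs/"
# Subschema positions visited by `transform` (common keywords, not exhaustive).
MAP_KEYS = ("properties", "patternProperties", "dependentSchemas")
LIST_KEYS = ("anyOf", "allOf", "oneOf", "prefixItems")
ONE_KEYS = ("items", "additionalProperties", "not", "contains", "propertyNames",
            "if", "then", "else")

def inline_refs(schema: Schema) -> Schema:
    """Replace local {"$ref": "#/$defs/X"} nodes with copies of X and drop the root $defs."""
    if not isinstance(schema, dict):
        return schema
    defs = schema.get("$defs", {})

    def resolve(node: Any, seen: frozenset) -> Any:
        # Generic walk: a literal "$ref" key inside enum/default data would also be rewritten.
        if isinstance(node, list):
            return [resolve(n, seen) for n in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(DEFS_PREFIX):
            name = ref[len(DEFS_PREFIX):]  # assumes def names need no JSON Pointer unescaping
            if name in seen:
                raise ValueError(f"recursive $ref {ref!r} cannot be inlined")
            target = resolve(defs[name], seen | {name})
            # Siblings such as "description" next to $ref override the inlined keys.
            siblings = {k: resolve(v, seen) for k, v in node.items() if k != "$ref"}
            return {**target, **siblings} if isinstance(target, dict) else target
        return {k: resolve(v, seen) for k, v in node.items()}

    return resolve({k: v for k, v in schema.items() if k != "$defs"}, frozenset())

def transform(node: Schema, fn: Callable[[Schema, Optional[str]], Schema],
              key: Optional[str] = None) -> Schema:
    """Rebuild `node` bottom-up, calling fn(subschema, parent_keyword) on every subschema."""
    if isinstance(node, dict):
        node = dict(node)  # shallow copy; child subschemas are replaced below
        for k in MAP_KEYS:
            if isinstance(node.get(k), dict):
                node[k] = {n: transform(s, fn, k) for n, s in node[k].items()}
        for k in LIST_KEYS:
            if isinstance(node.get(k), list):
                node[k] = [transform(s, fn, k) for s in node[k]]
        for k in ONE_KEYS:
            if k in node:
                v = node[k]
                node[k] = ([transform(s, fn, k) for s in v] if isinstance(v, list)
                           else transform(v, fn, k))
    return fn(node, key)

def fix_booleans(node: Schema, key: Optional[str]) -> Schema:
    # true -> {"type": "object"}, false -> {"not": {}}. Booleans under additionalProperties
    # are kept: rewriting true there would restrict extra properties to objects.
    if isinstance(node, bool) and key != "additionalProperties":
        return {"type": "object"} if node else {"not": {}}
    return node

def gemini_anyof_only(node: Schema, key: Optional[str]) -> Schema:
    # Gemini: anyOf must be the sole field of its schema object.
    return {"anyOf": node["anyOf"]} if isinstance(node, dict) and "anyOf" in node else node

def openai_strict(node: Schema, key: Optional[str]) -> Schema:
    # OpenAI strict mode: every object schema gets additionalProperties: false.
    if isinstance(node, dict):
        t = node.get("type")
        if t == "object" or (isinstance(t, list) and "object" in t) or "properties" in node:
            return {**node, "additionalProperties": False}
    return node

# Hypothetical provider keys; booleans and $ref are handled for every provider.
PROVIDER_PASSES = {"fireworks": [], "gemini": [gemini_anyof_only],
                   "openai-strict": [openai_strict]}

def sanitize(input_schema: Schema, provider: str) -> Schema:
    schema = transform(inline_refs(input_schema), fix_booleans)  # provider-neutral passes
    for rule in PROVIDER_PASSES.get(provider, []):
        schema = transform(schema, rule)
    return schema

schema = {
    "type": "object",
    "properties": {"review": {"$ref": "#/$defs/Review"}, "meta": True},
    "$defs": {"Review": {"type": "object", "properties": {"comments": {
        "description": "Review comments",
        "anyOf": [{"type": "array", "items": {"type": "string"}}, {"type": "null"}],
    }}}},
}
print(sanitize(schema, "gemini")["properties"]["review"]["properties"]["comments"])
# {'anyOf': [{'type': 'array', 'items': {'type': 'string'}}, {'type': 'null'}]}
```

Each pass returns new dicts, so the same input schema can be sanitized once per provider without one provider's rewrites leaking into another's.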
Tool surface management:
- Cap tool count: Cursor’s 40-tool limit is empirically motivated, not arbitrary
- Use lazy schema loading (dynamic tool gating) — defer full schema injection until a tool is selected; arXiv:2604.21003 shows a 10–60k per-turn token reduction in large-catalog agents
- Add compact=true / detail_level parameters to high-output tools
- Use server-enforced flow hints in descriptions to reduce wrong-path selection
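Lazy schema loading can be sketched as a catalog that advertises only names and descriptions until a tool is selected. How selection happens (a meta-tool, a router, or a retrieval step) is left out. LazyCatalog, select, payload_chars, and the generated tools are hypothetical. Serialized character counts stand in for tokens as a crude proxy.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LazyCatalog:
    """Hypothetical gateway-side catalog that withholds full schemas until selection."""
    tools: dict[str, dict]  # name -> full MCP tool definition
    loaded: set[str] = field(default_factory=set)

    def advertise(self) -> list[dict]:
        """Per-turn payload: stubs for unselected tools, full definitions for selected ones."""
        return [
            tool if name in self.loaded
            else {"name": name, "description": tool.get("description", "")}
            for name, tool in self.tools.items()
        ]

    def select(self, name: str) -> dict:
        """Mark a tool as selected so later turns carry its full inputSchema."""
        if name not in self.tools:
            raise KeyError(f"unknown tool {name!r}")
        self.loaded.add(name)
        return self.tools[name]


def payload_chars(tool_defs: list[dict]) -> int:
    # Serialized size in characters: a crude stand-in for token count.
    return len(json.dumps(tool_defs))


# 30 generated, hypothetical tools with moderately sized schemas.
tools = {
    f"tool_{i}": {
        "name": f"tool_{i}",
        "description": f"Hypothetical tool {i}",
        "inputSchema": {
            "type": "object",
            "properties": {f"arg_{j}": {"type": "string", "description": "x" * 40}
                           for j in range(10)},
        },
    }
    for i in range(30)
}
catalog = LazyCatalog(tools)
eager = payload_chars(list(tools.values()))
before = payload_chars(catalog.advertise())
catalog.select("tool_3")
after = payload_chars(catalog.advertise())
print(eager, before, after)  # before < after < eager
```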
Operational hygiene:
- Commit a JSON snapshot of tools/list output as a contract file, and fail CI on any PR that changes tool schemas without updating the contract. This catches drift before merge, including parallel-PR conflicts.
- CI step: diff tools/list from the real server against the mock manifest on every deploy
- Inventory tool names across all configured servers before adding a new one; use server-prefixed names proactively: {server}_{tool}
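The contract-file check can be sketched in a few lines, assuming another step has already dumped the current tools/list result to a list of dicts. Canonical serialization keeps the committed file diff-friendly by sorting tools by name and sorting keys. The function names and the tools.contract.json filename are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def canonical(tools: list[dict]) -> str:
    """Stable, diff-friendly serialization: tools sorted by name, keys sorted."""
    return json.dumps(sorted(tools, key=lambda t: t["name"]), sort_keys=True, indent=2) + "\n"


def contract_drift(tools: list[dict], contract: Path) -> list[str]:
    """Names of tools added, removed, or changed relative to the committed contract."""
    committed = {t["name"]: t for t in json.loads(contract.read_text())}
    current = {t["name"]: t for t in tools}
    return sorted(n for n in committed.keys() | current.keys()
                  if committed.get(n) != current.get(n))


# Usage with a throwaway file standing in for the committed tools.contract.json.
v1 = [{"name": "search",
       "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}}}]
v2 = [{"name": "search",
       "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}}},
      {"name": "fetch", "inputSchema": {"type": "object"}}]
with tempfile.TemporaryDirectory() as tmp:
    contract = Path(tmp) / "tools.contract.json"
    contract.write_text(canonical(v1))
    unchanged = contract_drift(v1, contract)
    drift = contract_drift(v2, contract)
print(unchanged, drift)  # [] ['fetch', 'search']
# In CI, a non-empty drift list would fail the job, e.g. sys.exit(f"tools/list drift: {drift}").
```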
Falsification criterion: This finding would be disproved by MCP publishing a normative JSON Schema compatibility matrix confirmed to eliminate all listed failure classes, with a conformance test suite that validates schemas across all major LLM providers before server publication.
Evidence
| Tool | Version | Evidence | Result |
|---|---|---|---|
| grafana/mcp-grafana | review 2026-02 | source-reviewed | Bare boolean true in schema → Fireworks AI HTTP 500 across all tools; isolated via binary search |
| modelcontextprotocol/typescript-sdk | 1.22.0 / 1.19.1 | source-reviewed | $ref → LLM string serialization (error -32602); AJV crash on tools/list forced AWS to pin 1.19.1 |
| anomalyco/opencode | review 2026-03 | source-reviewed | anyOf + sibling fields in official GitHub MCP server blocks all Gemini requests |
| github/github-mcp-server | review 2026-03 | source-reviewed | Missing additionalProperties:false → all OpenAI calls return 400 Bad Request |
| Microsoft Research: Tool-space interference | Q1 2026 | independently-confirmed | 775 name collisions; up to 85% degradation with large tool spaces; one tool averaged 557,766 tokens/call |
| arXiv:2604.21003 | 2026-04 | independently-confirmed | Eager schema injection imposes 10–60k token tax per turn; lazy schema loading as mitigation |
Confidence: empirical — 6 environments reviewed, multiple independent confirmations.
Strongest case against: These failure modes may reflect early-ecosystem growing pains rather than permanent architectural constraints. As MCP matures, providers may converge on a common JSON Schema subset enforced by SDK validators, eliminating the need for per-provider adaptation. The MCP spec’s open proposals (issue #1990 — conformance testsuite, SEP #1814 — caniuse-style compatibility matrix) could produce a conformance test suite that prevents new failures from entering the ecosystem. Tool surface degradation is also fixable via dynamic tool loading, which several frameworks are actively implementing.
Open questions: Does a universal “safe subset” of JSON Schema exist that all current providers accept without sanitization? Will SEP-2145 (error reporting standardization) make failures traceable before they reach production? Does the CyberArk Full-Schema Poisoning finding (attack payloads in parameter names, defaults, enums) require schema sanitization to also serve as a security layer?