Pipecat’s interrupt handling drops function call results and deadlocks — two independent mechanisms

Published: 2026-05-22 · Last verified: 2026-05-22 · Confidence: empirical

Published · Fact-checked 2026-05-22 · 0 corrections

⚠ Staleness risk: high. Facts in this subject area change quickly between releases. Re-check the specific claims against your own environment before acting. (This rates the topic, not whether this page is out of date.)

What you expect

Pipecat’s pipeline documentation describes graceful interrupt handling: when a user speaks during TTS, the bot stops, processes the new utterance, and continues the conversation. Tool calls that were in progress recover cleanly. The Smart Turn v3 detector is the recommended turn-detection method for telephony deployments using Twilio (which outputs 8kHz audio).

What actually happens

Interrupt handling corrupts function call state — two distinct mechanisms

Mechanism A: queue recreation (issue #4420). When a user interrupts during TTS, _handle_interruption() replaces the frame queue with a new asyncio.Queue(). Any FunctionCallResultFrame still in the old queue is discarded, because it is not a SystemFrame and so is not preserved across the recreation. The LLMAssistantContextAggregator never sees the tool result, so the LLM’s next inference re-issues the same tool call and produces duplicate side effects. Fixed in PR #4435.

Mechanism B: deadlock with pause_frame_processing=True (issue #4418). This requires three simultaneous conditions: a TTS service with pause_frame_processing=True (affects Rime, ElevenLabs, Cartesia), a user interruption during TTFB (200–700ms after TTS starts, before audio chunks arrive), and a FunctionCallResultFrame queued at the same time. The process task waits on __process_event.wait() indefinitely because no BotStoppedSpeakingFrame fires (TTS was interrupted before producing any audio). Subsequent LLMTextFrames accumulate but are never processed, and the bot stays unresponsive until the call terminates.
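The shape of this hang is a task waiting on an event that nothing will ever set. The sketch below reproduces that shape with plain asyncio; process_with_pause and demo are hypothetical names, and the timeout exists only so the demonstration terminates instead of hanging.

```python
import asyncio


async def process_with_pause(resume_event, processed):
    """Toy processing task: waits for a resume signal before handling frames."""
    await resume_event.wait()  # blocks until some other code calls set()
    processed.append("LLMTextFrame")


async def demo(bot_stopped_speaking_fires, timeout=0.05):
    resume_event = asyncio.Event()
    processed = []
    task = asyncio.create_task(process_with_pause(resume_event, processed))
    if bot_stopped_speaking_fires:
        # Normal path: audio was produced, then stopped, so the event is set.
        resume_event.set()
    try:
        await asyncio.wait_for(task, timeout)
        return "processed"
    except asyncio.TimeoutError:
        # Interrupted before any audio: nothing sets the event, so the task
        # would wait forever without this demonstration timeout.
        return "stalled"
```

asyncio.run(demo(True)) completes normally, while asyncio.run(demo(False)) only returns because of the timeout. A watchdog of this kind around long waits is one way to surface a stall in logs rather than as a silent call.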

These are independent code paths that produce similar user-visible symptoms (the bot hangs or misbehaves after an interruption during a tool call). Both issues were closed in May 2026.

Smart Turn v3 hardcoded to 16kHz — 8kHz telephony input breaks silently

The WhisperFeatureExtractor in SmartTurnAnalyzerV3 hardcodes 16kHz with no resampling fallback (issue #3844). Measured impact at 8kHz:

30% misclassification rate (6 of 20 utterances)
Probability confidence delta up to 0.9391
Average turn duration drops 51% (2.33s → 1.14s)
Digit sequences split across turns due to missed incompleteness markers

The documentation conflict: Pipecat’s own Twilio WebSocket guide recommends audio_in_sample_rate=8000, while the SmartTurn README requires 16kHz. These two pieces of official documentation directly contradict each other, leaving telephony developers with a silent accuracy regression.
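One way to make the mismatch loud rather than silent is to check the sample rate before audio reaches the turn detector, and to upsample 8kHz input first. The sketch below is illustrative only: check_rate and upsample_2x_linear are hypothetical helpers, not Pipecat APIs, and linear interpolation is a simple stand-in for a proper resampler with filtering.

```python
EXPECTED_RATE_HZ = 16_000  # the rate the SmartTurn README is said to require


def check_rate(sample_rate_hz):
    """Raise instead of letting a wrong sample rate degrade accuracy silently."""
    if sample_rate_hz != EXPECTED_RATE_HZ:
        raise ValueError(
            f"turn detector expects {EXPECTED_RATE_HZ} Hz, got {sample_rate_hz} Hz"
        )


def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbours.

    `samples` is a list of integer PCM values (for example 16-bit mono).
    """
    if not samples:
        return []
    out = []
    for current, nxt in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + nxt) // 2)
    # Hold the final sample so the output is exactly twice the input length.
    out.append(samples[-1])
    out.append(samples[-1])
    return out
```

For example, upsample_2x_linear([0, 100, 200]) yields [0, 50, 100, 150, 200, 200], and check_rate(8000) raises a ValueError that would otherwise have been a quiet accuracy regression.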

Memory leak: 3 GB/min on Linux/K8s in v0.0.85–v0.0.92

Confirmed on Ubuntu 24.04.3 in Kubernetes with the LiveKit + Deepgram + OpenAI + ElevenLabs + Krisp + Silero VAD stack (issue #3116). Regression introduced in v0.0.85; not reproducible on macOS. Root cause not definitively identified in the issue thread. PR #3499 merged. v0.0.84 is the last confirmed stable version for this stack on Linux.
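When checking your own deployment for this leak, it helps to turn raw memory samples into a growth rate and a time-to-limit. The helpers below are a hypothetical sketch: they assume you already collect (elapsed seconds, resident bytes) samples from your own monitoring, and they use 1 GB = 10^9 bytes.

```python
def growth_gb_per_min(samples):
    """Estimate memory growth from the first and last sample.

    `samples` is a list of (elapsed_seconds, rss_bytes) pairs, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (m1 - m0) / 1e9 / ((t1 - t0) / 60.0)


def minutes_until_limit(current_bytes, limit_bytes, rate_gb_per_min):
    """Minutes until the memory limit is reached at the given growth rate."""
    if rate_gb_per_min <= 0:
        return float("inf")
    return max(limit_bytes - current_bytes, 0) / 1e9 / rate_gb_per_min
```

At the reported 3 GB/min, a process that starts at 1 GB under a 7 GB limit reaches the limit in about two minutes, which is why short smoke tests on the target Linux image are enough to detect it.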

System frame queue bypass is documented but not implemented

Documentation states system frames bypass the normal processing queue. Issue #4445 (closed May 2026) confirmed system frames are still enqueued. Code that relies on system frames preempting queued frames produces incorrect ordering.

v1.0.0 breaking changes — three high-impact migrations

Timeout default flip: function_call_timeout_secs changed from 10.0 to None in the v1.0.0 CHANGELOG. Existing production code with timeout-sensitive tool calls now hangs indefinitely on failure. No deprecation warning.
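Setting function_call_timeout_secs explicitly is the direct fix. As an additional guard that does not depend on framework defaults, a tool handler can enforce its own deadline. The wrapper below is a sketch with hypothetical names (call_tool_with_timeout and its result dictionary shape are not Pipecat APIs); it relies only on asyncio.wait_for.

```python
import asyncio


async def call_tool_with_timeout(handler, *args, timeout_secs=10.0, **kwargs):
    """Run an async tool handler with an explicit deadline.

    Returns a dict describing success or timeout, so the caller always
    gets a result to hand back to the LLM instead of waiting forever.
    """
    try:
        result = await asyncio.wait_for(handler(*args, **kwargs), timeout=timeout_secs)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        return {"ok": False, "error": f"tool timed out after {timeout_secs}s"}


async def slow_lookup(city):
    """Hypothetical tool that takes far too long."""
    await asyncio.sleep(1.0)
    return f"weather for {city}"


async def fast_lookup(city):
    """Hypothetical tool that answers immediately."""
    return f"weather for {city}"
```

With this wrapper, asyncio.run(call_tool_with_timeout(slow_lookup, "Madrid", timeout_secs=0.01)) returns an error result rather than blocking the pipeline.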

Import path and context API changes: Service-specific context implementations were replaced with LLMContext + LLMContextAggregatorPair, and service classes moved into submodules. Example: from pipecat.services.openai import OpenAILLMService → from pipecat.services.openai.llm import OpenAILLMService. All existing integrations require migration.

Missing tool handler hang (issue #4300): If the LLM emits a tool call but no handler is registered, the pipeline hangs waiting for a result that never arrives. No error frame emitted. Fixed in v1.0.0; in v0.0.x this is a silent hang.
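If you must stay on v0.0.x for a while, a defensive dispatch layer can guarantee that every tool call produces some result. The sketch below is a hypothetical stand-alone registry, not Pipecat's function-call API: an unknown tool name yields an explicit error result instead of nothing.

```python
import asyncio


class ToolRegistry:
    """Toy tool dispatcher that never leaves a call without a result."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    async def dispatch(self, name, arguments):
        handler = self._handlers.get(name)
        if handler is None:
            # Surface the problem to the caller instead of waiting forever.
            return {"ok": False, "error": f"no handler registered for tool '{name}'"}
        return {"ok": True, "result": await handler(**arguments)}


async def get_time(zone):
    """Hypothetical tool handler used for the demonstration."""
    return f"12:00 in {zone}"
```

Returning the error as a normal result lets the LLM see that the tool is unavailable and respond accordingly, rather than leaving the conversation stuck.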

ElevenLabs word merging corrupts multilingual LLM context

_strip_leading_space cannot distinguish chunk-boundary spaces from word-separator spaces (issue #4391). In Spanish and similar languages (e.g., “que quieras” → “quequieras”), text is corrupted before being sent to the LLM context via TTSTextFrame.append_to_context=True. Subsequent conversation turns degrade because the LLM receives corrupted history.
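The failure is easy to reproduce in isolation. The two functions below are a toy model, not the actual _strip_leading_space implementation: they only contrast stripping the leading space of every chunk with stripping it once at the start of the utterance.

```python
def join_stripping_every_chunk(chunks):
    """Strip the leading space of each chunk: word separators are lost."""
    return "".join(chunk.lstrip(" ") for chunk in chunks)


def join_stripping_first_chunk_only(chunks):
    """Strip only the very first leading space: separators are preserved."""
    if not chunks:
        return ""
    return chunks[0].lstrip(" ") + "".join(chunks[1:])
```

Given the chunks [" que", " quieras"], the first function produces "quequieras" and the second produces "que quieras". Whether a given chunk's leading space is a separator depends on where the provider split the text, which is the information the per-chunk strip throws away.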

What this means for you

If your voice agent makes tool calls, interrupt behavior is your highest-risk surface. Both mechanisms (queue recreation and deadlock) trigger in normal conversational use — users interrupt bots mid-response all the time. In production, mechanism A produces duplicate tool side effects (double bookings, double charges, duplicate API calls). Mechanism B produces permanently unresponsive bots requiring call termination.

For telephony deployments on Twilio: following Pipecat’s own Twilio guide and setting audio_in_sample_rate=8000 silently breaks Smart Turn v3 with a 51% turn duration drop and 30% misclassification rate. The failure produces no error — just degraded turn detection.

The 3 GB/min memory leak is macOS-invisible. Teams that develop on macOS will not encounter it; K8s deployments will exhaust memory within minutes per call.

What to do

Upgrade to v1.0.0 or later: the interrupt-handling bugs (#4420, #4418) and the missing-tool-handler hang (#4300) are all fixed in v1.0.0. Do not stay on v0.0.x for new deployments.
Fix the 8kHz telephony conflict: Set audio_in_sample_rate=16000 (not 8000) and audio_out_sample_rate=8000 for Twilio deployments. Resampling input to 16kHz for Smart Turn while keeping 8kHz output preserves telephony compatibility.
Audit for pause_frame_processing=True: If you use Rime, ElevenLabs, or Cartesia with this flag, test interruption during tool calls before shipping. The deadlock (#4418) requires all three conditions; eliminating pause_frame_processing=True eliminates the bug.
Add interrupt-aware function call tracking: Do not rely on the LLM to avoid re-issuing interrupted tool calls. Track in-flight function call IDs; deduplicate results at the tool executor layer. This protects against both mechanism A and future interrupt regressions.
Pin provider SDK versions explicitly: Pipecat’s dependency range deepgram-sdk<7,>=6.0.1 includes SDK 6.1.0, which changed socket control methods and causes silent transcript loss. Pin to the last confirmed-compatible version in your own requirements.txt.
Benchmark memory on Linux before K8s deploy: The 3 GB/min leak is not reproducible on macOS. Run at least 5 minutes of simulated calls on your target Linux distro before deploying. v0.0.84 is the last confirmed stable version for the LiveKit+Deepgram+OpenAI+ElevenLabs stack on Linux.
Treat v0.0.x → v1.0.0 as a hard migration: Import paths, context aggregator API, function call signatures, transport params, and audio serialization behavior all changed. Do not gradually migrate; plan a full rewrite of integration code.
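The deduplication step above can be sketched as a small executor that runs each distinct call once and hands the same result to any repeat. This is a hypothetical design, not a Pipecat API. It also makes one assumption worth stating: a re-issued tool call may arrive with a new call ID, so the sketch keys on the function name plus canonicalized arguments; swap in the call ID if your stack preserves it.

```python
import asyncio
import json


class DedupingToolExecutor:
    """Run each distinct tool call once; repeats share the first result."""

    def __init__(self):
        self._tasks = {}

    @staticmethod
    def _key(name, arguments):
        # Sorted keys make the key independent of argument order.
        return name + ":" + json.dumps(arguments, sort_keys=True)

    async def execute(self, name, arguments, handler):
        key = self._key(name, arguments)
        task = self._tasks.get(key)
        if task is None:
            # First time this call is seen: start it and remember the task.
            task = asyncio.ensure_future(handler(**arguments))
            self._tasks[key] = task
        # In-flight and finished repeats both await the same task.
        return await task

    def forget(self, name, arguments):
        """Allow the same call to run again (for example, on a new turn)."""
        self._tasks.pop(self._key(name, arguments), None)
```

Because results are cached until forget is called, a real deployment would need an expiry policy so that a legitimate repeat (the user really does want a second booking) is not swallowed. Failed calls are cached as well in this sketch, which may or may not be what you want.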

Falsification criterion: This finding would be disproved by evidence that Pipecat v1.0.0 interrupt handling correctly preserves FunctionCallResultFrame across queue recreation (no duplicate tool calls in production interrupt tests), that pause_frame_processing=True does not deadlock when combined with an interruption during TTFB, or that Smart Turn v3 produces equivalent accuracy at 8kHz and 16kHz input.

Evidence

Tool	Version / status	Evidence type	Result
Pipecat	issues closed May 2026; PR #4435	source-reviewed	FunctionCallResultFrame discarded on queue recreation during interrupt (#4420)
Pipecat	issues closed May 2026	source-reviewed	Three-condition deadlock: pause_frame_processing=True + interrupt during TTFB + queued FunctionCallResultFrame (#4418)
Pipecat SmartTurnAnalyzerV3	issue closed March 2026	source-reviewed	8kHz input: 51% turn duration drop, 30% misclassification; contradicts Twilio guide (#3844)
Pipecat on Ubuntu 24.04.3 / K8s	v0.0.85–v0.0.92; PR #3499 merged	source-reviewed	3 GB/min memory leak with full provider stack; not macOS-reproducible (#3116)
Pipecat	issue closed May 2026	independently-confirmed	Issues #4420 and #4418 each independently confirm interrupt handling corrupts function call state via separate mechanisms
Pipecat ElevenLabs integration	issue closed May 2026	source-reviewed	_strip_leading_space merges words across chunk boundaries in Spanish; corrupts LLM context (#4391)
Pipecat	v1.0.0 CHANGELOG	docs-reviewed	function_call_timeout_secs default changed from 10.0 to None without deprecation warning
Pipecat	issue closed May 2026	source-reviewed	System frames still enqueued despite docs claiming queue bypass (#4445)

Confidence: empirical. 8 evidence entries (6 source-reviewed, 1 docs-reviewed, 1 independently-confirmed) across interrupt handling, telephony, memory, and provider integrations. Independent confirmation: #4420 and #4418 each confirm function call corruption through independent code paths.

Strongest case against: All documented bugs except the memory leak are marked closed on GitHub, so v1.0.0 may have resolved most of them. A team running v1.0.0 without pause_frame_processing=True and with audio_in_sample_rate=16000 may not encounter any of these failure modes in practice. The 8kHz recommendation conflict in the docs may have been corrected without a new issue being filed. Additionally, some fixes in v1.0.0 (like the audio serialization routing fix for Fish Audio, LMNT, and Rime) silently corrected behavior that only appeared to work in v0.0.x, which means the severity of some bugs was masked.

Open questions: Has v1.0.0 resolved the Smart Turn 8kHz documentation conflict, or does the Twilio guide still recommend 8kHz? Is the _strip_leading_space ElevenLabs bug fixed in v1.0.0 or only partially addressed? What is the root cause of the Linux/K8s memory leak — async event loop, frame buffer lifecycle, or provider library interaction?