2026-03-04

Conversation Is a Joint Physical Act, Not Just an Exchange of Words

A Nature Reviews Psychology analysis argues that gestures, gaze, and millisecond timing are not accessories to speech - they are what conversation actually is.

The last time you told a story to someone and noticed their eyebrow lift slightly, you probably adjusted what you were saying - added a detail, sped up the ending, softened a claim. You almost certainly did not think of that adjustment as communication. But according to a new review published in Nature Reviews Psychology, that constant fine-tuning is not a bonus feature of conversation. It is the conversation.

Two people, one temporary cognitive system

Judith Holler and Anna K. Kuhlen of the Max Planck Institute for Psycholinguistics reviewed decades of experimental work on face-to-face interaction and arrived at a conclusion that cuts against the way psycholinguists have typically studied language: production and comprehension cannot meaningfully be separated in real conversation.

Traditional laboratory paradigms tend to study speakers and listeners independently - one person generates language, another processes it. But Holler and Kuhlen argue that in practice, both sides of a conversation are doing both things simultaneously. Speakers anticipate responses before they arrive. Listeners begin preparing replies while still processing what they're hearing. Both participants are continuously monitoring and predicting each other's behavior and adjusting their own in response, often faster than conscious thought.

"Conversation is not a linear exchange of words," Holler writes. "It is a jointly managed activity in which meaning emerges through coordination."

What bodies are doing while mouths are talking

Face-to-face interaction relies on a stream of signals that go well beyond speech: gaze direction, eyebrow movement, posture shifts, hand gestures, brief vocalizations like "mm-hm." These signals carry information. A speaker can detect that a listener is confused, engaged, or about to interrupt - often before a single clarifying question is asked.

"Listeners are not passive recipients," Holler emphasizes. "They actively shape the speaker's unfolding message."

That feedback loop matters practically. When it is disrupted - on a video call with lag, or in audio-only communication - conversation becomes more effortful and less precise. Speakers over-explain, ask more explicit clarifying questions, and make more errors. The review uses these friction points to show how much conversational fluency depends on the full bandwidth of embodied feedback being available in real time.

Conversation as coordinated movement

The framework Holler and Kuhlen propose reframes conversation as a form of multimodal joint action - something closer in cognitive terms to two musicians playing together than to one person transmitting a message and another receiving it. The analogy is apt: good ensemble playing requires each musician to anticipate the other, adjust in real time, and share a distributed sense of where the piece is going. The music emerges from the coordination, not from either player alone.

"Face-to-face conversation requires rapid adaptation and mutual prediction. It is a dynamic system distributed across participants and modalities," Holler notes.

Multi-party conversations add another layer of complexity. When a third person joins, each participant must track multiple others' understanding, signals, and shared knowledge simultaneously. The cognitive demands are significant, even when the conversation feels effortless.

What this means for science and technology

The review has implications beyond linguistics. If conversation is fundamentally a multimodal joint action, then any technology designed to simulate it - AI assistants, teleconferencing systems, social robots - needs to account for the full physical channel, not just the verbal one. Systems that can only process words are missing most of what makes communication work.

In clinical contexts, the framework also sharpens how researchers understand communication difficulties. Conditions that affect gaze, gesture, or the timing of responses alter the whole coordinated system, not just one participant's behavior.

The authors call for experimental paradigms that study conversation as it actually unfolds - embodied, dynamic, involving real people in real time - rather than decomposing it into isolated language tasks. That is a methodological challenge, since the more naturalistic the setting, the harder it is to control the variables. But Holler and Kuhlen argue the alternative is studying something that isn't quite conversation at all.

"Meaning does not reside in words alone," Holler concludes. "It emerges through bodies and interaction."

Source: Holler J, Kuhlen AK. "Psycholinguistic perspectives on face-to-face conversation," Nature Reviews Psychology, 2026. DOI: 10.1038/s44159-026-00538-1. Max Planck Institute for Psycholinguistics, Nijmegen.