Music plus empathetic speech makes robots feel more human
Something shifts when a robot speaks with music playing. The machine does not change, but the person listening does. A study from The Hong Kong Polytechnic University examined this shift carefully, and its results suggest that combining music with empathetic speech in social robots produces a measurably stronger sense of human connection than speech alone.
The research project, titled A Talking Musical Robot over Multiple Interactions, was led by Prof. Johan Hoorn, Interfaculty Full Professor of Social Robotics in PolyU's School of Design and Department of Computing, in collaboration with Dr Ivy Huang at The Chinese University of Hong Kong. The study focused on Cantonese-speaking participants who interacted with an on-screen robot across three separate sessions, allowing the team to track not just whether a musical effect existed, but whether it persisted.
How music changed the perception of machines
The team's central finding was straightforward: when the robot combined music with empathetic speech, participants rated its empathy and human-likeness higher than when it used speech alone. The effect persisted across multiple sessions. Music, the researchers argue, acts as an emotional scaffold - it cues familiar patterns of human warmth, the kind a therapist or counsellor might use before a difficult conversation.
"Our data indicate that the presence of music continued to enhance the robot's resemblance to humans in later sessions," Hoorn explained. "One interpretation is that music made the interaction feel more like a real conversation with a personality - something human counsellors might do by playing music to comfort their clients, which in turn made the robot seem more lifelike or socially present."
The effect, however, is not permanent. Across repeated exposures, participants appeared to habituate to the musical component. What once felt novel became background. This attenuation is a key limitation: any system relying on music to bolster emotional connection must adapt its choices to remain relevant to each individual user. A single playlist, played every session, loses its grip.
Designing for the long term, not just the first session
This finding has direct implications for how empathetic robots are designed. The researchers suggest effective systems will need to vary musical elements, adjust dialogue content progressively, and respond to user feedback over time. In other words, the machine must learn what works for each person and keep updating.
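The study does not prescribe an algorithm for this adaptation, but the logic is easy to sketch. The toy Python below (every name in it is hypothetical, not from the paper) treats each musical style as a bandit arm: an epsilon-greedy rule learns which styles a given user rates well, while a novelty penalty discounts anything played recently - a crude stand-in for the habituation effect the researchers observed.

```python
import random
from collections import defaultdict

class MusicSelector:
    """Hypothetical per-user adaptive music picker (illustrative only).

    Epsilon-greedy exploration learns which styles a user responds to;
    a novelty penalty discounts styles played in recent sessions so the
    selector keeps rotating material instead of wearing one choice out.
    """

    def __init__(self, styles, epsilon=0.2, novelty_decay=0.5):
        self.styles = styles
        self.epsilon = epsilon                  # chance of trying something new
        self.novelty_decay = novelty_decay      # per-repeat recency penalty
        self.mean_rating = defaultdict(float)   # learned value per style
        self.plays = defaultdict(int)
        self.recent = []                        # last few styles, newest first

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.styles)

        def score(style):
            # Learned rating minus a penalty for recent repetition.
            penalty = sum(self.novelty_decay ** i
                          for i, s in enumerate(self.recent) if s == style)
            return self.mean_rating[style] - penalty

        return max(self.styles, key=score)

    def update(self, style, rating):
        """Fold in the user's post-session rating (e.g. 1-5)."""
        self.plays[style] += 1
        n = self.plays[style]
        self.mean_rating[style] += (rating - self.mean_rating[style]) / n
        self.recent = ([style] + self.recent)[:5]

# Three simulated sessions for one user with fixed (made-up) preferences.
selector = MusicSelector(["ambient", "classical", "cantopop", "jazz"])
preferences = {"ambient": 4, "classical": 3, "cantopop": 5, "jazz": 2}
for session in range(3):
    style = selector.choose()
    rating = preferences[style]
    selector.update(style, rating)
    print(f"session {session + 1}: played {style}, rated {rating}")
```

The novelty penalty is the key move here: a style loses standing each time it repeats, so even a user with one clear favourite hears varied material - exactly the adaptation the attenuation finding calls for.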
The study treats music not as decoration but as a functional communication channel alongside speech, and the results support giving it formal weight in robot design. For elder care settings, where loneliness can be severe and persistent, this distinction matters considerably.
"Our research points to the significance of multimodal communication encompassing music, speech and more through empathetic robots," Hoorn said. "It holds considerable promise for application in real-world settings, particularly in the fields of mental health support and elderly care."
Honest limits of the current evidence
Several constraints deserve acknowledgment. The participant pool was restricted to Cantonese speakers, which means the findings cannot be automatically extended to other linguistic or cultural groups - music perception and emotional response vary across cultures. The interactions were with an on-screen robot rather than a physical one, and physical presence may alter emotional dynamics in ways this study cannot measure. The three-session design also means long-term effects over weeks or months remain unknown.
Crucially, the study does not demonstrate that music-enhanced robots improve mental health outcomes. It shows that perceived empathy increases - a precursor to wellbeing effects, but not itself a clinical result.
What comes next
Hoorn is already leading a follow-on project with substantially larger scope. "Social Robots with Embedded Large Language Models Releasing Stress among the Hong Kong Population" received funding of over HK$40 million from the Research Grants Council Theme-based Research Scheme. That project extends this work by integrating large language models directly into the robot's conversational architecture.
He also holds a concurrent appointment as Associate Director of the PolyU Research Institute for Quantum Technology, where he is developing quantum-inspired models of human affect. These models represent emotional states as probabilistic superpositions - a technical approach designed to capture the genuine uncertainty of emotional experience rather than forcing it into discrete categories. The ambition is robots that stay responsive across the full arc of a person's emotional life, not just in initial encounters.
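Those models are not detailed in this study, so the following is only an illustrative sketch of the general idea, assuming the simplest possible formulation: an emotional state held as a normalized vector of complex amplitudes over basis emotions, read out via the Born rule. The basis labels and function names are invented for the example.

```python
import math

# Hypothetical basis emotions; an actual model would define its own.
BASIS = ["calm", "sad", "anxious", "content"]

def normalize(amplitudes):
    """Scale complex amplitudes so their squared magnitudes sum to 1."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    return [a / norm for a in amplitudes]

def probabilities(amplitudes):
    """Born-rule readout: P(emotion) = |amplitude|^2."""
    return {e: abs(a) ** 2 for e, a in zip(BASIS, amplitudes)}

# A state genuinely between 'calm' and 'sad' rather than forced into one
# discrete label. The relative phase (the 0.4j term) carries information
# that a plain probability vector over categories cannot.
state = normalize([0.8, 0.5 + 0.4j, 0.1, 0.2])
for emotion, p in probabilities(state).items():
    print(f"{emotion}: {p:.2f}")
```

The distinction from an ordinary probability distribution is the phase: two states can yield identical readout probabilities yet combine differently with later emotional input, which is the kind of ambiguity-preserving behaviour the quantum-inspired framing is meant to capture.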
The study was published in ACM Transactions on Human-Robot Interaction.