As technology increasingly integrates complex soundscapes into virtual spaces, understanding how humans perceive directional audio becomes vital. This need is bolstered by the rise of immersive media, such as augmented reality (AR) and virtual reality (VR), where users are virtually transported into other worlds. In a recent study, researchers explored how listeners identify the direction a speaker is facing while speaking.
The research was led by Dr. Shinya Tsuji, a postdoctoral fellow, Ms. Haruna Kashima, and Professor Takayuki Arai from the Department of Information and Communication Sciences, Sophia University, Japan. The team also included Dr. Takehiro Sugimoto, Mr. Kotaro Kinoshita, and Mr. Yasushige Nakayama from the NHK Science and Technology Research Laboratories, Japan. Their study was published in Volume 46, Issue 3 of the journal Acoustical Science and Technology on May 1, 2025.
In the study, the researchers conducted two experiments in which participants identified the direction a speaker was facing using only sound recordings. The first experiment used recordings with natural variations in loudness, and the second used recordings kept at constant loudness. The researchers found that loudness was consistently a strong cue for judging the speaker’s facing direction, but even when loudness cues were minimized, listeners still made correct judgments based on spectral cues: the distribution and quality of sound frequencies, which change subtly with the speaker’s orientation.
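As a rough illustration of the two cues described above, the sketch below contrasts a signal heard on-axis with an attenuated, low-pass filtered version standing in for off-axis speech (high frequencies radiate less strongly behind a talker). The `loudness_rms` and `spectral_centroid` helpers are hypothetical stand-ins for illustration only, not the measures or stimuli used in the study.

```python
import numpy as np

def loudness_rms(x):
    """Root-mean-square level, a simple proxy for loudness."""
    return float(np.sqrt(np.mean(x ** 2)))

def spectral_centroid(x, fs):
    """Amplitude-weighted mean frequency of the magnitude spectrum (Hz).

    A lower centroid indicates relatively less high-frequency energy,
    one crude way to summarize a spectral cue.
    """
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))

# Synthetic stand-in: 1 s of white noise in place of recorded speech.
rng = np.random.default_rng(0)
fs = 16000
facing = rng.standard_normal(fs)

# "Facing away": quieter overall, and low-passed because high
# frequencies are more directional and drop off off-axis.
kernel = np.ones(32) / 32          # crude moving-average low-pass
away = 0.5 * np.convolve(facing, kernel, mode="same")

print(loudness_rms(facing), loudness_rms(away))                    # away is quieter
print(spectral_centroid(facing, fs), spectral_centroid(away, fs))  # away centroid is lower
```

Under these toy assumptions, both the level difference and the shifted spectral balance are measurable, mirroring the two kinds of cues the listeners could draw on.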
“Our study suggests that humans mainly rely on loudness to identify a speaker’s facing direction,” said Dr. Tsuji. “However, it can also be judged from some acoustic cues, such as the spectral component of the sound, not just loudness alone.”
These findings are particularly useful for virtual sound fields with six degrees of freedom: immersive environments, such as those in AR and VR applications, where users can move freely and experience audio from different positions. “In contents having virtual sound fields with six-degrees-of-freedom—like AR and VR—where listeners can freely appreciate sounds from various positions, the experience of human voices can be significantly enhanced using the findings from our research,” said Dr. Tsuji.
The research emerges at a time when immersive audio is a major design frontier for consumer tech companies. Devices such as Meta Quest 3 and Apple Vision Pro are already shifting how people interact with digital spaces. Accurate rendering of human voices in these environments can significantly elevate user experience—whether in entertainment, education, or communication.
“AR and VR have become common with advances in technology,” Dr. Tsuji added. “As more content is developed for these devices in the future, the findings of our study may contribute to such fields.”
Beyond the immediate applications, this research has broader implications for how we might build more intuitive and responsive soundscapes in the digital world. By improving realism through audio, companies can create more convincing immersive media, an important factor not only for entertainment but also for accessibility solutions, virtual meetings, and therapeutic interventions.
By uncovering the role of both loudness and spectral cues in voice-based directionality, this study deepens our understanding of auditory perception and lays a foundation for the next generation of spatial audio systems. The findings pave the way for designing more realistic virtual interactions, particularly those involving human speech, which is probably the most familiar and meaningful sound we process every day.
Reference
Title of original paper
Perception of speech uttered as speaker faces different directions in horizontal plane: Identification of speaker’s facing directions from the listener
Journal
Acoustical Science and Technology
DOI
10.1250/ast.e24.99
Authors
Shinya Tsuji1, Haruna Kashima1, Takayuki Arai1, Takehiro Sugimoto2, Kotaro Kinoshita2, and Yasushige Nakayama2
Affiliations
1Department of Information and Communication Sciences, Sophia University, Japan, 2NHK Science and Technology Research Laboratories, Japan
About Sophia University
Established as a private Jesuit-affiliated university in 1913, Sophia University is one of the most prestigious universities located in the heart of Tokyo, Japan. Imparting education through 29 departments in 9 faculties and 25 majors in 10 graduate schools, Sophia hosts more than 13,000 students from around the world.
Conceived with the spirit of “For Others, With Others,” Sophia University truly values internationality and neighborliness, and believes in education and research that go beyond national, linguistic, and academic boundaries. Sophia emphasizes the need for multidisciplinary and fusion research to find solutions for the most pressing global issues like climate change, poverty, conflict, and violence. Over the course of the last century, Sophia has made dedicated efforts to hone future-ready graduates who can contribute their talents and learning for the benefit of others, and pave the way for a sustainable future while “Bringing the World Together.”
Website: https://www.sophia.ac.jp/eng/
About Dr. Shinya Tsuji from Sophia University, Japan
Dr. Shinya Tsuji is a postdoctoral fellow at the Department of Information and Communication Sciences, Sophia University. His major research interests include unilateral hearing loss and reverberation, and his expertise spans experimental psychology, human interfaces and interaction, informatics, and the humanities and social sciences. He has published five articles and has received multiple recognitions, including the 2022 Student Outstanding Presentation Award from the Acoustical Society of Japan. He is also involved in social activities and contributes actively to the Information and Community Site for Unilateral Hearing Loss.
END