This article is one of a series exploring the spatialization of sound sources in recorded songs and how they may be understood (see also ‘The Virtual Performance Space in Rock’, twentieth-century music 5/2). Its theoretical basis is multi-faceted, utilizing notions of ecological perception, of the sound-box, of the singer's persona, and of interpersonal distance in communication, as well as further concepts from cognitive science. It focuses particularly on image schemata and proxemics, exemplifying them across a range of genres, while also addressing them critically, for instance from a feminist perspective. Finally, it explores how this theoretical basis helps us not only to understand the contribution of spatialization to the interpretation of songs and their meanings, but also to shed light on the role of other musical domains.