Words and phrases may differ in the extent to which they are susceptible to prosodic foregrounding and expressive morphology: their expressiveness. They may also differ in the degree to which they are integrated in the morphosyntactic structure of the utterance: their grammatical integration. We describe an inverse relation that holds across widely varied languages, such that more expressiveness goes together with less grammatical integration, and vice versa. We review typological evidence for this inverse relation in ten spoken languages, then quantify and explain it using Japanese corpus data. We do this by tracking ideophones – vivid sensory words also known as mimetics or expressives – across different morphosyntactic contexts and measuring their expressiveness in terms of intonation, phonation and expressive morphology. We find that as expressiveness increases, grammatical integration decreases. Using gesture as a measure independent of the speech signal, we find that the most expressive ideophones are most likely to come together with iconic gestures. We argue that the ultimate cause is the encounter of two distinct and partly incommensurable modes of representation: the gradient, iconic, depictive system represented by ideophones and iconic gestures, and the discrete, arbitrary, descriptive system represented by ordinary words. The study shows how people combine modes of representation in speech and demonstrates the value of integrating description and depiction into the scientific vision of language.