This paper considers Arbib's hypothesis that (oral) language has its roots in gesture in light of recent research on demonstratives, joint attention, and deictic pointing (Michael Arbib. 2012. How the brain got language: The Mirror System Hypothesis. Oxford: Oxford University Press). It is argued that demonstratives provide an important link between gesture, discourse, and grammar, a link that rests on their communicative function of coordinating the interlocutors' focus of attention. Combining evidence from linguistic typology and historical linguistics with evidence from research on social cognition, the paper argues that demonstratives constitute a universal class of linguistic expressions that are commonly used in combination with a deictic pointing gesture to establish joint attention, a cognitive phenomenon that is closely related to Arbib's notion of “complex imitation”. No other class of linguistic expressions is as closely tied to the speaker's body and gesture as demonstratives. However, demonstratives are not only used to focus the language users' attention on concrete entities in the surrounding situation; they are also used to organize the information flow in discourse, which in turn underlies their frequent development into a wide range of grammatical markers, e.g. definite articles, third person pronouns, relative markers, complementizers, subordinate conjunctions, copulas, and focus markers. In this way, demonstratives provide an explicit link between gesture, imitation, and grammar that is consistent with Arbib's theory of the evolution of language.