The ability to integrate information from different sensory channels is a vital process that facilitates perceptual decoding when input from a single modality is ambiguous. Despite its relevance to psychosocial functioning, multimodal integration of emotional information across facial and prosodic modalities has not been addressed in bipolar disorder (BD). In light of this paucity of research, we investigated multimodal processing in a BD cohort using a focused attention paradigm. Fifty BD patients and 52 healthy controls completed a task assessing the cross-modal influence of emotional prosody on facial emotion recognition across congruent and incongruent facial and prosodic conditions, with attention directed to the facial channel. The groups did not differ in multimodal integration at the level of accuracy, but they did differ at the level of response time: emotional prosody biased facial recognition latencies in the control group only, in whom the increase in response times between congruent and incongruent conditions was fourfold that of patients. These results indicate that the automatic process of integrating multimodal information from facial and prosodic sensory channels is delayed in BD. Given that interpersonal communication usually occurs in real time, this delay has implications for social functioning in the disorder. (JINS, 2014, 20, 1–9)