Drawing on existing research that takes a holistic stance toward multimodal meaning-making, this paper proposes an analytic approach that integrates eye-tracking data to study how teachers and learners perceive and use multimodality. To illustrate this approach, we analyse two webconference tutoring sessions from a telecollaborative project involving pre-service teachers and learners of Mandarin Chinese. The tutoring sessions were recorded and transcribed multimodally. Our analysis of two types of conversational side sequences shows that integrating eye-tracking data into an ecological approach provides richer results. Specifically, the proposed approach offers a window onto the participants’ cognitive management of graphic and visual affordances during interaction and uncovers episodes of joint attention.