Dual subtitles, which combine captions (a transcription of the audio) with subtitles translated into another language, are increasingly used in language learning. However, how they shape visual attention remains unclear. In two experiments, we tracked the eye movements of Spanish–English bilinguals as they viewed instructional videos with either no subtitles (Experiment 1) or dual subtitles (Experiment 2), manipulating subtitle position and audio language. Without subtitles, L1 audio focused gaze on the speaker's eyes, whereas L2 audio distributed it between the eyes and mouth. With dual subtitles, gaze shifted strongly to the text, and the top line attracted more viewing time regardless of its language. Viewers also selectively attended to the line whose language matched the audio. Subtitles improved comprehension for L2 audio but did not affect comprehension for L1 audio. Our findings demonstrate that display layout and language alignment jointly govern attentional allocation in bilingual viewing, with direct implications for L2 instructional design.