Captioned video is widely used to extend second language (L2) learners’ exposure to oral input beyond the classroom, and captioning has been found to provide an immediate, useful visual aid for parsing and understanding L2 oral discourse. Nevertheless, a meta-analysis has shown that the effect of captioning varies across L2 learners with different profiles. This study investigated whether L2 learners’ modality preference (visual vs. auditory) and working memory capacity (high vs. low) modulate the effect of full captions on L2 listening outcomes. Results from 60 participants revealed that both cognitive variables affected L2 listening, though to different extents. Notably, working memory capacity modulated the impact of learners’ preferred modality on their listening outcomes. Among learners with low working memory capacity, modality preference had no significant effect on listening outcomes. Among learners with high working memory capacity, modality preference played a pivotal role: auditory learners performed best when viewing the video without captions, whereas visual learners performed best when viewing the captioned video. These findings underscore the need to take individual differences into account when employing captioned videos.