Previous studies have shown that orthography is activated during speech processing and that it may have both positive and negative effects for non-native listeners. The present study examines whether the effect of orthography on non-native word learning depends on the relationship between the grapheme–phoneme correspondences of the native and non-native orthographic systems. Specifically, congruence between grapheme–phoneme correspondences across the listeners’ languages is predicted to aid word recognition, while incongruence is predicted to hinder it. Native Spanish listeners who were either learners of Dutch or naïve listeners (with no exposure to Dutch) were taught Dutch pseudowords and their visual referents. They were trained either with auditory forms only or with both auditory and orthographic forms. During testing, non-native listeners were less accurate when the target and distractor pseudowords formed a minimal pair (differing in only one vowel) than when they formed a non-minimal pair, and they performed better on perceptually easy than on perceptually difficult minimal pairs. For perceptually difficult minimal pairs, Dutch learners performed better than naïve listeners, and Dutch proficiency predicted the learners’ word recognition accuracy. Most importantly, and as predicted, exposure to orthographic forms during training aided performance on minimal pairs with congruent orthography, while it hindered performance on minimal pairs with incongruent orthography.