Extensive evidence has demonstrated that bilinguals non-selectively activate the lexicons of both languages when reading or hearing words in one language. Here, we further investigated the electrophysiological roles of cross-linguistic orthography and phonology in the processing of L2 spoken words in unbalanced Chinese (L1)–English (L2) bilinguals in a cross-modal situation. Relative to the unrelated control condition, the recognition of auditory L2 words showed behavioral interference effects when the words were paired with orthographic or phonological neighbors of their correct translations. These lexical effects also appeared in the electrophysiological data, as reflected by marginally less positive late positive component (500–800 ms) amplitudes in the frontal region. Importantly, the orthographic, but not the phonological, translation neighbor condition elicited less negative N400 (300–500 ms) amplitudes in the parietal–occipital regions, suggesting that this orthographic translation neighbor condition facilitated the co-activation of spoken L2 words. Taken together, these findings indicate that cross-linguistic orthographic and phonological activation have different temporal dynamics, and that both bottom-up parallel cross-linguistic activation and a top-down inhibitory control mechanism govern the two-language lexical organization in L2 spoken word recognition.