Using the visual world paradigm with printed words, this study investigated the flexibility and representational nature of phonological prediction in real-time speech processing. Native speakers of Mandarin Chinese listened to spoken sentences containing highly predictable target words while viewing a visual array that contained a critical word and a distractor word. The critical word was one of four types: the highly predictable target word, a homophone competitor, a tonal competitor, or an unrelated word. Participants preferentially fixated the homophone competitors before hearing the highly predictable target word. This pre-activated phonological information waned shortly afterwards but was re-activated around the acoustic onset of the target word. Importantly, the homophone bias was observed only when participants were completing a ‘pronunciation judgement’ task, and not when they were completing a ‘word judgement’ task. No effect was found for the tonal competitors. The task modulation effect, combined with the temporal pattern of phonological pre-activation, indicates that phonological prediction can be flexibly generated by top-down mechanisms. The absence of a tonal competitor effect suggests that phonological features such as lexical tone are not independently predicted in anticipatory speech processing.